Open Source Models & Datasets
Vision-language models (VLMs) and document retrieval datasets. MIT/Apache-licensed, deployable on-premises.
Natotan
Defense-specialized vision-language embedding model. LoRA fine-tune of Qwen3-VL-Embedding-2B for military document retrieval.
UI-DETR-1
UI element detection (buttons, fields, menus) in screenshots. Fine-tuned on desktop and web interfaces.
QwenAmann-4B
Visual document retrieval model. Encodes documents and queries for semantic search on page images.
Flantier2-SmolVLM-2B
Compact VLM for document extraction. Optimized for technical and administrative document processing.
Flantier-Nuclear
Specialized VLM for nuclear regulatory documents. Trained on a corpus of ASN, IAEA, and technical documentation.
Flantier-SmolVLM-2B
General-purpose 2B-parameter VLM for document retrieval. SmolVLM base, fine-tuned on a European corpus.
Flantier-SmolVLM-500M
Ultra-compact VLM for edge deployment. 500M parameters, runs on a CPU or a modest GPU.
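The retrieval models above encode page images and queries as vectors for semantic search. As a minimal sketch of the ranking step only, assuming each page and query has already been reduced to a single embedding vector: the function names `cosine` and `rank_pages` and the toy vectors are illustrative and not part of any model's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_pages(query_vec, page_vecs, top_k=3):
    """Return (page_index, score) pairs for the top_k most similar pages."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(page_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real model outputs (assumption).
pages = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.1, 0.0]
results = rank_pages(query, pages, top_k=2)
```

In practice the vectors would come from one of the embedding models, and a vector index would likely replace the linear scan for large page collections.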
VDR_MEGA_2
Multi-domain dataset of 1.44M document-query pairs. Covers energy, defense, regulatory, and technical documents.
VDR_MultiDomain
Multi-domain document retrieval dataset. 1.09M samples for search model training.
VDR_Military
Defense sector document dataset. Specifications, technical manuals, operational procedures.
VDR_Nato
NATO and French military doctrine dataset. 377 documents (29,271 pages) with bilingual queries for visual document retrieval.
VDR_Nuclear
Nuclear regulatory document dataset. ASN standards, IAEA reports, technical documentation.
VDR_Hydrogen
Hydrogen sector dataset. Safety standards, technical specifications, European regulations.
VDR_Renewable
Renewable energy regulation dataset. Solar, wind, biomass. European and French standards.
VDR_Energy_Arabic
Arabic energy sector dataset. Technical and regulatory documents from the Middle East and North Africa.
VDR_History_Geography
Historical and geographical document dataset. Maps, archives, territorial studies.
VDR_Quantum_Papers
Scientific papers dataset on quantum circuits. Diagrams, equations, architectures.
VDR_Quantum_Synthetic
Synthetic quantum circuit dataset. Generated for model training on technical diagrams.
VDR_Qualitative
High-quality dataset for evaluation. Manually verified document-query pairs.
VDR_VisRAG_ColPali
Dataset optimized for VisRAG and ColPali. Format adapted to visual retrieval architectures.
VDR_ColPali_VisRAG
ColPali/VisRAG format dataset. 730K pairs for document retrieval model training.
VDR_CATIE_XMRec
CATIE dataset for cross-modal recommendation. Documents and queries in French.
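Several datasets above target ColPali-style retrievers, which score a page with late interaction over multiple vectors rather than with a single embedding. A minimal sketch of that MaxSim scoring, assuming query tokens and page patches are already embedded: the function names and toy vectors are illustrative, and nothing here reflects the datasets' actual schema.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, page_patches):
    """Late-interaction score: for each query token vector, take its best
    match among the page patch vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)

# Toy multi-vector embeddings (assumption): two query tokens, two patches per page.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]  # has a matching patch for both query tokens
page_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token

score_a = maxsim_score(query_tokens, page_a)
score_b = maxsim_score(query_tokens, page_b)
```

Here page_a scores higher than page_b because each query token finds a matching patch on it.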
Need a custom model?
We can fine-tune our models on your specific documents and domain.