Open Source Models & Datasets

Vision-language models and document retrieval datasets, released under MIT or Apache 2.0 licenses and deployable on-premises.

7 models, 15 datasets, MIT / Apache 2.0
Model, Apache 2.0
2B

Natotan

Defense-specialized vision-language embedding model. LoRA fine-tune of Qwen3-VL-Embedding-2B for military document retrieval.

VLM, Embedding, LoRA
Model, MIT
40M

UI-DETR-1

UI element detection (buttons, fields, menus) in screenshots. Fine-tuned on desktop and web interfaces.

Detection, UI, Computer Use
Model, Apache 2.0
4B

QwenAmann-4B

Visual document retrieval model. Encodes documents and queries for semantic search on page images.

VLM, Retrieval, DSE
Model, Apache 2.0
2B

Flantier2-SmolVLM-2B

Compact VLM for document extraction. Optimized for technical and administrative document processing.

VLM, 2B, Extraction
Model, Apache 2.0
2B

Flantier-Nuclear

Specialized VLM for nuclear regulatory documents. Trained on a corpus of ASN, IAEA, and technical documentation.

VLM, Nuclear, Regulation
Model, Apache 2.0
2B

Flantier-SmolVLM-2B

General-purpose 2B-parameter VLM for document retrieval. Built on a SmolVLM base and fine-tuned on a European corpus.

VLM, 2B, General-purpose
Model, Apache 2.0
500M

Flantier-SmolVLM-500M

Ultra-compact VLM for edge deployment. 500M parameters, runs on CPU or modest GPU.

VLM, 500M, Edge
Dataset, Apache 2.0
1.44M

VDR_MEGA_2

Multi-domain dataset of 1.44M document-query pairs. Covers energy, defense, regulatory, and technical documents.

1.44M, Multi-domain, Retrieval
Dataset, Apache 2.0
1.09M

VDR_MultiDomain

Multi-domain document retrieval dataset. 1.09M samples for search model training.

1.09M, Retrieval, Search
Dataset, Apache 2.0
296K

VDR_Military

Defense sector document dataset. Specifications, technical manuals, operational procedures.

Defense, Technical, Procedures
Dataset, Apache 2.0
58.5K

VDR_Nato

NATO & French Military Doctrine dataset. 377 documents, 29,271 pages with bilingual queries for visual document retrieval.

NATO, Doctrine, Defense
Dataset, Apache 2.0
78.7K

VDR_Nuclear

Nuclear regulatory document dataset. ASN standards, IAEA reports, technical documentation.

Nuclear, Regulation, ASN
Dataset, Apache 2.0
67.5K

VDR_Hydrogen

Hydrogen sector dataset. Safety standards, technical specifications, European regulations.

Hydrogen, Energy, Safety
Dataset, Apache 2.0
88.8K

VDR_Renewable

Renewable energy regulation dataset. Solar, wind, biomass. European and French standards.

Renewables, Regulation, Energy
Dataset, Apache 2.0
17.9K

VDR_Energy_Arabic

Arabic energy sector dataset. Technical and regulatory documents from the Middle East and North Africa.

Arabic, Energy, MENA
Dataset, Apache 2.0
67.6K

VDR_History_Geography

Historical and geographical document dataset. Maps, archives, territorial studies.

History, Geography, Archives
Dataset, Apache 2.0
6.85K

VDR_Quantum_Papers

Scientific papers dataset on quantum circuits. Diagrams, equations, architectures.

Quantum, Papers, Research
Dataset, Apache 2.0
4K

VDR_Quantum_Synthetic

Synthetic quantum circuit dataset. Generated for model training on technical diagrams.

Quantum, Synthetic, Diagrams
Dataset, Apache 2.0
285K

VDR_Qualitative

High-quality dataset for evaluation. Manually verified document-query pairs.

Quality, Evaluation, Benchmark
Dataset, Apache 2.0
1.19M

VDR_VisRAG_ColPali

Dataset optimized for VisRAG and ColPali. Format adapted to visual retrieval architectures.

VisRAG, ColPali, Retrieval
Dataset, Apache 2.0
730K

VDR_ColPali_VisRAG

ColPali/VisRAG format dataset. 730K pairs for document retrieval model training.

ColPali, VisRAG, 730K
Dataset, Apache 2.0
22.8K

VDR_CATIE_XMRec

CATIE dataset for cross-modal recommendation. Documents and queries in French.

CATIE, French, Recommendation

Need a custom model?

We can fine-tune our models on your specific documents and domain.