Titan AI

docext

1,381
101
Python

Project Description

docext is an on-premises document intelligence toolkit that offers OCR-free unstructured data extraction, markdown conversion, and benchmarking. It leverages vision-language models to convert PDFs and images to markdown, extract document information, and evaluate model performance.

Project Information

Created on 3/25/2025
Updated on 7/2/2025

Categories

image-processing
machine-learning-framework
text-processing

Tags

ready-to-use
data-processing
algorithm-model
open-source-community
model-deployment

Topics

rag
document-information-extraction
nlp
llm-ocr
document-data-extraction
onpremise
machine-learning
extraction
onprem-vision
ocr-benchmark
unstructured-data
ocr-onpremise
document
document-analysis
onprem
vlms
llms
table-extraction
ocr
onprem-ocr