docext

1,381

101

Python

docext is an on-premises document intelligence toolkit for OCR-free extraction of unstructured data, markdown conversion, and benchmarking. It uses vision-language models to convert PDFs and images to markdown, extract information from documents, and evaluate model performance.

Project Information

Created on 3/25/2025

Updated on 7/2/2025

Topics

rag

document-information-extraction

nlp

llm-ocr

document-data-extraction

onpremise

machine-learning

extraction

onprem-vision

ocr-benchmark

unstructured-data

ocr-onpremise

document

document-analysis

onprem

vlms

llms

table-extraction

ocr

onprem-ocr
