zerox

Project Description

OCR & Document Extraction using vision models

Overview

Zerox is an OCR tool that prepares documents for AI ingestion. It converts each input file into a series of images, sends those images to a vision model, and returns the extracted text as Markdown. Working from rendered pages lets the vision model handle complex layouts, including tables and charts. The project is written in TypeScript.

Key Features

  • Supports multiple file formats (PDF, DOCX, images)
  • Converts files into images and extracts text using AI
  • Supports various AI providers (OpenAI, Azure OpenAI, AWS Bedrock, Google Gemini)
  • Data extraction with schema support
  • Maintains document format during extraction
  • Error handling and concurrent processing options
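
The image-then-extract flow with concurrent processing and error handling can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Zerox's actual implementation or API: `PageResult`, `PageExtractor`, `extractPages`, `toMarkdown`, and `fakeExtract` are hypothetical names, and the vision-model call is replaced by a stub.

```typescript
// Hypothetical types for illustration; these are not part of the Zerox API.
type PageResult =
  | { page: number; status: "ok"; markdown: string }
  | { page: number; status: "error"; message: string };

type PageExtractor = (image: string, page: number) => Promise<string>;

// Run at most `concurrency` extractions at a time. A failure on one page is
// recorded in that page's result instead of aborting the whole document.
async function extractPages(
  images: string[],
  extract: PageExtractor,
  concurrency: number
): Promise<PageResult[]> {
  const results: PageResult[] = new Array(images.length);
  let next = 0; // index of the next page that has not been claimed yet

  async function worker(): Promise<void> {
    while (next < images.length) {
      const index = next++;
      const page = index + 1;
      try {
        const markdown = await extract(images[index], page);
        results[index] = { page, status: "ok", markdown };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { page, status: "error", message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, images.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Join the successfully extracted pages, in page order, into one document.
function toMarkdown(results: PageResult[]): string {
  return results
    .filter((r): r is Extract<PageResult, { status: "ok" }> => r.status === "ok")
    .map((r) => r.markdown)
    .join("\n\n");
}

// Stub standing in for a vision-model request (assumption, no network call).
const fakeExtract: PageExtractor = async (image, page) => {
  if (image === "corrupt.png") throw new Error("unreadable image");
  return `# Page ${page}\n\nText from ${image}`;
};

// Usage: three page images, two extractions in flight at once.
extractPages(["page-1.png", "corrupt.png", "page-3.png"], fakeExtract, 2).then(
  (results) => console.log(toMarkdown(results))
);
```

Keeping a per-page status means a single unreadable page does not discard the Markdown already extracted from the others; a caller could retry only the pages marked `error`.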

Use Cases

  • AI Data Ingestion: Extracting structured data from documents for AI model training.
  • Document Archiving: Converting physical documents into digital, searchable formats.
  • Content Management: Automating the extraction of content from various documents for web or database use.
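
For the structured-data use case, schema-guided extraction implies that extracted fields can be checked against a declared shape before they are stored. The sketch below is a simplified illustration, not Zerox's implementation: `ObjectSchema`, `validate`, and `invoiceSchema` are hypothetical names, and the schema format is a small subset loosely modeled on JSON Schema.

```typescript
// Hypothetical, simplified schema shape; real JSON Schema has many more keywords.
type FieldType = "string" | "number" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: FieldType }>;
  required?: string[];
}

// Return a list of problems; an empty list means the value matches the schema.
function validate(value: unknown, schema: ObjectSchema): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["value is not an object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];

  // Every required field must be present.
  for (const key of schema.required ?? []) {
    if (!(key in record)) problems.push(`missing required field: ${key}`);
  }
  // Every declared field that is present must have the declared type.
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in record && typeof record[key] !== spec.type) {
      problems.push(`field ${key} should be ${spec.type}`);
    }
  }
  return problems;
}

// Example schema for an invoice (illustrative field names).
const invoiceSchema: ObjectSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    total: { type: "number" },
    paid: { type: "boolean" },
  },
  required: ["invoiceNumber", "total"],
};

// Usage: check a value that a model might have returned for one document.
console.log(validate({ invoiceNumber: "INV-1", total: 42.5 }, invoiceSchema));
```

Returning a list of problems rather than throwing makes it straightforward to log every mismatch for a document and decide afterwards whether to retry or discard it.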

Advantages

  • Handles complex document layouts with vision models.
  • Supports a wide range of AI providers for flexibility.
  • Offers both Node.js and Python packages for broader developer accessibility.

Limitations / Considerations

  • Some features are not available in the Python package, such as data extraction with schema and custom system prompts.
  • Requires GraphicsMagick and Ghostscript to be installed separately for certain operations.
  • The project's license was not identified when this summary was generated; check the repository's license file before using it in commercial applications.

Similar / Related Projects

  • Tesseract: A long-established open-source OCR engine with broad language support. It uses a dedicated text-recognition model (LSTM-based since version 4) rather than a general-purpose vision model.
  • PaddleOCR: An OCR toolkit from Baidu built on deep learning models. Like Zerox it is model-driven, but it has a different set of supported languages and features.
  • EasyOCR: A deep learning OCR library with multi-language support and a simple API; it may not handle complex layouts as effectively as Zerox.

Basic Information


📊 Project Information

  • Project Name: zerox
  • GitHub URL: https://github.com/getomni-ai/zerox
  • Programming Language: TypeScript
  • ⭐ Stars: 11,837
  • 🍴 Forks: 806
  • 📅 Created: 2024-07-21
  • 🔄 Last Updated: 2025-09-18

🏷️ Project Topics

Topics: ocr, pdf



This article is automatically generated by AI based on GitHub project information and README content analysis

Project Information

Created on 7/21/2024
Updated on 10/31/2025