donut


Project Description

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Project Title

donut — OCR-free Document Understanding Transformer for State-of-the-Art Visual Document Tasks

Overview

Donut is an OCR-free end-to-end Transformer model designed for document understanding tasks. It eliminates the need for off-the-shelf OCR engines/APIs while achieving state-of-the-art performance in visual document classification and information extraction. The project also includes SynthDoG, a synthetic document generator that enhances model pre-training flexibility across languages and domains.
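To make the "end-to-end" idea concrete: instead of emitting OCR text, Donut decodes a sequence of field tags such as `<s_total>...</s_total>`, which is then converted into structured data. The snippet below is a minimal, simplified sketch of that conversion step in plain Python. The function name `parse_fields` and the sample receipt string are hypothetical illustrations, and the repository's own conversion logic handles more cases than this.

```python
import re

# Matches an opening field tag such as "<s_menu>" and captures the key.
OPEN_TAG = re.compile(r"<s_([A-Za-z0-9_]+)>")


def parse_fields(seq):
    """Parse '<s_key>value</s_key>' fields into a dict (hypothetical helper).

    Simplified assumptions: a field never nests a field with the same key,
    and "<sep/>" only separates repeated groups one level inside a field.
    """
    fields = {}
    pos = 0
    while (match := OPEN_TAG.search(seq, pos)) is not None:
        key = match.group(1)
        close_tag = f"</s_{key}>"
        end = seq.find(close_tag, match.end())
        if end == -1:
            # Unclosed field: skip the opening tag and keep scanning.
            pos = match.end()
            continue
        inner = seq[match.end():end]
        if OPEN_TAG.search(inner):
            # Nested fields; "<sep/>" separates repeated groups (list items).
            groups = [parse_fields(part) for part in inner.split("<sep/>")]
            fields[key] = groups[0] if len(groups) == 1 else groups
        else:
            fields[key] = inner.strip()
        pos = end + len(close_tag)
    return fields


# Made-up decoder output for a small receipt (not real model output).
example = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
    "<s_nm>Bagel</s_nm><s_price>3.00</s_price></s_menu>"
    "<s_total><s_total_price>7.50</s_total_price></s_total>"
)
parsed = parse_fields(example)
print(parsed)
```

Here the repeated `menu` groups become a list of dictionaries and the single `total` group becomes one dictionary, which is the kind of structured result an information extraction task needs without a separate OCR stage.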

Key Features

  • OCR-free end-to-end Transformer model for document understanding
  • SynthDoG, a synthetic document generator for flexible pre-training
  • State-of-the-art performance in visual document classification and information extraction
  • Gradio web demos for interactive model testing

Use Cases

  • Researchers and developers working on visual document understanding tasks
  • Enterprises needing to automate document classification and information extraction
  • Educational institutions for teaching and research in document AI

Advantages

  • Removes dependency on OCR engines for document understanding
  • Achieves state-of-the-art performance without OCR
  • Flexible pre-training with SynthDoG for various languages and domains
  • Provides interactive web demos for easy model testing

Limitations / Considerations

  • May require significant computational resources for training and inference
  • Performance may vary across different document types and languages
  • Long-term maintenance and community support are not guaranteed; check the repository's recent activity before adopting it

Similar / Related Projects

  • Tesseract OCR: A popular open-source OCR engine. It only extracts text, so document understanding tasks need a separate model on top of its output.
  • LayoutLM: A pre-trained model for document understanding that jointly models text and layout, and typically consumes the text and bounding boxes produced by an OCR engine.
  • DocVQA: A visual question answering benchmark on document images rather than a model; many systems evaluated on it rely on OCR output.

Basic Information


📊 Project Information

  • Project Name: donut
  • GitHub URL: https://github.com/clovaai/donut
  • Programming Language: Python
  • ⭐ Stars: 6,664
  • 🍴 Forks: 544
  • 📅 Created: 2022-07-20
  • 🔄 Last Updated: 2025-11-17

🏷️ Project Topics

Topics: computer-vision, document-ai, eccv-2022, multimodal-pre-trained-model, nlp, ocr




This article is automatically generated by AI based on GitHub project information and README content analysis


Project Information

Created on 7/20/2022
Updated on 11/28/2025