
Dolphin


Project Description

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.


Project Title

Dolphin — A Multimodal Document Image Parsing Model via Heterogeneous Anchor Prompting

Overview

Dolphin is a multimodal document image parsing model designed for pages where text paragraphs, figures, formulas, and tables are intertwined. It works in two stages: it first performs a comprehensive page-level layout analysis, then parses the individual document elements in parallel. The project reports promising performance across diverse parsing tasks.

Key Features

  • Comprehensive page-level layout analysis that generates an element sequence in natural reading order
  • Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts
  • Lightweight architecture and parallel parsing mechanism for superior efficiency
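
The two stages described above can be illustrated with a minimal Python sketch. It is not Dolphin's actual code or API: the `Element` class, the `PROMPTS` table and its wording, and the `analyze_layout` and `parse_batch` callables are all hypothetical stand-ins, assumed here only to show how a layout pass followed by batched, prompt-per-element parsing might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels

# Hypothetical prompt table: the wording is illustrative only and is not
# taken from the Dolphin repository.
PROMPTS: Dict[str, str] = {
    "text": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Read the formula in this region.",
}


@dataclass(frozen=True)
class Element:
    kind: str   # element type predicted by the layout stage
    bbox: BBox  # anchor: where the element sits on the page


def parse_page(
    page: object,
    analyze_layout: Callable[[object], Sequence[Element]],
    parse_batch: Callable[[object, List[Tuple[BBox, str]]], List[str]],
    batch_size: int = 4,
) -> List[Tuple[Element, str]]:
    """Run a layout pass, then parse the elements batch by batch."""
    # Stage 1: one pass over the whole page; elements arrive in reading order.
    elements = list(analyze_layout(page))

    # Stage 2: pair every element with a prompt chosen by its type.
    # Unknown types fall back to the text prompt (a simplification).
    jobs = [(el.bbox, PROMPTS.get(el.kind, PROMPTS["text"])) for el in elements]

    outputs: List[str] = []
    for start in range(0, len(jobs), batch_size):
        # Each batch goes to the model in a single call, so its elements
        # are handled together rather than one after another.
        outputs.extend(parse_batch(page, jobs[start:start + batch_size]))

    # zip preserves the reading order produced by stage 1.
    return list(zip(elements, outputs))
```

In this sketch, `parse_batch` is where a real model call would go; passing the two stages in as callables keeps the control flow testable without any model weights.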

Use Cases

  • Academic and research institutions for document analysis and data extraction
  • Enterprises for automating document processing and information retrieval
  • Libraries and archives for digitizing and organizing large volumes of documents

Advantages

  • Promising performance across page-level and element-level parsing tasks
  • Superior efficiency through lightweight architecture and parallel parsing
  • Pre-trained models and demo code available for quick implementation

Limitations / Considerations

  • May require significant computational resources for training and inference
  • Performance may vary depending on the complexity and quality of input document images

Similar / Related Projects

  • LayoutLM: A model for document image understanding, differing in its approach to layout analysis and parsing.
  • DocFormer: Another document image parsing model, focusing on transformer-based architectures.
  • Fox: A dataset used for benchmarking document image parsing models; Dolphin uses a refined subset of it.

Basic Information


📊 Project Information

  • Project Name: Dolphin
  • GitHub URL: https://github.com/bytedance/Dolphin
  • Programming Language: Python
  • ⭐ Stars: 7,356
  • 🍴 Forks: 592
  • 📅 Created: 2025-05-13
  • 🔄 Last Updated: 2025-10-11

🏷️ Project Topics

Topics: document-analysis, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-parser, python, vlm-ocr



This article is automatically generated by AI based on GitHub project information and README content analysis


Project Information

Created on 5/13/2025
Updated on 11/4/2025