Project Title

Dolphin — A Multimodal Document Image Parsing Model via Heterogeneous Anchor Prompting

Overview

Dolphin is a multimodal document image parsing model for pages in which text paragraphs, figures, formulas, and tables are intertwined. It works in two stages: it first performs page-level layout analysis, then parses the individual document elements in parallel. The project reports promising performance across diverse parsing tasks.

Key Features

Comprehensive page-level layout analysis that generates the element sequence in natural reading order
Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts
Lightweight architecture and parallel parsing mechanism for superior efficiency
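
The two-stage flow described above can be illustrated with a minimal Python sketch. This is not Dolphin's actual API: the names `Element`, `analyze_layout`, `parse_element`, `parse_page`, and the prompt strings are hypothetical stand-ins, and a thread pool is used only to mimic parallel element parsing (the real model may parallelize differently, for example by batching inside the model).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical task-specific prompts, one per element type.
# These are illustrative strings, not the prompts Dolphin ships with.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse this table.",
    "formula": "Transcribe this formula.",
}


@dataclass
class Element:
    order: int   # position in natural reading order (stage 1 output)
    kind: str    # element type, used to pick the prompt
    bbox: tuple  # anchor region on the page as (x0, y0, x1, y1)


def analyze_layout(page):
    """Stage 1 stand-in: return page elements in reading order."""
    return [
        Element(0, "text", (0, 0, 100, 20)),
        Element(1, "table", (0, 25, 100, 60)),
        Element(2, "formula", (0, 65, 100, 80)),
    ]


def parse_element(page, element):
    """Stage 2 stand-in: pair the anchor region with its type-specific prompt."""
    prompt = PROMPTS[element.kind]
    return f"[{element.kind} @ {element.bbox}] {prompt}"


def parse_page(page, max_workers=4):
    """Run stage 1, then parse all elements concurrently."""
    elements = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order,
        # so the output stays in reading order.
        return list(pool.map(lambda e: parse_element(page, e), elements))


if __name__ == "__main__":
    for line in parse_page("page.png"):
        print(line)
```

The point of the sketch is the shape of the pipeline: one pass over the whole page produces an ordered list of anchors, and each anchor is then handled independently, which is what makes the second stage parallelizable.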

Use Cases

Academic and research institutions for document analysis and data extraction
Enterprises for automating document processing and information retrieval
Libraries and archives for digitizing and organizing large volumes of documents

Advantages

Promising performance across page-level and element-level parsing tasks
Superior efficiency through lightweight architecture and parallel parsing
Pre-trained models and demo code available for quick implementation

Limitations / Considerations

May require significant computational resources for training and inference
Performance may vary depending on the complexity and quality of input document images

Similar / Related Projects

LayoutLM: A model for document image understanding that takes a different approach to layout analysis and parsing.
DocFormer: Another document image parsing model, built on a transformer-based architecture.
Fox: A dataset used for benchmarking document image parsing models; Dolphin uses a refined subset of it.

Basic Information

GitHub: https://github.com/bytedance/Dolphin
Stars: 7,356
License: MIT
Last Commit: 2025-10-11

📊 Project Information

Project Name: Dolphin
Programming Language: Python
🍴 Forks: 592
📅 Created: 2025-05-13
🔄 Last Updated: 2025-10-11

🏷️ Project Topics

Topics: document-analysis, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-parser, python, vlm-ocr

This article was automatically generated by AI from GitHub project information and an analysis of the README content.
