Project Title
Dolphin — A Multimodal Document Image Parsing Model via Heterogeneous Anchor Prompting
Overview
Dolphin is a multimodal document image parsing model designed for pages where text paragraphs, figures, formulas, and tables are intertwined. It works in two stages: it first performs page-level layout analysis, then parses the detected document elements in parallel. The project reports promising performance across diverse parsing tasks.
Key Features
- Comprehensive page-level layout analysis that generates an element sequence in natural reading order
- Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts
- Lightweight architecture and parallel parsing mechanism for superior efficiency
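The two-stage flow described above can be illustrated with a minimal sketch. This is not Dolphin's actual API: the `Element` class, the `PROMPTS` strings, and the functions `analyze_layout`, `parse_element`, and `parse_page` are all hypothetical stand-ins, and the model calls are replaced by stubs. A thread pool is used here only to show independent elements being handled concurrently; how the real model parallelizes parsing is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical task-specific prompts keyed by element type (illustrative only,
# not the prompts used by the real model).
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Read the formula in this region.",
    "figure": "Describe the figure in this region.",
}


@dataclass
class Element:
    kind: str    # element type, e.g. "text" or "table"
    bbox: tuple  # (x0, y0, x1, y1) region on the page, acting as the anchor
    order: int   # position in the reading order


def analyze_layout(page):
    """Stage 1 stand-in: return the page's elements in reading order."""
    return sorted(page, key=lambda e: e.order)


def parse_element(element):
    """Stage 2 stand-in: pair one anchor region with its type-specific prompt."""
    prompt = PROMPTS.get(element.kind, PROMPTS["text"])
    return {"order": element.order, "kind": element.kind, "prompt": prompt}


def parse_page(page, max_workers=4):
    """Run layout analysis, then parse every element concurrently."""
    elements = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as its input,
        # so the reading order from stage 1 is preserved.
        return list(pool.map(parse_element, elements))


page = [
    Element("table", (0, 50, 100, 90), order=1),
    Element("text", (0, 0, 100, 40), order=0),
]
result = parse_page(page)
```

Because each element is parsed independently once the layout is known, the second stage does not need to wait for earlier elements to finish, which is the property the efficiency claim rests on.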
Use Cases
- Academic and research institutions for document analysis and data extraction
- Enterprises for automating document processing and information retrieval
- Libraries and archives for digitizing and organizing large volumes of documents
Advantages
- Promising performance across page-level and element-level parsing tasks
- Superior efficiency through lightweight architecture and parallel parsing
- Pre-trained models and demo code available for quick implementation
Limitations / Considerations
- Training and inference may still require significant computational resources, despite the lightweight architecture
- Performance may vary depending on the complexity and quality of input document images
Similar / Related Projects
- LayoutLM: A pre-trained model for document image understanding that jointly models text and layout information; it differs from Dolphin in its approach to layout analysis and parsing.
- DocFormer: A multimodal transformer-based model for visual document understanding.
- Fox: A dataset used for benchmarking document image parsing models; Dolphin uses a refined subset of this dataset.
Basic Information
- Project Name: Dolphin
- GitHub: https://github.com/bytedance/Dolphin
- Programming Language: Python
- License: MIT
- Stars: 7,356
- Forks: 592
- Created: 2025-05-13
- Last Commit and Last Update: 2025-10-11
🏷️ Project Topics
Topics: document-analysis, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-parser, python, vlm-ocr
This article was automatically generated by AI from the GitHub project information and README content.