Logics-MLLM/Logics-Parsing-v2
HomePage | GitHub | Demo
Updates
- [2026/02/13] We release the Logics-Parsing-v2 model.
- [2025/09/25] We release the Logics-Parsing model.
Introduction
Logics-Parsing-v2 is an advanced evolution of the previously released Logics-Parsing (v1). It inherits all the core capabilities of the v1 model while handling complex documents more robustly. It also extends support to Parsing-2.0 scenarios, enabling structured parsing of music sheets, flowcharts, and code/pseudocode blocks.
Key Features
- Effortless End-to-End Processing
  - End-to-end recognition and parsing of all kinds of document elements within a single model.
  - Handles complex-layout and text-dense documents such as newspapers and magazines with exceptional precision and ease.
- Advanced Content Recognition
  - Smaller in size yet stronger in performance, delivering more accurate and better-structured parsing of tables and scientific formulas.
  - Introduces Parsing-2.0: natively supports parsing of diverse structured content, including flowcharts, music sheets, and pseudocode blocks.
- Rich, Structured HTML Output
  - Transforms documents into concise HTML, capturing not just content but also element types, spatial layouts, and semantic hierarchy (see the illustrative post-processing sketch after this list).
  - Represents structured elements in more standard and intuitive formats, such as Mermaid for flowcharts and ABC notation for musical scores.
- State-of-the-Art Performance
  - SOTA across the board: Logics-Parsing-v2 achieves the top score on both our in-house benchmark LogicsDocBench (overall score: 82.16) and the widely used public benchmark OmniDocBench-v1.5 (overall score: 93.23).
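To make the HTML output concrete, here is a minimal post-processing sketch. The markup it assumes (a `data-bbox` attribute and a `class` carrying the element category) is illustrative only; consult the repository and Demo for the exact schema emitted by the model.

```python
# Illustrative sketch only: the `data-bbox` attribute and `class`-based element
# categories are assumptions about the output HTML, not a documented schema.
from html.parser import HTMLParser

class LayoutCollector(HTMLParser):
    """Collect element categories and bounding boxes from a parsed-document HTML file."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-bbox" in attrs:  # hypothetical layout attribute
            self.elements.append({
                "tag": tag,
                "category": attrs.get("class", ""),
                "bbox": attrs["data-bbox"],
            })

with open("page.html", encoding="utf-8") as f:
    collector = LayoutCollector()
    collector.feed(f.read())

for element in collector.elements:
    print(element["category"], element["bbox"])
```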
Benchmark
Comparisons on LogicsDocBench
We introduce LogicsDocBench, a new comprehensive evaluation benchmark comprising 900 carefully selected PDF pages, covering both traditional Parsing-1.0 document tasks and the newly introduced Parsing-2.0 scenarios. The benchmark is designed to better assess models' capabilities in parsing complex and diverse real-world documents. The dataset is organized into three core document subsets:
- STEM Documents (218 pages): Focuses on high-difficulty academic and educational content spanning more than ten domains, including physics, mathematics, engineering, and interdisciplinary sciences. This subset evaluates deep understanding of mathematical formulas, technical terminology, and structured knowledge representation.
- Complex Layouts (459 pages): Includes challenging real-world layouts such as multi-column text, cross-page tables, vertical writing, and mixed text-image arrangements. This subset comprehensively evaluates a model's layout analysis abilities.
- Parsing-2.0 Content (223 pages): Targets modern digital and semi-structured content that poses significant challenges for traditional OCR systems, including:
  - Chemical molecular formulas
  - Music sheets
  - Code and pseudocode blocks
  - Flowcharts and mind maps
For Parsing-1.0 tasks, we adopt the same evaluation protocols as OmniDocBench-v1.5 to ensure fairness and consistency across benchmarks. For Parsing-2.0, we report fine-grained results using edit distance for each subcategory, and compute an overall score as follows:
$$\small \text{Overall} = \frac{\text{Parsing1.0}^{\text{Overall}} \times 3 + (1-\text{Chemistry}^{\text{Edit}})\times 100 + (1-\text{Code}^{\text{Edit}})\times 100 + (1-\text{Chart}^{\text{Edit}})\times 100 + (1-\text{Music}^{\text{Edit}})\times 100}{7}$$
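For clarity, the weighting above can be reproduced with a few lines of Python; the subscores in the example are placeholders, not benchmark results.

```python
# Sketch of the overall-score formula above; the example numbers are placeholders.
def overall_score(parsing1_overall, chemistry_edit, code_edit, chart_edit, music_edit):
    """Combine the Parsing-1.0 overall score (0-100) with the four
    Parsing-2.0 edit distances (0-1, lower is better)."""
    parsing2 = sum((1 - d) * 100 for d in (chemistry_edit, code_edit, chart_edit, music_edit))
    return (parsing1_overall * 3 + parsing2) / 7

print(round(overall_score(90.0, 0.10, 0.15, 0.20, 0.25), 2))  # -> 85.71
```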
A comprehensive evaluation of document parsing on LogicsDocBench is presented below:
The histogram below provides a more intuitive visualization of the advantages of our Logics-Parsing-v2 model in both Parsing-1.0 and 2.0 scenarios.
Comparisons on OmniDocBench-v1.5
We also report results for Logics-Parsing-v2 on the widely recognized open-source benchmark OmniDocBench-v1.5. As shown in the table below, Logics-Parsing-v2 achieves the highest score among all compared approaches, demonstrating its effectiveness and superiority.
* The model results in the table are sourced from the official OmniDocBench website.
Quick Start
1. Installation
conda create -n logics-parsing-v2 python=3.10
conda activate logics-parsing-v2
pip install -r requirements.txt
2. Download Model Weights
# Download our model from ModelScope.
pip install modelscope
python download_model_v2.py -t modelscope
# Download our model from Hugging Face.
pip install huggingface_hub
python download_model_v2.py -t huggingface
3. Inference
python3 inference_v2.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL
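To process a whole directory of pages, a small driver can invoke the script once per image. The command matches the one above; the `images/` and `outputs/` paths and the `.html` output extension are assumptions for illustration.

```python
# Minimal batch driver around inference_v2.py; directory names and the output
# file extension are illustrative assumptions.
import subprocess
from pathlib import Path

MODEL_PATH = "PATH_TO_MODEL"
out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

for image in sorted(Path("images").glob("*.png")):
    out_file = out_dir / f"{image.stem}.html"
    subprocess.run(
        [
            "python3", "inference_v2.py",
            "--image_path", str(image),
            "--output_path", str(out_file),
            "--model_path", MODEL_PATH,
        ],
        check=True,
    )
```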
Showcases
Acknowledgments
We would like to acknowledge the following open-source projects that provided inspiration and reference for this work: