MinerU is a powerful document parsing tool developed by OpenDataLab, designed to help users efficiently and accurately extract content from documents such as PDFs. It was born during the pre-training process of InternLM, aiming to solve the symbol conversion issues in scientific literature. Below is a detailed introduction to MinerU:

MinerU: A Document Parsing Tool That Makes Document Content Extraction Easy

In today’s fast-paced digital age, document processing has become indispensable in our work and study. Whether it is researchers handling academic papers, office workers organizing reports, or students consolidating study materials, document content extraction is a frequent task. However, traditional document parsing methods often fall short: formatting is lost, extracted content is inaccurate, and complex elements such as tables and formulas go unrecognized. MinerU addresses these problems with layout-aware parsing that produces clean, structured output.

Project Overview

MinerU converts PDFs into machine-readable formats such as markdown and JSON. It removes redundant elements like headers, footers, footnotes, and page numbers to ensure semantic coherence, and it outputs text in a human-readable order, making it suitable for single-column, multi-column, and complex layouts. MinerU preserves the original document structure, including headings, paragraphs, and lists, while extracting images, image descriptions, tables, table titles, footnotes, and more. It automatically recognizes formulas and converts them to LaTeX, and converts tables to HTML. It also detects scanned and garbled PDFs automatically and enables OCR for them. MinerU’s OCR supports detection and recognition of 84 languages, and the tool is compatible with Windows, Linux, and Mac platforms.

Key Features

  • Accurate Content Extraction: MinerU effectively removes document redundancies such as headers, footers, footnotes, and page numbers, retaining only the core content to ensure semantic coherence. It supports various document layouts and outputs text in a natural reading order.
  • Rich Format Support: It parses PDFs and images (png/jpg/jpeg) and outputs markdown and JSON, with formulas rendered as LaTeX and tables as HTML. As of version 2.0, Office documents such as Word and PPT are no longer converted by MinerU itself and must first be converted to PDF. This meets the needs of different users and downstream applications.
  • Powerful Element Recognition: MinerU can accurately identify and extract complex elements such as tables and formulas. For example, it can locate tables within documents and convert them into HTML format, making it easier for users to edit and analyze the data. It can also recognize formulas and convert them into LaTeX format, which is highly beneficial for academic writing and technical documentation.
  • Highly Efficient Performance: MinerU leverages advanced technologies to achieve fast document parsing. Its 2.0 version integrates a lightweight yet high-performance multimodal document parsing model. On a single NVIDIA 4090 card with sglang acceleration, it achieves a peak throughput exceeding 10,000 tokens/s, meeting large-scale document processing demands.
  • Flexible Deployment Options: MinerU supports deployment in various environments, including pure CPU setups and GPU/CUDA/NPU/CANN/MPS accelerated environments. It is compatible with Windows, Linux, and Mac platforms and provides Dockerfiles and detailed installation documentation for easy deployment.
  • User-Friendly: Most parameters can be configured directly via the command line or API, eliminating the need for manual JSON configuration file editing. It also offers an automatic model download and update mechanism, allowing users to complete model deployment without manual intervention. Additionally, it supports offline deployment, making it suitable for environments with limited internet access.
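Because tables are emitted as HTML, a downstream script can read them with nothing beyond the standard library. The sketch below uses Python’s html.parser to turn one table fragment into a list of rows. The sample fragment is made up for illustration, the exact markup MinerU emits (attributes, nesting, merged cells) is not described here, and TableRows and table_to_rows are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text chunks of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html_fragment):
    """Parse one HTML table fragment into a list of rows of cell text."""
    parser = TableRows()
    parser.feed(html_fragment)
    parser.close()
    return parser.rows


# Illustrative input only; not taken from real MinerU output.
sample = (
    "<table><tr><th>Metric</th><th>Value</th></tr>"
    "<tr><td>Pages</td><td>12</td></tr></table>"
)
rows = table_to_rows(sample)
```

This flat parser ignores rowspan and colspan, so tables with merged cells would need extra handling before analysis.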

Quick Start

Online Demo

MinerU offers convenient online demos on multiple platforms. Users can directly access its demo on OpenDataLab, HuggingFace, or ModelScope to quickly experience its document parsing capabilities. These demos allow users to upload documents and view parsing results without installing any software, making it ideal for initial exploration of MinerU’s features.

Local Deployment

Installation

  • Install via pip or uv: First, ensure Python 3.10–3.13 is installed. Then run the following commands in the terminal or command prompt:
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
  • Install from source: For users needing to modify or extend MinerU’s functionality, clone the source code from GitHub and install it:
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
uv pip install -e ".[core]"

Linux and macOS systems automatically enable CUDA/MPS acceleration after installation. Windows users wishing to use CUDA acceleration can refer to the PyTorch official website to install PyTorch with the appropriate CUDA version.

  • Install the full version (supports sglang acceleration): If your device has an Ampere or newer architecture and at least 24GB of GPU memory, you can install the full version to use sglang for accelerating VLM model inference:
uv pip install -U "mineru[all]"

Or install from source:

uv pip install -e ".[all]"

You can also build a Docker image using the Dockerfile:

wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/global/Dockerfile
docker build -t mineru-sglang:latest -f Dockerfile .

Start the Docker container:

docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  --ipc=host \
  mineru-sglang:latest \
  mineru-sglang-server --host 0.0.0.0 --port 30000

Alternatively, use Docker Compose to start the container:

wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/compose.yaml
docker compose -f compose.yaml up -d
  • Install the client (for edge devices requiring only CPU and network connectivity):
uv pip install -U mineru
mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://<host_ip>:<port>
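Before pointing a client at a server, it can help to confirm that the port is accepting connections. The following is a minimal sketch using only Python’s standard library. The function name server_is_listening is hypothetical, and the check only proves that something accepts TCP connections on the port, not that the sglang server has finished loading its model.

```python
import socket


def server_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the Docker setup above, a call such as server_is_listening("127.0.0.1", 30000) would be the matching check on the host machine.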

Usage

  • Command-line usage:

    • Basic usage: The simplest command-line invocation is:
mineru -p <input_path> -o <output_path>

<input_path> specifies a local file or directory to parse (supported formats: pdf, png, jpg, jpeg), and <output_path> specifies the output directory.

    • View help information: Run the following command to display all available parameter descriptions:
mineru --help
    • Parameter details:
Usage: mineru [OPTIONS]

Options:
  -v, --version                   Show version and exit
  -p, --path PATH                 Input file path or directory (required)
  -o, --output PATH               Output directory (required)
  -m, --method [auto|txt|ocr]     Parsing method: auto (default), txt, ocr (pipeline backend only)
  -b, --backend [pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client]
                                  Parsing backend (default: pipeline)
  -l, --lang [ch|ch_server|... ]  Specify document language (improves OCR accuracy, pipeline backend only)
  -u, --url TEXT                  Service address when using sglang-client
  -s, --start INTEGER             Starting page number (0-based)
  -e, --end INTEGER               Ending page number (0-based)
  -f, --formula BOOLEAN           Enable formula parsing (default: on, pipeline backend only)
  -t, --table BOOLEAN             Enable table parsing (default: on, pipeline backend only)
  -d, --device TEXT               Inference device (e.g., cpu/cuda/cuda:0/npu/mps, pipeline backend only)
  --vram INTEGER                  Maximum GPU VRAM usage per process (pipeline backend only)
  --source [huggingface|modelscope|local]
                                  Model source, default: huggingface
  --help                          Show help information
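When MinerU is driven from a script, the options above can be assembled programmatically. The sketch below builds an argument list from a few of the flags in the help text. The helper name build_mineru_command is hypothetical. Rejecting -m and -l for non-pipeline backends, and requiring -u for the client backend, are choices made by this sketch based on the "pipeline backend only" and "when using sglang-client" notes, not confirmed behavior of the CLI.

```python
def build_mineru_command(input_path, output_path, backend="pipeline",
                         method=None, lang=None, start=None, end=None,
                         url=None):
    """Assemble a `mineru` argument list from options in the help text.

    start and end are passed through unchanged, so they must already be
    0-based page numbers, as -s and -e expect.
    """
    cmd = ["mineru", "-p", str(input_path), "-o", str(output_path),
           "-b", backend]

    # The help text marks -m and -l as "pipeline backend only"; this sketch
    # chooses to raise rather than pass them to another backend.
    for flag, value in (("-m", method), ("-l", lang)):
        if value is not None:
            if backend != "pipeline":
                raise ValueError(f"{flag} applies to the pipeline backend only")
            cmd += [flag, value]

    if start is not None and end is not None and start > end:
        raise ValueError("start page must not be after end page")
    if start is not None:
        cmd += ["-s", str(start)]
    if end is not None:
        cmd += ["-e", str(end)]

    # Assumption: the client backend is unusable without a server address.
    if backend == "vlm-sglang-client":
        if url is None:
            raise ValueError("vlm-sglang-client needs a server URL (-u)")
        cmd += ["-u", url]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True). Building a list rather than a shell string avoids quoting problems with paths that contain spaces.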
  • Model source configuration: MinerU automatically downloads required models from HuggingFace during its first run. If HuggingFace is inaccessible, users can switch model sources. For example, to switch to ModelScope:
mineru -p <input_path> -o <output_path> --source modelscope

Or set the environment variable:

export MINERU_MODEL_SOURCE=modelscope
mineru -p <input_path> -o <output_path>

To use local models:

    • Download local models: Run the following command to see the available download options:
mineru-models-download --help

Or use the interactive command-line tool to select models:

mineru-models-download

After downloading, the terminal will display the model paths and automatically update the mineru.json file in the user directory.

    • Parse using local models:
mineru -p <input_path> -o <output_path> --source local

Or set via environment variable:

export MINERU_MODEL_SOURCE=local
mineru -p <input_path> -o <output_path>
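The environment-variable route is convenient when MinerU is launched from another program, since the setting can be scoped to a single child process. A minimal sketch follows, assuming the three source names listed in the help text. The helper name env_with_model_source is hypothetical.

```python
import os

# Values accepted by --source according to the help text.
MODEL_SOURCES = ("huggingface", "modelscope", "local")


def env_with_model_source(source, base_env=None):
    """Return a copy of the environment with MINERU_MODEL_SOURCE set."""
    if source not in MODEL_SOURCES:
        raise ValueError(f"unknown model source: {source!r}")
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["MINERU_MODEL_SOURCE"] = source
    return env
```

A caller could then run subprocess.run(["mineru", "-p", "paper.pdf", "-o", "out"], env=env_with_model_source("modelscope"), check=True), where paper.pdf and out are placeholder paths.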
  • Using sglang to accelerate VLM model inference:

    • Through sglang-engine mode:
mineru -p <input_path> -o <output_path> -b vlm-sglang-engine
    • Through sglang-server/client mode:

First, start the server:

mineru-sglang-server --port 30000

sglang-server provides several common parameters for configuration:

  • If you have two GPUs with 12GB or 16GB VRAM, you can use Tensor Parallel (TP) mode: --tp 2
  • For two GPUs with 11GB VRAM, in addition to TP mode, you should also reduce the KV cache size: --tp 2 --mem-fraction-static 0.7
  • If you have two or more GPUs with 24GB+ VRAM, you can enable sglang’s multi-GPU parallel mode to increase throughput: --dp 2
  • You can enable torch.compile to speed up inference by approximately 15%: --enable-torch-compile
  • For more details on sglang parameter usage, refer to the official sglang documentation.

Then, use the client in another terminal:

mineru -p <input_path> -o <output_path> -b vlm-sglang-client -u http://127.0.0.1:30000
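The server flag guidance above can be captured in a small lookup so that launch scripts stay consistent across machines. This sketch encodes only the hardware cases listed in the text and returns no parallelism flags for anything else. The function name suggest_server_flags is hypothetical, and hardware that falls between the listed cases is deliberately left to the operator.

```python
def suggest_server_flags(gpu_count, vram_gb, torch_compile=False):
    """Map the hardware cases listed in the text to sglang-server flags."""
    flags = []
    if gpu_count >= 2 and vram_gb >= 24:
        # Two or more 24GB+ GPUs: multi-GPU parallel mode for throughput.
        flags += ["--dp", "2"]
    elif gpu_count == 2 and vram_gb in (12, 16):
        # Two 12GB or 16GB GPUs: tensor parallelism.
        flags += ["--tp", "2"]
    elif gpu_count == 2 and vram_gb == 11:
        # Two 11GB GPUs: tensor parallelism plus a smaller KV cache.
        flags += ["--tp", "2", "--mem-fraction-static", "0.7"]
    if torch_compile:
        flags.append("--enable-torch-compile")
    return flags


def server_command(port, gpu_count, vram_gb, torch_compile=False):
    """Build the full mineru-sglang-server argument list."""
    return (["mineru-sglang-server", "--port", str(port)]
            + suggest_server_flags(gpu_count, vram_gb, torch_compile))
```

For example, server_command(30000, 2, 11) yields the --tp 2 --mem-fraction-static 0.7 combination from the list above.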

For more information on output files, refer to the Output File Documentation.

  • API usage: Users can also call MinerU via Python code. For example code, see the Python Usage Example.

  • Deploying derivative projects: Community developers have created various extensions based on MinerU, such as Gradio-based graphical interfaces, FastAPI-based Web APIs, client/server architectures with multi-GPU load balancing, and MCP Servers based on the official API. These projects offer enhanced user experiences and additional features. For deployment instructions, refer to the Derivative Projects Documentation.

Update History

MinerU has been continuously updated and improved since its inception. Below are some key updates:

  • 2025/06/20 – Release of 2.0.6: Fixed parsing interruptions caused by invalid block content and incomplete table structures in vlm mode.
  • 2025/06/17 – Release of 2.0.5: Fixed sglang-client mode still requiring local models and pulling in unnecessary runtime dependencies such as torch. It also fixed the issue where only the first instance took effect when multiple sglang-client instances were launched via multiple URLs within the same process.
  • 2025/06/15 – Release of 2.0.3: Addressed configuration file key-value update errors when the download model type was set to all. It also fixed non-functional formula and table feature toggles in command line mode, which kept the features enabled. Additionally, it resolved compatibility issues with sglang version 0.4.7 in sglang-engine mode and updated the Dockerfile and installation documentation for deploying the full MinerU version in sglang environments.
  • 2025/06/13 – Release of 2.0.0: MinerU 2.0 is a comprehensive rewrite of both architecture and functionality. The new architecture reorganizes the code and its interaction methods to improve usability, maintainability, and extensibility. Key improvements include the removal of third-party dependency limitations, ready-to-use defaults with simple configuration, automatic model management, offline deployment support, a streamlined code structure, and a unified intermediate output format. The release integrates a new lightweight, high-performance multimodal document parsing model that achieves end-to-end, high-speed, high-precision document understanding with fewer parameters. The model supports multilingual recognition, handwriting recognition, layout analysis, table parsing, formula recognition, reading order sorting, and other core tasks. With sglang acceleration on a single NVIDIA 4090 card, it reaches a peak throughput exceeding 10,000 tokens/s. The release also includes online demos on HuggingFace for users to try the model. Note that this version contains incompatible changes: the Python package was renamed from magic-pdf to mineru, and the command-line tool was likewise renamed from magic-pdf to mineru. MinerU 2.0 also no longer includes the LibreOffice document conversion module, so users who need to process Office documents should first convert them to PDF using an independently deployed LibreOffice service.
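Since Office documents now need converting before parsing, one possible pre-processing step is sketched below. It assumes LibreOffice is installed locally and that its soffice launcher is on the PATH, which is only one way to satisfy the "independently deployed LibreOffice service" mentioned above. The helper names and the list of suffixes are illustrative.

```python
from pathlib import Path

# Illustrative set of Office suffixes; extend as needed.
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx"}


def libreoffice_to_pdf_command(source, out_dir, soffice="soffice"):
    """Build a headless LibreOffice command converting one file to PDF."""
    source = Path(source)
    if source.suffix.lower() not in OFFICE_SUFFIXES:
        raise ValueError(f"not an Office document: {source.name}")
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def converted_pdf_path(source, out_dir):
    """LibreOffice names the result <stem>.pdf inside the output directory."""
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

After running the command with subprocess.run(cmd, check=True), the path returned by converted_pdf_path can be passed to mineru as the -p argument.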

MinerU’s update history reflects its commitment to continuous improvement and optimization, evolving to meet user needs. Each update brings enhanced functionality, improved performance, and expanded applications, making MinerU increasingly powerful and practical.

Application Scenarios

MinerU finds wide application across various fields:

  • Academic Research: Researchers can use MinerU to quickly extract key information from a large volume of academic literature, such as text, formulas, and tables. This facilitates literature reviews and research analysis, enabling researchers to focus more on their core work and improving research efficiency.
  • Office Automation: Office workers can efficiently process documents like reports and contracts using MinerU. It removes redundant information, extracts core content while preserving document structure, and supports multiple output formats. This streamlines information organization and reporting workflows, helping office workers save time and enhance work efficiency.
  • Education: Teachers can leverage MinerU to extract content from teaching materials and create electronic lesson plans or online course resources. Students can use it to organize study materials and extract key knowledge points, aiding their learning. The tool’s formula recognition feature is particularly helpful for STEM students in managing complex mathematical and physical formulas.
  • Data Analysis: Data analysts can use MinerU to quickly extract tables, image descriptions, and other data-related elements from documents. By converting these elements into machine-readable formats like JSON, analysts can reduce data preprocessing time and focus more on in-depth data analysis, generating valuable insights.

Advantages and Limitations

Advantages

  • High Efficiency: MinerU can rapidly parse documents and accurately identify various elements, significantly reducing manual processing time and boosting work efficiency.
  • Rich Features: It supports multiple document formats and languages, with strong capabilities in formula and table recognition. This makes it suitable for diverse user needs and scenarios.
  • High Flexibility: MinerU offers various installation and deployment options, allowing users to choose the appropriate method based on their hardware conditions and requirements. Its command-line and API usage also make it easy to integrate into work processes.
  • Open-Source and Free: As an open-source project, MinerU is free to use, lowering document processing costs and fostering technology exchange and sharing.

Limitations

  • Limited Support for Complex Layouts: For extremely complex document layouts, MinerU may struggle to determine the correct reading order, leading to inaccuracies in content extraction for certain regions.
  • No Support for Vertical Text: MinerU cannot recognize and extract vertically oriented text.
  • Limited Recognition of Special Formats: Some uncommon list formats, code blocks, and special document types like comic books, art albums, elementary school textbooks, and exercise books may not be parsed effectively.
  • Potential Errors in Table and OCR Recognition: In complex tables, MinerU may misidentify rows and columns. Additionally, OCR recognition may produce errors for characters in less common languages (e.g., diacritics in Latin scripts or easily confused characters in Arabic scripts).

Future Development

MinerU’s future development plans focus on continuous optimization of its existing features and expansion into new areas. The development team aims to enhance MinerU’s parsing capabilities for complex documents, particularly its handling of special formats and vertical text. New features such as code block recognition, chemical formula recognition, and geometric shape recognition are also planned. With advancements in AI technology, MinerU is expected to integrate with more AI models and tools, enabling smarter document processing and analysis. This integration would give users a more comprehensive intelligent document processing solution, further enhancing MinerU’s practicality and competitiveness.

In conclusion, MinerU is a highly efficient and feature-rich document parsing tool that has gained widespread recognition and adoption across various fields. Its continuous updates and improvements demonstrate its strong development vitality and broad application prospects. Whether you are a researcher, office worker, teacher, or student, MinerU can help you handle document parsing tasks more efficiently and conveniently. By leveraging its powerful features, you can unlock the value of document content and enhance your work and study efficiency.