NuMarkdown-8B-Thinking: Making Document Conversion Smarter and Easier

Have you ever tried to turn a scanned document into something you can edit on your computer, only to find it’s a mess because of tables or weird layouts? Maybe it’s an old textbook, a work contract, or a report with lists and charts that just won’t cooperate with regular tools. It’s frustrating, right? That’s where NuMarkdown-8B-Thinking comes in—a smart tool that converts documents into neat, easy-to-use Markdown files, even when they’re tricky to handle. In this blog, we’ll walk you through what this tool is, how it works, why it’s so good at what it does, and how you can use it yourself. By the end, you’ll see how it can save you time and make your life easier.

What Is NuMarkdown-8B-Thinking?

NuMarkdown-8B-Thinking is an AI-powered tool designed to take documents, like scanned PDFs or images, and turn them into Markdown format. If you’re not familiar, Markdown is a simple way to write and format text using basic symbols, like double asterisks (**) for bold or hash marks (#) for headings. It’s popular because it’s easy to read, edit, and use for things like notes, websites, or reports.

This tool doesn’t just copy text like a basic scanner. It uses advanced technology to “read” the document, figure out its structure—like where the titles, paragraphs, or tables are—and then organizes everything into a clean Markdown file. It’s built on another model called Qwen 2.5-VL-7B, but it’s been specially trained to tackle document layouts that would stump most other tools.

Why Is It Useful?

Imagine you’ve got a pile of paper documents you need to digitize. Regular scanning tools might give you a jumbled mess, especially if there are tables or bullet points. NuMarkdown-8B-Thinking steps in to solve that. Here’s what makes it stand out:

  • Handles Complex Layouts: It can spot headers, lists, and tables, not just plain text.
  • Great with Tables: Even if a table has merged cells or multiple lines, it gets it right.
  • Saves Time: You get a ready-to-use Markdown file without hours of fixing mistakes.

Whether you’re a student digitizing notes, a researcher organizing papers, or a professional archiving files, this tool can make the process smoother.

How Does NuMarkdown-8B-Thinking Work?

The magic behind NuMarkdown-8B-Thinking lies in its two-step process. It’s like having a careful assistant who studies your document before typing it up. Here’s how it goes:

  1. Thinking Stage
    First, the tool looks at the document and makes internal notes about what it sees. These notes, called “thinking tokens,” are like a map of the layout. It figures out things like, “This is a title at the top, that’s a table in the middle, and this is a list below it.” This step helps it understand the document before doing anything else.

  2. Conversion Stage
    Next, it uses those notes to create the Markdown file. Because it’s already planned everything out, the output is accurate and well-organized.

This “think first, act later” approach is what sets it apart. Depending on how complicated the document is, it might spend more or less time thinking. For a simple page, it’s quick. For something packed with tables and sections, it takes a little longer to get it perfect.
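
To make the two stages concrete, here is a minimal sketch of separating the model's notes from its final Markdown. It assumes the raw output wraps the notes in <think>...</think> and the result in <answer>...</answer>, as the usage scripts later in this post do. The function name split_stages and the sample string are made up for illustration.

```python
import re

def split_stages(raw_output: str) -> tuple[str, str]:
    """Return (thinking_notes, markdown) from a raw model response.

    Assumes <think>...</think> and <answer>...</answer> tags. If a tag is
    missing (for example, the output was cut off), fall back to what is
    available instead of raising an error.
    """
    think = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    # Accept an unclosed <answer> tag by reading to the end of the text.
    answer = re.search(r"<answer>(.*?)(?:</answer>|$)", raw_output, re.DOTALL)
    notes = think.group(1).strip() if think else ""
    markdown = answer.group(1).strip() if answer else raw_output.strip()
    return notes, markdown

# Hypothetical response, shortened for the example
sample = "<think>Title at top, one list below.</think><answer># Title\n\n- item</answer>"
notes, markdown = split_stages(sample)
print("Notes:", notes)
print("Markdown:")
print(markdown)
```

Keeping the notes separate is handy for debugging: if a table comes out wrong, the notes often show how the layout was interpreted.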

A Simple Analogy

Think of it like baking a cake. You don’t just throw ingredients together and hope for the best. First, you read the recipe, measure everything out, and plan your steps. That’s the “thinking” part. Then, you mix and bake—the “conversion” part. By planning ahead, NuMarkdown-8B-Thinking avoids the mess and delivers a polished result.

How Was It Built?

You might wonder how this tool got so smart. It went through two training phases to learn its skills:

  1. Learning the Basics
    The team started by feeding it a huge collection of sample documents, created from public PDFs. These included all kinds of layouts—simple pages, detailed reports, and everything in between. This step, called supervised fine-tuning, was like teaching it the rules of document conversion.

  2. Perfecting the Details
    After that, it got extra training using GRPO (Group Relative Policy Optimization), a reinforcement learning method. This focused on tricky layouts, like tables with odd shapes or merged cells, to make sure it could handle real-world challenges.

By the end, NuMarkdown-8B-Thinking was ready to take on almost any document you throw at it.
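
For the curious, the core idea behind GRPO can be sketched in a few lines: the model produces a group of candidate outputs for the same input, each one gets a reward, and each candidate is then scored relative to the rest of its group. The rewards below are invented numbers, and this only illustrates the group-relative scoring step. It is not the team's training code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every candidate gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four candidate conversions of the same page
rewards = [0.2, 0.5, 0.9, 0.4]
for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
    print(f"reward={reward:.1f}  advantage={advantage:+.2f}")
```

Candidates that beat their group's average get a positive score and are reinforced; those below average get a negative one.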

How Good Is It? Performance Breakdown

This isn’t just talk—NuMarkdown-8B-Thinking has been tested against other tools and proven its worth. It’s gone head-to-head with big names like GPT-4o (a powerful AI model) and OCRFlux (a specialized scanning tool), and it often comes out ahead, especially with complex documents.

The Rankings

In a competition called Arena, it earned a high score based on votes from about 500 people. Here’s how it stacked up:

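| Rank | Model                         | Score |
|------|-------------------------------|-------|
| 1    | gemini-flash-reasoning        | 26.75 |
| 2    | NuMarkdown-reasoning          | 26.10 |
| 3    | NuMarkdown-reasoning-w/o_grpo | 25.32 |
| 4    | OCRFlux-3B                    | 24.63 |
| 5    | gpt-4o                        | 24.48 |
| 6    | gemini-flash-w/o_reasoning    | 24.11 |
| 7    | RolmoOCR                      | 23.53 |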

With a score of 26.10, it took second place, just behind gemini-flash-reasoning. That’s impressive for a tool focused specifically on document conversion!
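
The gaps are easier to see with a little arithmetic. This sketch simply re-enters the scores from the rankings above and computes two differences; the variable names are for illustration only.

```python
# Scores copied from the rankings above
scores = {
    "gemini-flash-reasoning": 26.75,
    "NuMarkdown-reasoning": 26.10,
    "NuMarkdown-reasoning-w/o_grpo": 25.32,
    "OCRFlux-3B": 24.63,
    "gpt-4o": 24.48,
    "gemini-flash-w/o_reasoning": 24.11,
    "RolmoOCR": 23.53,
}

# Sort from highest to lowest score
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Distance to first place, and the difference GRPO training made
gap_to_first = round(scores["gemini-flash-reasoning"] - scores["NuMarkdown-reasoning"], 2)
grpo_gain = round(scores["NuMarkdown-reasoning"] - scores["NuMarkdown-reasoning-w/o_grpo"], 2)

print("Second place:", ranked[1][0])
print("Gap to first place:", gap_to_first)
print("Gain from GRPO:", grpo_gain)
```

The version without GRPO still ranks third, so the extra training phase accounts for only part of the model's lead over the other tools.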

Winning Matchups

When pitted directly against other models using only images, NuMarkdown-8B-Thinking won most of the time. It especially outperformed OCRFlux-3B and GPT-4o, showing it’s better at turning messy documents into clean Markdown files. Picture a bar chart with green bars for wins towering over tiny red bars for losses—that’s what its performance looks like.

These results mean you can trust it to handle your documents accurately, even when other tools might stumble.

Seeing It in Action: A Real Example

Let’s look at how NuMarkdown-8B-Thinking works with a real document. The team tested it on a two-page educational file from a school in Catalonia, Spain. The document had:

  • Headers with the school’s name and department.
  • Bullet points listing instructions.
  • Tables listing books, authors, and publishers.
  • Footers with page numbers and extra details.

The model processed this and turned it into a Markdown file that captured everything perfectly. Here’s a shortened version of what it produced:

### Generalitat de Catalunya
### Departament d'Educació
### Institut Gal·lecs

### Curs 2021-22

- Els llibres de color blau indiquen que es manté respecte al curs anterior.
- Els llibres de color groc indiquen que es tracta d'un canvi per aquest curs.
- Els llibres de color vermell indiquen que no s'han de comprar perquè van a càrrec del centre.

# 1 ESO

| MATERIAL          | TÍTOL                   | AUTOR            | EDITORIAL         | ISBN          |
|-------------------|-------------------------|------------------|-------------------|---------------|
| Llengua Catalana  | Punt Volat              |                  | Castellnou        | 9788417803124 |
| Llengua Castellana| Proyecto Asterisco      |                  | Castellnou        | 9788417803186 |
| Anglès            | Think Ahead ESO 1       |                  | Burlington Books  | 9788925300662 |

Codí: 04mp02  
Pàgina 1 de 2

The full output included more tables and details, but you get the idea. It kept the structure intact, formatted the tables correctly, and made the text easy to read and edit. That’s the kind of result you can expect.
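
Because the tables come out as standard pipe tables, they are easy to post-process. The following is a minimal sketch that turns a simple pipe table into a list of dictionaries. parse_pipe_table is a hypothetical helper, and it assumes there are no escaped pipe characters inside cells.

```python
def split_row(line: str) -> list[str]:
    """Split one pipe-table row into trimmed cell values."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]

def parse_pipe_table(markdown: str) -> list[dict[str, str]]:
    """Turn a simple Markdown pipe table into one dict per data row."""
    lines = [ln for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = split_row(lines[0])
    # lines[1] is the |---|---| separator, so data starts at index 2
    return [dict(zip(header, split_row(line))) for line in lines[2:]]

# Table rows taken from the example output above
table = """
| MATERIAL          | TÍTOL                   | AUTOR            | EDITORIAL         | ISBN          |
|-------------------|-------------------------|------------------|-------------------|---------------|
| Llengua Catalana  | Punt Volat              |                  | Castellnou        | 9788417803124 |
| Llengua Castellana| Proyecto Asterisco      |                  | Castellnou        | 9788417803186 |
| Anglès            | Think Ahead ESO 1       |                  | Burlington Books  | 9788925300662 |
"""

rows = parse_pipe_table(table)
for row in rows:
    print(row["MATERIAL"], "->", row["TÍTOL"], "/", row["EDITORIAL"])
```

From here the rows could go into a spreadsheet or a database without retyping anything.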

How to Use NuMarkdown-8B-Thinking Yourself

Ready to try it? You don’t need to be a tech expert—there are two straightforward ways to use it: with vLLM or the Transformers library. Both work on a computer with Python installed, and a graphics card (GPU) will make things faster. Let’s break it down.

Option 1: Using vLLM

vLLM is a tool that makes running models like this quick and simple. Here’s what to do:

  1. Install vLLM
    Open your computer’s command line (like Terminal or Command Prompt) and type:

    pip install vllm
    
  2. Start the Service
    Run this command to get the model ready:

    vllm serve numind/NuMarkdown-8B-Thinking --trust_remote_code --limit-mm-per-prompt image=1
    
  3. Process Your Document
    Use this Python script to send your document image and get the Markdown back. Replace “image.png” with your file’s name:

    from openai import OpenAI
    import base64
    
    openai_api_key = "EMPTY"
    openai_api_base = "http://localhost:8000/v1"
    
    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )
    
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    
    base64_image = encode_image("image.png")
    # The MIME type in the data URL should match the file format (PNG here)
    data_url = f"data:image/png;base64,{base64_image}"
    
    chat_response = client.chat.completions.create(
        model="numind/NuMarkdown-8B-Thinking",
        temperature=0.7,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": data_url},
                        "min_pixels": 100 * 28 * 28,
                        "max_pixels": 5000 * 28 * 28,
                    },
                ],
            },
        ]
    )
    
    result = chat_response.choices[0].message.content
    reasoning = result.split("<think>")[1].split("</think>")[0]
    answer = result.split("<answer>")[1].split("</answer>")[0]
    print(answer)
    

    When you run this, it’ll print the Markdown text for your document.
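
One detail worth getting right is the data URL: its MIME type should match the actual file format. Here is a small sketch that guesses the type from the file extension using Python's standard mimetypes module. The helper name image_to_data_url is an assumption for illustration, not part of the model's tooling, and the demo file contains placeholder bytes rather than a real scan.

```python
import base64
import mimetypes
import os
import tempfile
from pathlib import Path

def image_to_data_url(image_path: str) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Demo with a throwaway file; in practice, pass the path to your own scan
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "page.png")
    Path(demo_path).write_bytes(b"placeholder bytes, not a real image")
    demo_url = image_to_data_url(demo_path)

print(demo_url[:30])
```

The returned string can be used as the "url" value in the request from step 3.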

Option 2: Using Transformers Library

If you like more control, use the Transformers library from Hugging Face. Here’s how:

  1. Install the Tools
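    In your command line, type:

    pip install transformers torch pillow accelerate

    The extra packages cover the image loading (pillow) and the device_map="auto" option (accelerate) used in the code that follows. That code also requests FlashAttention 2, which needs the separate flash-attn package and a compatible GPU; if you don’t have it, remove the attn_implementation line.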
    
  2. Load the Model
    Use this Python code to set it up:

    import torch
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    
    model_id = "numind/NuMarkdown-8B-Thinking"
    
    processor = AutoProcessor.from_pretrained(
        model_id,
        trust_remote_code=True,
        min_pixels=100*28*28, max_pixels=5000*28*28
    )
    
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
        device_map="auto",
        trust_remote_code=True,
    )
    
  3. Prepare Your Document
    Add this to load your image (replace “image.png”):

    img = Image.open("image.png").convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
        ],
    }]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_input = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
    
  4. Get the Markdown
    Run this to process it:

    with torch.no_grad():
        model_output = model.generate(**model_input, temperature=0.7, max_new_tokens=5000)
    
    result = processor.decode(model_output[0])
    reasoning = result.split("<think>")[1].split("</think>")[0]
    answer = result.split("<answer>")[1].split("</answer>")[0]
    print(answer)
    

Both methods give you the same result—a clean Markdown file. vLLM is simpler if you just want to get started, while Transformers lets you tweak things more.
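
The min_pixels and max_pixels values in both scripts set a pixel budget for the input image: 100*28*28 is 78,400 pixels and 5000*28*28 is 3,920,000 pixels. As a rough sketch of what such a budget implies, the function below scales an image size to fit inside it and snaps each side to a multiple of 28. This is a simplified illustration under those assumptions; the processor's real resizing logic may differ in its details, and fit_to_budget is a made-up name.

```python
import math

MIN_PIXELS = 100 * 28 * 28    # 78,400
MAX_PIXELS = 5000 * 28 * 28   # 3,920,000

def fit_to_budget(width: int, height: int, factor: int = 28,
                  min_pixels: int = MIN_PIXELS,
                  max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return a (width, height) inside the pixel budget, in multiples of factor."""
    w = max(factor, round(width / factor) * factor)
    h = max(factor, round(height / factor) * factor)
    if w * h > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down
        scale = math.sqrt(width * height / max_pixels)
        w = max(factor, math.floor(width / scale / factor) * factor)
        h = max(factor, math.floor(height / scale / factor) * factor)
    elif w * h < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up
        scale = math.sqrt(min_pixels / (width * height))
        w = math.ceil(width * scale / factor) * factor
        h = math.ceil(height * scale / factor) * factor
    return w, h

print(fit_to_budget(3000, 4000))  # a large scan gets scaled down
print(fit_to_budget(800, 1000))   # a mid-sized image only gets snapped to 28s
```

In practice this means very high-resolution scans are downsized before the model sees them, so tiny print on a huge page may benefit from cropping first.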

Tips for Best Results

  • Image Quality: Use clear scans or photos. Fuzzy images might confuse it.
  • One Page at a Time: It processes single images, so split multi-page files first.
  • Adjusting Output: The “temperature” setting (set to 0.7 above) controls how much randomness goes into the output. Lower it for more consistent, repeatable results; raise it for more variation between runs.
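
To see what temperature does, it helps to look at the math in miniature. A language model assigns a score to each possible next token, and temperature rescales those scores before they are turned into probabilities. The scores below are made up; this is a generic illustration of temperature scaling, not code from the model.

```python
import math

def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens
scores = [2.0, 1.0, 0.5]
for temperature in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top choice, so repeated runs on the same page tend to agree more.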

Who Can Use It?

This tool is handy for all kinds of people:

  • Students: Turn textbook scans into notes you can edit or share.
  • Researchers: Digitize papers or articles for easier study.
  • Businesses: Convert contracts, manuals, or reports into digital files.

It’s especially helpful if you deal with documents that have tables, lists, or other structured parts that regular tools struggle with.

What Makes It Different?

You might already use scanning apps or OCR tools, so why try this? Here’s how it compares:

  • Vs. Basic OCR: Regular OCR just grabs text, often losing the layout. NuMarkdown-8B-Thinking keeps everything organized.
  • Vs. GPT-4o: While GPT-4o is great for general tasks, this model is built for documents and beats it in layout accuracy.
  • Vs. OCRFlux: Even against specialized tools, it wins with complex tables and structures.

Its focus on Markdown output also makes it unique—perfect if you want files that are easy to edit or use online.

Limitations to Know

No tool is perfect, and NuMarkdown-8B-Thinking has a few quirks:

  • Single Images Only: It can’t process multi-page PDFs in one go—you’ll need to split them first.
  • Best with Typed Text: Handwritten notes or low-quality scans might not work as well.
  • Setup Needed: You’ll need some basic tech skills to install and run it.

Even with these, it’s still a powerful option for most document tasks.
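
Since the model takes one image at a time, a multi-page document becomes a simple loop once each page has been exported as its own image. The sketch below assumes the pages are already saved as image files and takes the conversion step as a function argument, so it works with either method described earlier. convert_pages, natural_key, and the file names are hypothetical, and fake_convert is a stand-in for a real model call.

```python
import re
from pathlib import Path
from typing import Callable

def natural_key(name: str) -> list:
    """Sort key that puts 'scan_2' before 'scan_10'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]

def convert_pages(page_paths: list[str], convert_page: Callable[[str], str]) -> str:
    """Convert each page image in page order and join the Markdown results."""
    ordered = sorted(page_paths, key=lambda p: natural_key(Path(p).name))
    parts = [convert_page(path).strip() for path in ordered]
    return "\n\n".join(parts) + "\n"

def fake_convert(path: str) -> str:
    """Stand-in for the real conversion call, used only for this demo."""
    return f"# Content of {Path(path).name}"

document = convert_pages(["scan_10.png", "scan_2.png", "scan_1.png"], fake_convert)
print(document)
```

To use it for real, pass a function that sends one image to the model and returns the Markdown answer.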

Expanding on Its Uses

Let’s dig deeper into how you could use this in real life. Say you’re a student with a stack of old class handouts. You scan them, run them through NuMarkdown-8B-Thinking, and suddenly you’ve got Markdown files you can search, edit, or share with friends. Or maybe you work in an office with piles of contracts. This tool can turn them into digital files ready for archiving or emailing, without losing the tables or fine print.

For researchers, it’s a game-changer. Imagine digitizing a stack of academic papers full of charts and references. The model keeps all that structure intact, so you can focus on your work instead of fixing formatting.

Example Scenario

Picture this: You’re a teacher with a curriculum guide that’s only on paper. It’s got sections for each grade, lists of goals, and tables of resources. You scan it, run it through the model, and get a Markdown file like this:

# Curriculum Guide 2023

## Grade 1

- Learn basic math skills.
- Read simple stories.

### Resources

| Subject   | Book Title         | Publisher     |
|-----------|--------------------|---------------|
| Math      | Numbers Made Easy  | EduPress      |
| Reading   | First Tales        | StoryBooks    |

Now you can update it, share it online, or turn it into a webpage—all without retyping everything.

Digging Deeper: How It Handles Tables

Tables are where this tool really shines. Let’s say your document has a table like this on paper:

Subject       | Book Title          | Publisher
--------------|---------------------|------------
Math          | Numbers Made Easy   | EduPress
Reading       | First Tales         | StoryBooks
Science       | Exploring Nature    | SciWorld

Some tools might mash it into a blob of text, but NuMarkdown-8B-Thinking turns it into:

| Subject   | Book Title         | Publisher  |
|-----------|--------------------|------------|
| Math      | Numbers Made Easy  | EduPress   |
| Reading   | First Tales        | StoryBooks |
| Science   | Exploring Nature   | SciWorld   |

Even if the table has merged cells—like a title spanning two columns—it figures that out too. That’s thanks to its thinking stage, which maps the layout before writing anything.
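
If you want to see what goes into a well-formed pipe table, here is a short sketch that builds one from plain rows, using the same data as the example above. render_pipe_table is a made-up helper, and it assumes every row has the same number of cells as the header.

```python
def render_pipe_table(header: list[str], rows: list[list[str]]) -> str:
    """Render rows as a Markdown pipe table with aligned columns."""
    table = [header] + rows
    # Each column is as wide as its longest cell
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def format_row(row: list[str]) -> str:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return "| " + " | ".join(cells) + " |"

    separator = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    return "\n".join([format_row(header), separator] + [format_row(r) for r in rows])

header = ["Subject", "Book Title", "Publisher"]
rows = [
    ["Math", "Numbers Made Easy", "EduPress"],
    ["Reading", "First Tales", "StoryBooks"],
    ["Science", "Exploring Nature", "SciWorld"],
]
rendered = render_pipe_table(header, rows)
print(rendered)
```

The padding is only for human readers; Markdown renderers treat the table the same with or without the extra spaces.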

Why Markdown Matters

You might wonder why it outputs Markdown instead of, say, Word or PDF. Markdown is lightweight and flexible. You can:

  • Edit it in any text editor.
  • Convert it to HTML for websites.
  • Use it in apps like Notion or Obsidian for notes.

It’s a format that’s both human-friendly and machine-readable, which is why it’s so popular for digital content.
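
As a taste of how little it takes to turn Markdown into HTML, here is a deliberately tiny sketch that handles only headings, bullet lists, and plain paragraphs. It uses just the Python standard library and is nowhere near a full converter; for real projects, a dedicated Markdown library is the better choice. The function name is made up.

```python
import html

def simple_markdown_to_html(markdown: str) -> str:
    """Convert headings, '- ' bullets, and paragraphs to HTML (nothing else)."""
    out = []
    in_list = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(stripped[2:])}</li>")
            continue
        if in_list:
            out.append("</ul>")
            in_list = False
        if not stripped:
            continue
        hashes = len(stripped) - len(stripped.lstrip("#"))
        if 1 <= hashes <= 6 and stripped[hashes:hashes + 1] == " ":
            text = html.escape(stripped[hashes:].strip())
            out.append(f"<h{hashes}>{text}</h{hashes}>")
        else:
            out.append(f"<p>{html.escape(stripped)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)

page = "# Curriculum Guide 2023\n\n## Grade 1\n\n- Learn basic math skills.\n- Read simple stories."
converted = simple_markdown_to_html(page)
print(converted)
```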

Getting Comfortable with the Setup

If the setup sounds intimidating, don’t worry—it’s easier than it looks. Here’s a step-by-step to build your confidence:

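  1. Check Your Computer: You’ll need Python installed (search “install Python” if you don’t have it). A GPU is strongly recommended: an 8B model runs very slowly without one, and the FlashAttention setting in the Transformers example needs a compatible GPU.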
  2. Pick a Method: vLLM is faster to start with; Transformers is better if you like tinkering.
  3. Test with a Small File: Scan a simple page—like a recipe or a flyer—and try it out. Seeing it work will make it click.

Once you’ve done it the first time, repeating it is a breeze.
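
If you would like a quick sanity check before starting, this small sketch reports your Python version and whether the packages used in this post can be found. It only checks that they are installed, not that they work with your hardware, and the function name check_environment is made up for this example.

```python
import sys
from importlib.util import find_spec

def check_environment(packages=("vllm", "openai", "transformers", "torch", "PIL")) -> dict:
    """Report the Python version and which packages are installed."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = find_spec(name) is not None
    return report

for item, status in check_environment().items():
    print(f"{item}: {status}")
```

You only need the packages for the method you picked, so a False next to the others is nothing to worry about.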

Comparing the Two Methods

Still unsure which method to use? Here’s a quick breakdown:

  • vLLM

    • Pros: Fast setup, good for quick tasks, runs as a service you can reuse.
    • Cons: Less control over the process.
  • Transformers

    • Pros: More options to customize, great for learning how it works.
    • Cons: Takes more steps to set up.

If you’re just starting, go with vLLM. If you’re curious about the tech, try Transformers.

Real-World Impact

Let’s zoom out and think about what this means. Documents are everywhere—schools, offices, libraries—and digitizing them is a growing need. NuMarkdown-8B-Thinking makes that faster and more accurate, especially for the tough stuff. It’s not just a tool; it’s a way to bridge the gap between paper and digital, saving hours of manual work.

For example, a small business could scan old invoices, run them through this, and have searchable records in minutes. A librarian could digitize rare books without losing their structure. The possibilities are wide open.

Wrapping Up

NuMarkdown-8B-Thinking is a practical, powerful solution for anyone who needs to convert documents into digital form. Its ability to understand layouts, handle tables, and output clean Markdown sets it apart from the crowd. Whether you’re dealing with a single page or a stack of files, it’s worth a try.

If you’re ready to dive in, grab your scanner, pick a method, and start experimenting. You’ll find it’s not just about saving time—it’s about making your documents work for you. Check out the Hugging Face page for more details, or join the Discord community to swap tips with others. Got questions? I’m happy to help—just reach out!