Sparrow: Revolutionize Your Document Processing with AI-Powered Efficiency

In today’s fast-paced digital world, managing documents like invoices, receipts, bank statements, or complex tables can feel overwhelming. Whether you’re a business professional, a developer, or just someone buried in paperwork, extracting and organizing data often turns into a time-consuming chore. Imagine a tool that automates this process, making it faster, more accurate, and even enjoyable. Meet Sparrow, an open-source powerhouse that leverages machine learning (ML), large language models (LLM), and vision large language models (Vision LLM) to transform how you handle documents.

Sparrow isn’t just another document processor—it’s a versatile assistant that extracts structured data, processes text, validates information, and even tackles simple decision-making tasks. From a single invoice to a multi-page financial report, Sparrow streamlines it all. Plus, it’s designed to be user-friendly, so you don’t need to be a tech wizard to get started.

In this guide, we’ll explore everything Sparrow has to offer: its standout features, intuitive web interface, flexible architecture, and practical applications. You’ll also find step-by-step installation instructions, real-world examples, and tips for optimizing performance. By the end, you’ll see why Sparrow is a game-changer for document processing, and how you can start using it today.



Sparrow UI: Your Gateway to Effortless Document Processing

Sparrow’s web interface, Sparrow UI, lets you process documents without writing any code. Its clean, intuitive design makes it accessible to everyone, whether you’re a seasoned developer or a complete beginner.

How to Access Sparrow UI

Ready to try it? Head over to sparrow.katanaml.io. Hosted on a Mac Mini M4 Pro, this online version delivers reliable performance and is available 24/7. No setup required—just jump right in!

Why Sparrow UI Shines


  • Drag-and-Drop Simplicity: Upload files like PNGs, JPGs, or PDFs by dragging them into the interface. No complicated steps—just pure convenience.

  • Quick Results: Once your file is uploaded, Sparrow processes it and displays the output as soon as inference finishes, with no extra steps.

  • JSON-Powered Queries: Need specific data? Define it with a JSON schema, and Sparrow will extract exactly what you want. It’s precise and flexible.

  • Structured Output: Results come in a neat JSON format, perfect for integration into other tools or workflows.

  • Visual Feedback: See bounding boxes around extracted data, giving you a clear view of where the information came from in your document.

Whether you’re handling a quick receipt or a stack of invoices, Sparrow UI makes the process smooth and efficient. It’s document processing made simple.
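To make the JSON-powered queries concrete: the CLI examples later in this guide describe the wanted fields by example, with "str" for text and 0 for numbers, for instance '[{"instrument_name":"str", "valuation":0}]'. The Python sketch below builds such a query string from a field-to-type mapping. The helper name build_query and the placeholder table are hypothetical conveniences, not part of Sparrow, and mapping float to the placeholder 0 is an assumption.

```python
import json

# Hypothetical helper, not part of Sparrow: it turns a field -> type mapping
# into the query-by-example string used in this guide's sparrow.sh examples,
# where "str" marks a text field and 0 marks a numeric field.
PLACEHOLDERS = {str: "str", int: 0, float: 0}


def build_query(fields: dict, as_list: bool = True) -> str:
    """Return a compact JSON query string for sparrow.sh or the REST API."""
    template = {name: PLACEHOLDERS[kind] for name, kind in fields.items()}
    # Wrapping the template in a list asks for a list of records (table rows),
    # as in the bonds-table example; a bare object describes a single record.
    payload = [template] if as_list else template
    return json.dumps(payload, separators=(", ", ":"))


print(build_query({"instrument_name": str, "valuation": int}))
# [{"instrument_name":"str", "valuation":0}]
```

The printed string is the same query used in the bonds-table example, so it can be passed straight to sparrow.sh or to the API's query field.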


Sparrow’s Key Features: Power Meets Versatility

Sparrow isn’t just about ease of use—it’s packed with features that cater to both individual users and enterprise needs. Here’s what sets it apart:


  • Universal Document Processing: From invoices and receipts to bank statements and tables, Sparrow handles it all with ease.

  • Modular Architecture: Mix and match pipelines like Sparrow Parse, Instructor, or Agents to fit your specific task.

  • Multi-Backend Support: Works with Apple Silicon (MLX), Ollama, PyTorch, vLLM, or even Hugging Face Cloud GPU—choose what suits your setup.

  • Format Flexibility: Supports images (PNG, JPG) and multi-page PDFs, so you’re never limited by file type.

  • Automatic Validation: Extracted data is checked against the JSON schema you supply, so the output's structure and field types stay consistent; results include a "valid" flag.

  • API-First Approach: RESTful APIs make it a breeze to integrate Sparrow into your apps or workflows.

  • Instruction Calling: Go beyond extraction—process text, validate info, or perform calculations with ease.

  • Real-Time Monitoring: A built-in dashboard keeps you in the loop with live usage stats and performance insights.

  • Enterprise-Ready: Rate limiting helps the service handle heavy load, and a commercial licensing option is available for larger organizations.

Sparrow’s robust capabilities make it a one-stop solution for all your document processing needs, whether you’re a solo user or part of a large team.
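The Automatic Validation feature compares results with the schema you supplied. Sparrow's actual validator isn't shown in this guide, so the following is only a minimal sketch of the idea, assuming the query-by-example convention from the CLI examples ("str" for text, 0 for numbers). It walks the schema and the result together and checks that every requested field is present with the right kind of value; matches_example is a hypothetical name.

```python
import json

# Minimal structural check, assuming the query-by-example convention used in
# this guide: "str" means the value must be text, 0 means it must be a number.
# This is an illustration, not Sparrow's own validator.


def matches_example(value, example) -> bool:
    if isinstance(example, dict):
        # Every field named in the schema must exist and match recursively.
        return isinstance(value, dict) and all(
            key in value and matches_example(value[key], sub)
            for key, sub in example.items()
        )
    if isinstance(example, list):
        if not isinstance(value, list):
            return False
        # Every returned item must match the first (template) element.
        return all(matches_example(item, example[0]) for item in value) if example else True
    if example == "str":
        return isinstance(value, str)
    if isinstance(example, (int, float)) and not isinstance(example, bool):
        # Booleans are ints in Python, so exclude them explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False


schema = json.loads('[{"instrument_name":"str", "valuation":0}]')
rows = [{"instrument_name": "UNITS BLACKROCK...", "valuation": 19049}]
print(matches_example(rows, schema))  # True
print(matches_example([{"instrument_name": "X", "valuation": "19049"}], schema))  # False
```

Extra fields in the result are tolerated here; a stricter check could reject them.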


Inside Sparrow’s Architecture: A Modular Masterpiece

Sparrow’s strength lies in its smart, modular design. Let’s break down the core components that make it tick:


  • Sparrow ML LLM: The central engine driving document processing, powered by advanced AI models.

  • Sparrow Parse: A Vision LLM library that excels at extracting structured JSON data from documents.

  • Sparrow Agents: Manages complex workflows, perfect for multi-step tasks that need coordination.

  • Sparrow OCR: Extracts text from scanned images or PDFs as a preprocessing step for more accurate extraction.

  • Sparrow UI: The web interface that ties it all together for a seamless user experience.

These components work in harmony but can also be used independently. Need to extract data from a simple form? Sparrow Parse has you covered. Tackling a multi-page report with varied content? Sparrow Agents can orchestrate the whole process. This flexibility is what makes Sparrow so powerful.



Quickstart: Launch Sparrow in a Few Steps

Want to see Sparrow in action? Here’s how to get it running fast.

What You’ll Need


  • Python 3.10.4+: Use pyenv to manage versions effortlessly.

  • Operating System: macOS (best for MLX), Linux, or Windows.

  • Hardware: GPU recommended for Vision LLM; CPU works for lighter tasks.

Installation Steps

  1. Set Up Python:

    pyenv install 3.10.4
    pyenv global 3.10.4
    
  2. Create a Virtual Environment:

    python -m venv .env_sparrow_parse
    source .env_sparrow_parse/bin/activate  # Linux/Mac
    # Windows: .env_sparrow_parse\Scripts\activate
    
  3. Install Sparrow Parse:

    git clone https://github.com/katanaml/sparrow.git
    cd sparrow/sparrow-ml/llm
    pip install -r requirements_sparrow_parse.txt
    
  4. Add Poppler (macOS):

    brew install poppler
    
  5. Launch the API:

    python api.py
    

Extract Your First Data

Got a bonds table image? Extract data like this:

./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/bonds_table.png"

Output:

{
  "data": [
    {"instrument_name": "UNITS BLACKROCK...", "valuation": 19049},
    {"instrument_name": "UNITS ISHARES...", "valuation": 83488}
  ],
  "valid": "true"
}

A handful of commands takes you from setup to structured results. (The first run may also need to download the model weights, which takes a while for a model this size.)
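Because the output is plain JSON, post-processing takes only a few lines. This sketch parses the bonds result shown above and sums the valuation column. Two assumptions: the "valid" field is treated as a string (it appears as "true" in quotes above), and a result not marked valid is rejected rather than summed. The function name total_valuation is hypothetical.

```python
import json

# The output shown above, pasted as a string. Note that "valid" is the
# string "true", not a JSON boolean, so it is compared as text.
raw = """
{
  "data": [
    {"instrument_name": "UNITS BLACKROCK...", "valuation": 19049},
    {"instrument_name": "UNITS ISHARES...", "valuation": 83488}
  ],
  "valid": "true"
}
"""


def total_valuation(result: dict) -> int:
    """Sum the valuation column, refusing results not marked as valid."""
    if str(result.get("valid", "")).lower() != "true":
        raise ValueError("result is not marked as valid")
    return sum(row["valuation"] for row in result["data"])


result = json.loads(raw)
print(total_valuation(result))  # 102537
```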


Full Installation Guide: Set Up Sparrow Like a Pro

Need more details? Here’s the complete installation process for a rock-solid Sparrow setup.

Step 1: Clone the Repo

git clone https://github.com/katanaml/sparrow.git
cd sparrow

Step 2: Configure Python

Ensure Python 3.10.4 is active:

pyenv install 3.10.4
pyenv global 3.10.4

Step 3: Virtual Environments

Set up separate environments for different pipelines:


  • Sparrow Parse: .env_sparrow_parse

  • Instructor: .env_instructor

  • OCR: .env_ocr (optional)

Example:

python -m venv .env_sparrow_parse
source .env_sparrow_parse/bin/activate

Step 4: Install Dependencies

For Sparrow Parse:

cd sparrow-ml/llm
pip install -r requirements_sparrow_parse.txt

Step 5: System Dependencies


  • macOS:

    brew install poppler
    

  • Ubuntu/Debian:

    sudo apt-get install poppler-utils libpoppler-cpp-dev
    

Platform Tips


  • Apple Silicon: Use MLX for top performance.

  • NVIDIA GPU: Use the local_gpu backend; Ollama support is in progress.

  • CPU Only: Stick to small models or cloud backends.

Step 6: Verify It Works

Run the API:

python api.py --port 8002

Check http://localhost:8002/api/v1/sparrow-llm/docs. If the docs load, you’re good to go!
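To script that check (for example, in a setup script that waits for the server), here is a small standard-library sketch. The docs path comes from the URL above; the function names and the 5-second timeout are assumptions.

```python
import urllib.request


def docs_url(port: int = 8002, host: str = "localhost") -> str:
    """Build the API docs URL used above to confirm the server is running."""
    return f"http://{host}:{port}/api/v1/sparrow-llm/docs"


def api_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the docs page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("API up:", api_is_up(docs_url()))
```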


Sparrow in Action: 5 Real-World Examples

Let’s see how Sparrow tackles common document processing tasks.

1. Bank Statement Extraction

Extract everything from a bank statement PDF:

./sparrow.sh "*" \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/bank_statement.pdf"

Output:

{
  "bank": "First Platypus Bank",
  "account_holder": "Mary G. Orta",
  "transactions": [
    {"date": "02/01", "description": "PGD EasyPay Debit", "withdrawal": "203.24"}
  ],
  "valid": "true"
}
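The transaction amounts in this output are strings such as "203.24". Here is a sketch for totalling withdrawals, assuming that field name and that thousands separators may appear as commas. Python's Decimal keeps cents exact where float would not; total_withdrawals is a hypothetical helper.

```python
from decimal import Decimal

# The bank statement output shown above, as a Python dict.
statement = {
    "bank": "First Platypus Bank",
    "account_holder": "Mary G. Orta",
    "transactions": [
        {"date": "02/01", "description": "PGD EasyPay Debit", "withdrawal": "203.24"},
    ],
    "valid": "true",
}


def total_withdrawals(stmt: dict) -> Decimal:
    """Sum withdrawal amounts; transactions without one (e.g. deposits) count as zero."""
    return sum(
        (Decimal((t.get("withdrawal") or "0").replace(",", "")) for t in stmt["transactions"]),
        Decimal("0"),
    )


print(total_withdrawals(statement))  # 203.24
```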


2. Financial Table Data

Pull data from a bonds table image:

./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/bonds_table.png"

Output:

{
  "data": [
    {"instrument_name": "UNITS BLACKROCK...", "valuation": 19049}
  ],
  "valid": "true"
}

3. Invoice Processing

Trim the page borders with --crop-size (60 pixels here) to improve accuracy:

./sparrow.sh "*" \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --crop-size 60 \
  --file-path "data/invoice.pdf"

Output:

{
  "invoice_number": "61356291",
  "seller": {"name": "Chapman, Kim and Green"},
  "items": [
    {"description": "Wine Glasses", "quantity": 5, "net_price": 12.0}
  ]
}
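The extracted line items can be cross-checked arithmetically. This sketch assumes net_price is the per-unit net price, so a line's net value is quantity * net_price (here 5 * 12.0 = 60.0). The helper names are hypothetical, and floats are kept because the output above uses them.

```python
# Items copied from the invoice output above.
items = [{"description": "Wine Glasses", "quantity": 5, "net_price": 12.0}]


def line_totals(items: list) -> list:
    """Return (description, quantity * net_price) for each line item."""
    return [(i["description"], round(i["quantity"] * i["net_price"], 2)) for i in items]


def invoice_net_total(items: list) -> float:
    """Sum the line totals, rounded to cents."""
    return round(sum(total for _, total in line_totals(items)), 2)


print(line_totals(items))        # [('Wine Glasses', 60.0)]
print(invoice_net_total(items))  # 60.0
```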

4. Multi-Page PDF Tables

Extract tables from a financial report:

./sparrow.sh '{"table": [{"description": "str", "latest_amount": 0, "previous_amount": 0}]}' \
  --pipeline "sparrow-parse" \
  --file-path "data/financial_report.pdf"

Output:

[
  {
    "table": [
      {"description": "Revenues", "latest_amount": 12453, "previous_amount": 11445}
    ],
    "page": 1
  }
]
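Multi-page results come back as a list with one object per page, each carrying its page number. The sketch below flattens them into a single row list and adds the period-over-period change (latest_amount minus previous_amount, 12453 - 11445 = 1008 for Revenues). The flatten_rows helper and the added change field are illustrative, not Sparrow output.

```python
# The multi-page output shown above, as Python data.
pages = [
    {
        "table": [
            {"description": "Revenues", "latest_amount": 12453, "previous_amount": 11445}
        ],
        "page": 1,
    }
]


def flatten_rows(pages: list) -> list:
    """Merge per-page tables into one list, tagging rows with page and change."""
    rows = []
    for page in pages:
        for row in page.get("table", []):
            change = row["latest_amount"] - row["previous_amount"]
            rows.append({**row, "page": page["page"], "change": change})
    return rows


for row in flatten_rows(pages):
    print(row["page"], row["description"], row["change"])  # 1 Revenues 1008
```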

5. Simple Calculations

Perform a quick math task:

./sparrow.sh "instruction: do arithmetic operation, payload: 2+2=" \
  --pipeline "sparrow-instructor" \
  --options mlx \
  --options mlx-community/Mistral-Small-3.1-24B-Instruct-2503-8bit

Output:

The result of 2 + 2 is: 4
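The instructor queries follow an "instruction: <task>, payload: <data>" pattern. Here is a hypothetical Python helper that assembles the string, serializing non-string payloads as JSON. The JSON serialization is an assumption based on the payload: {...} form in the API example later in this guide.

```python
import json


def instruction_query(task: str, payload) -> str:
    """Build an 'instruction: <task>, payload: <data>' query string."""
    if not isinstance(payload, str):
        # Assumption: structured payloads are sent as JSON text.
        payload = json.dumps(payload)
    return f"instruction: {task}, payload: {payload}"


print(instruction_query("do arithmetic operation", "2+2="))
# instruction: do arithmetic operation, payload: 2+2=
```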

These examples show Sparrow’s versatility—handling everything from structured data to text instructions with ease.


Command Line Mastery: Unlock Sparrow’s Full Potential

Sparrow UI is great for quick tasks, but the CLI gives you finer control. Here’s how to use it.

Basic Syntax

./sparrow.sh "<JSON_SCHEMA>" --pipeline "<PIPELINE>" [OPTIONS] --file-path "<FILE>"

Key Parameters

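| Parameter | Type | Description | Example |
|---|---|---|---|
| query (positional) | JSON/String | Schema or instruction | '[{"field":"str"}]' |
| --pipeline | String | Processing pipeline | sparrow-parse |
| --file-path | Path | Input file path | data/invoice.pdf |
| --options | String (repeatable) | Backend settings: backend, then model name | --options mlx --options <model-name> |
| --crop-size | Integer | Pixels to crop from the borders | 60 |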
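If you call the CLI from Python, for batch jobs for example, it helps to assemble the arguments in one place. The sketch below mirrors the parameter table above and repeats --options once per value, as the examples in this guide do (backend first, then model). sparrow_cmd and run_sparrow are hypothetical names; the script is assumed to run from the directory that contains sparrow.sh, and sparrow.sh's stdout may include log lines besides the JSON result.

```python
import shlex
import subprocess


def sparrow_cmd(query, pipeline, file_path=None, options=(), crop_size=None,
                script="./sparrow.sh"):
    """Assemble a sparrow.sh argument list in the order used in this guide."""
    cmd = [script, query, "--pipeline", pipeline]
    for opt in options:
        cmd += ["--options", opt]  # one flag per value: backend, then model
    if crop_size is not None:
        cmd += ["--crop-size", str(crop_size)]
    if file_path is not None:  # instruction queries have no input file
        cmd += ["--file-path", file_path]
    return cmd


def run_sparrow(cmd, cwd=None) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True).stdout


cmd = sparrow_cmd('[{"instrument_name":"str", "valuation":0}]', "sparrow-parse",
                  file_path="data/bonds_table.png",
                  options=["mlx", "mlx-community/Qwen2.5-VL-72B-Instruct-4bit"])
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing a list to subprocess.run (rather than one shell string) avoids having to re-escape the JSON query.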

Advanced Examples


  • Multi-Page PDF:

    ./sparrow.sh "*" --page-type invoice --page-type table --pipeline "sparrow-parse" --file-path "multi_page.pdf"
    

  • Table Extraction:

    ./sparrow.sh '*' --options tables_only --crop-size 100 --file-path "scan.pdf"
    

The CLI lets you fine-tune Sparrow for any task, big or small.


API Integration: Bring Sparrow into Your Projects

Developers will love Sparrow’s RESTful API for seamless integration.

Start the Server

python api.py --port 8002

Key Endpoints


  • Extract Data:

    curl -X POST 'http://localhost:8002/api/v1/sparrow-llm/inference' \
      -F 'query=[{"field_name":"str", "amount":0}]' \
      -F 'pipeline=sparrow-parse' \
      -F 'file=@document.pdf'
    

  • Process Instructions:

    curl -X POST 'http://localhost:8002/api/v1/sparrow-llm/instruction-inference' \
      -d 'query=instruction: analyze data, payload: {...}' \
      -d 'pipeline=sparrow-instructor'
    

Explore all endpoints at http://localhost:8002/api/v1/sparrow-llm/docs.
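For Python clients without extra dependencies, the two curl calls above can be reproduced with the standard library. This is a sketch under assumptions: the server runs on localhost:8002 as shown, the endpoints accept exactly the form fields the curl examples send (multipart for inference, URL-encoded for instruction inference), and the helper names are hypothetical. The request objects are built separately from send() so they can be inspected before anything goes over the network.

```python
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

BASE = "http://localhost:8002/api/v1/sparrow-llm"


def _multipart(fields: dict, file_field: str, file_name: str, file_bytes: bytes):
    """Encode form fields plus one file as multipart/form-data (what curl -F sends)."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields.items()
    ]
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n').encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def inference_request(query: str, pipeline: str, file_path: str) -> urllib.request.Request:
    """Equivalent of the curl -F call to /inference."""
    path = Path(file_path)
    body, content_type = _multipart({"query": query, "pipeline": pipeline},
                                    "file", path.name, path.read_bytes())
    return urllib.request.Request(f"{BASE}/inference", data=body, method="POST",
                                  headers={"Content-Type": content_type})


def instruction_request(query: str, pipeline: str = "sparrow-instructor") -> urllib.request.Request:
    """Equivalent of the curl -d call (application/x-www-form-urlencoded)."""
    body = urllib.parse.urlencode({"query": query, "pipeline": pipeline}).encode()
    return urllib.request.Request(f"{BASE}/instruction-inference", data=body, method="POST",
                                  headers={"Content-Type": "application/x-www-form-urlencoded"})


def send(req: urllib.request.Request, timeout: float = 600) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, send(inference_request(query, "sparrow-parse", "document.pdf")) returns the response body as text, which json.loads turns into a dict for extraction results. The same _multipart helper could post agent_name=medical_prescriptions to the agents endpoint on port 8001 shown in the next section, assuming that endpoint accepts the same encoding.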


Sparrow Agents: Tackle Complex Workflows

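For multi-step tasks, Sparrow Agents orchestrate the whole workflow. They run as a separate service, which the example below calls on port 8001 rather than the 8002 used by the LLM API.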

Example: Medical Prescriptions

curl -X POST 'http://localhost:8001/api/v1/sparrow-agents/execute/file' \
  -F 'agent_name=medical_prescriptions' \
  -F 'file=@prescription.pdf'

With real-time monitoring and error recovery, Agents are perfect for enterprise use.


Sparrow Dashboard: Stay in Control

Monitor everything via the dashboard at sparrow.katanaml.io:


  • API call stats

  • Model performance

  • Usage analytics



Pipeline Comparison: Find Your Perfect Fit

| Feature | Sparrow Parse | Sparrow Instructor | Sparrow Agents |
|---|---|---|---|
| Input | Documents + JSON query | Text instructions | Complex workflows |
| Output | Structured JSON | Free text | Multi-step results |
| Use case | Data extraction | Text analysis | Enterprise tasks |

Pick the pipeline that matches your needs: Parse for extracting structured data from documents, Instructor for text-based instructions, and Agents for multi-step workflows.


Performance Tips: Optimize Sparrow

Hardware Choices


  • Apple Silicon: MLX backend for efficiency.

  • NVIDIA GPU: 12GB+ VRAM recommended; the larger models in the table below need considerably more.

  • CPU: Use small models or cloud options.

Memory Tricks

--crop-size 100  # Reduce image size
--options tables_only  # Focus on tables

Model Picks

| Use Case | Model | Memory | Speed |
|---|---|---|---|
| Invoices | Mistral-Small-3.1-24B | 35GB | Fast |
| Tables | Qwen2.5-VL-72B | 50GB | Slower |
| Testing | Qwen2.5-VL-7B | 20GB | Fastest |
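As a rough cross-check on the Memory column, the weights alone take about parameters × bits per weight ÷ 8 bytes. The sketch below applies that rule of thumb. It is an estimate, not Sparrow's own measurement, and the table's higher figures leave room for the vision encoder, activations, and cached context. The quantization levels used here (4-bit for the 72B model, 8-bit for Mistral Small) are the ones from this guide's example commands.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in (decimal) GB."""
    # billions of parameters x bits per weight / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


print(weight_memory_gb(72, 4))  # 36.0  (Qwen2.5-VL-72B, 4-bit)
print(weight_memory_gb(24, 8))  # 24.0  (Mistral-Small-3.1-24B, 8-bit)
```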

Troubleshooting Made Easy


  • Python Issues:

    pyenv install 3.10.4
    

  • Poppler Missing:

    brew install poppler  # macOS
    

  • Low Memory: Use smaller models or crop images.

For help, open a GitHub issue or email abaranovskis@redsamuraiconsulting.com.


Licensing: Open Source and Beyond

Sparrow is licensed under GPL 3.0 and is free for commercial use by organizations with revenue under $5M. Larger enterprises can email abaranovskis@redsamuraiconsulting.com about commercial licensing options.


Conclusion: Sparrow, Your Document Processing Ally

Sparrow blends power, flexibility, and ease into one incredible tool. From quick data extraction to complex workflows, its AI-driven approach saves time and effort. Open-source and enterprise-ready, it’s perfect for everyone. Visit GitHub, give it a star, and start processing smarter today!
