Stop Copy-Pasting: Turn Italian Post Office PDFs into Clean JSON or CSV in One Command


If you study, work, or simply live in Italy, you know the monthly ritual: log in to Poste Italiane, download a PDF statement, and then spend the better part of an afternoon copying numbers into a spreadsheet.
This post shows you how to replace that ritual with a single, repeatable command. We will use the open-source utility Poste Italiane Documents Parser to extract, validate, and export every balance, transaction, and personal detail into JSON or CSV—ready for Excel, Pandas, or any other tool you already use.


Table of contents

  1. Why structured data beats PDF tables
  2. What the parser does (and what it does not)
  3. Supported document types in plain English
  4. One-command installation on macOS, Windows, or Linux
  5. From your first file to a whole folder: hands-on examples
  6. Using the parser as a Python library in your own scripts
  7. Anatomy of the output: every field explained
  8. Writing reliable tests without leaking personal data
  9. Common pitfalls and how to avoid them
  10. Ideas for what to build next

1. Why structured data beats PDF tables

| Pain point | Manual workflow | After the parser |
|------------|-----------------|------------------|
| Time | 30–60 min per statement | 2–3 s |
| Typos | Decimal errors, date mix-ups | Automatic validation |
| Re-use | Locked in PDF | Ready for Excel, Power BI, Google Sheets |
| Automation | Impossible | Cron job or GitHub Action |

The Italian Post Office does not expose an official API; the PDF is your only data source. The parser fills that gap without storing or transmitting your data anywhere.


2. What the parser does (and what it does not)

What it does

  • Detects the document type automatically (current account, prepaid card, etc.)
  • Validates that opening balance + credits – debits = closing balance
  • Converts one file or an entire folder
  • Exports to JSON (default) or CSV with a flag
  • Runs offline—no cloud calls, no registration
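
The balance rule in the list above can be sketched in a few lines. This is an illustration only, not the parser's own code: the function names are hypothetical, and the use of Decimal (to avoid float rounding surprises) is an assumption about how one might implement the check.

```python
from decimal import Decimal


def to_cents(amount):
    """Convert a float or string amount to an exact two-decimal Decimal."""
    return Decimal(str(amount)).quantize(Decimal("0.01"))


def balance_is_consistent(initial, final, transactions):
    """Return True when initial + credits - debits equals the closing balance."""
    total = to_cents(initial)
    for tx in transactions:
        total += to_cents(tx["credits"]) - to_cents(tx["debits"])
    return total == to_cents(final)


# Made-up sample data: 100.00 + 100.00 - 25.50 = 174.50
sample = [
    {"credits": 100.00, "debits": 0.00},
    {"credits": 0.00, "debits": 25.50},
]
print(balance_is_consistent(100.00, 174.50, sample))  # prints True
```

Working in exact cents means a statement is flagged only when the numbers truly disagree, not because of binary floating-point noise.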

What it does not

  • Handle password-protected PDFs
  • Categorize merchants (e.g., groceries vs. utilities)
  • Support PosteMobile phone bills (yet)

3. Supported document types in plain English

| Italian name | What it is | English equivalent |
|--------------|------------|--------------------|
| Estratto Conto BancoPosta | Monthly bank statement | Checking account statement |
| Rendiconto Postepay Evolution | Monthly prepaid card summary | Prepaid card statement |
| Lista Movimenti Postepay Evolution | Detailed transaction list | Transaction history |

If you download statements from the Poste Italiane online archive, the filenames look like:

EstrattoConto_0001234567_20250731.pdf
Rendiconto_PP_1234567890_202507.pdf

The parser recognizes these patterns internally, so you never have to tell it “this is a Postepay statement.”
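
To make the idea concrete, here is a minimal sketch of filename-based detection. The regular expressions are inferred only from the two example filenames above, and the function name is hypothetical; the real parser may use different rules, such as inspecting the text inside the PDF.

```python
import re

# Hypothetical patterns derived from the two example filenames;
# they are assumptions, not the parser's actual detection logic.
FILENAME_PATTERNS = {
    "ESTRATTO_CONTO": re.compile(
        r"^EstrattoConto_(?P<account>\d+)_(?P<date>\d{8})\.pdf$"
    ),
    "RENDICONTO": re.compile(
        r"^Rendiconto_PP_(?P<account>\d+)_(?P<date>\d{6})\.pdf$"
    ),
}


def guess_document_type(filename):
    """Return (document_type, captured_fields), or (None, {}) if nothing matches."""
    for doc_type, pattern in FILENAME_PATTERNS.items():
        match = pattern.match(filename)
        if match:
            return doc_type, match.groupdict()
    return None, {}


print(guess_document_type("EstrattoConto_0001234567_20250731.pdf"))
print(guess_document_type("Rendiconto_PP_1234567890_202507.pdf"))
```

A renamed file such as statement_july2025.pdf would not match either pattern, which is why content-based detection is the more robust approach.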



4. One-command installation on macOS, Windows, or Linux

Prerequisites

  • Python 3.8 or higher
  • pip (comes with most Python installers)

Step-by-step:

# 1. Clone the repository
git clone https://github.com/genbs/poste-italiane-parser.git
cd poste-italiane-parser

# 2. Install the dependencies
pip install -r requirements.txt

That is it.
The requirements.txt file lists only two main libraries—pdfplumber to read tables and pydantic to validate the extracted data—so installation usually finishes in under a minute even on modest hardware.


5. From your first file to a whole folder: hands-on examples

5.1 Save your PDFs

Download statements from the Poste Italiane portal and save them in any folder, e.g.:

~/Documents/poste_pdfs/
├── statement_july2025.pdf
├── postepay_june2025.pdf
└── misc/
    └── older_statement.pdf

5.2 Convert a single file to JSON (default)

python main.py --path ~/Documents/poste_pdfs/statement_july2025.pdf

A file named statement_july2025.json appears next to the original PDF.
Open it in any editor; you will see a clear hierarchy without nested arrays of arrays.

5.3 Convert an entire folder to CSV

python main.py --path ~/Documents/poste_pdfs \
               --format csv \
               --output ~/Documents/poste_results \
               --verbose
  • --output can be a folder; the script writes one CSV per source PDF.
  • --verbose prints lines such as:
INFO: Detected ESTRATTO_CONTO
INFO: Balance check OK
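
If you want to preview which output files a folder run would produce, the mapping can be sketched with pathlib. This is a rough illustration under stated assumptions: the function name plan_conversions is hypothetical, and recursing into subfolders and mirroring their layout in the output folder is an assumption, not documented behavior of main.py.

```python
from pathlib import Path


def plan_conversions(source_dir, output_dir, extension=".json"):
    """Map every PDF under source_dir to an output path under output_dir."""
    source = Path(source_dir).expanduser()
    target = Path(output_dir).expanduser()
    plan = []
    # rglob walks subfolders too; sorting makes the order predictable
    for pdf in sorted(source.rglob("*.pdf")):
        relative = pdf.relative_to(source).with_suffix(extension)
        plan.append((pdf, target / relative))
    return plan
```

Calling plan_conversions("~/Documents/poste_pdfs", "~/Documents/poste_results", ".csv") returns a list of (source PDF, planned output file) pairs that you can print or log before running the real conversion.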

6. Using the parser as a Python library in your own scripts

If you already have a Flask API, Jupyter notebook, or scheduled job, import the parser directly:

from poste_italiane_parser import PosteItalianeParser

file_path = "rendiconto_postepay.pdf"

try:
    data = PosteItalianeParser(file_path)
    print("Document type:", data['document_type'])
    print("Holder:", data['holder'])
    print("Closing balance:", data['final_balance'])
except ValueError as e:
    print("Validation failed:", e)
except FileNotFoundError:
    print("File not found:", file_path)

The returned data is a plain Python dictionary. Feed it straight into pandas:

import pandas as pd

df = pd.DataFrame(data['transactions'])
df['value_date'] = pd.to_datetime(df['value_date'])
monthly_net = df.groupby(df['value_date'].dt.to_period('M'))['value'].sum()
print(monthly_net)

7. Anatomy of the output: every field explained

Below is a complete example of the JSON output for an Estratto Conto BancoPosta.
The same keys appear for Postepay documents; fields that do not apply are set to null.

{
  "generated_at": "2025-07-24 10:30:00",
  "document_type": "ESTRATTO_CONTO",
  "currency": "EUR",
  "initial_balance": 1234.56,
  "final_balance": 2684.57,
  "iban": "IT60X076011240000001234567890",
  "holder": "Mario Rossi",
  "card_number": null,
  "account_number": "000001234567",
  "period": {
    "start_date": "2025-07-01",
    "end_date": "2025-07-31"
  },
  "customer": {
    "name": "Mario Rossi",
    "street": "Via Roma 1",
    "city": "Milano"
  },
  "transactions": [
    {
      "accounting_date": "2025-07-05 00:00:00",
      "value_date": "2025-07-05 00:00:00",
      "description": "POS 1234567890123456 AMAZON EU",
      "debits": 49.99,
      "credits": 0.0,
      "value": -49.99
    },
    {
      "accounting_date": "2025-07-10 00:00:00",
      "value_date": "2025-07-10 00:00:00",
      "description": "ACCREDITO STIPENDIO",
      "debits": 0.0,
      "credits": 1500.0,
      "value": 1500.0
    }
  ]
}

Field-by-field glossary

| Key | Meaning | Always present? |
|-----|---------|-----------------|
| generated_at | Timestamp when the parser ran | Yes |
| document_type | ENUM: ESTRATTO_CONTO, LISTA_MOVIMENTI, or RENDICONTO | Yes |
| currency | Three-letter code, always EUR for Italian accounts | Yes |
| initial_balance | Opening balance for the period | Yes (null for some Postepay) |
| final_balance | Closing balance | Yes |
| iban | International Bank Account Number | Only for BancoPosta |
| holder | Legal account holder | Yes |
| card_number | Last 4 digits of Postepay card | Only for Postepay |
| account_number | Internal Poste Italiane account ID | Yes |
| period | Object with start_date and end_date | Yes |
| customer | Name and address as registered | Yes |
| transactions | Array of transaction objects | Yes |

Each transaction object contains:

| Sub-key | Type | Sign convention |
|---------|------|-----------------|
| accounting_date | datetime | |
| value_date | datetime | |
| description | string | Raw text from PDF |
| debits | float | Positive number, money leaving the account |
| credits | float | Positive number, money entering the account |
| value | float | Signed total: credits - debits |
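
The conventions in this table are easy to verify on your own exports. The sketch below is an assumption-laden illustration: check_transaction is a hypothetical helper, the date format is taken from the sample output shown earlier, and the small tolerance is only there to absorb float rounding.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the timestamps in the sample output


def check_transaction(tx, tolerance=0.005):
    """Return a list of problems found in one transaction dictionary."""
    problems = []
    for key in ("accounting_date", "value_date"):
        try:
            datetime.strptime(tx[key], DATE_FORMAT)
        except (KeyError, ValueError):
            problems.append(f"{key} is missing or not in the expected format")
    if tx["debits"] < 0 or tx["credits"] < 0:
        problems.append("debits and credits must be non-negative")
    if abs(tx["credits"] - tx["debits"] - tx["value"]) > tolerance:
        problems.append("value does not equal credits - debits")
    return problems


sample_tx = {
    "accounting_date": "2025-07-05 00:00:00",
    "value_date": "2025-07-05 00:00:00",
    "description": "POS 1234567890123456 AMAZON EU",
    "debits": 49.99,
    "credits": 0.0,
    "value": -49.99,
}
print(check_transaction(sample_tx))  # prints []
```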


8. Writing reliable tests without leaking personal data

The repository does not include real PDFs for privacy reasons. Instead, tests rely on “golden” JSON files that describe the expected output.

8.1 Create a test file

Inside the tests/ folder, create a file named my_case.test.json:

{
  "path": "tests/sample_statement.pdf",
  "currency": "EUR",
  "holder": "Giuseppe Verdi",
  "initial_balance": 100.00,
  "final_balance": 200.00,
  "iban": "IT60X076011240000001234567890",
  "period_start_date": "2025-07-01",
  "period_end_date": "2025-07-31",
  "transactions": [
    {
      "accounting_date": "2025-07-03 00:00:00",
      "value_date": "2025-07-03 00:00:00",
      "description": "BOLLETTINO 12345",
      "credits": 100.00,
      "debits": 0.00
    }
  ]
}
  • You may include all or only some transactions.
  • Extra keys in actual output are ignored, so you can keep the test minimal.
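
The "extra keys are ignored" and "only some transactions" rules amount to a subset comparison. The following sketch shows one way such a check could work; it is not the repository's test code. The helper names are hypothetical, and it assumes the expected data uses the same key layout as the parser output.

```python
def find_mismatches(expected, actual, prefix=""):
    """Compare only the keys present in expected; extra keys in actual are ignored."""
    mismatches = []
    for key, want in expected.items():
        label = f"{prefix}{key}"
        if key not in actual:
            mismatches.append(f"{label}: missing from output")
            continue
        got = actual[key]
        if isinstance(want, dict) and isinstance(got, dict):
            mismatches.extend(find_mismatches(want, got, prefix=f"{label}."))
        elif isinstance(want, list) and isinstance(got, list):
            # Each expected item must match some item in the output
            for index, item in enumerate(want):
                if not _list_contains(item, got):
                    mismatches.append(f"{label}[{index}]: no matching entry in output")
        elif got != want:
            mismatches.append(f"{label}: expected {want!r}, got {got!r}")
    return mismatches


def _list_contains(item, candidates):
    """True if item (compared as a subset when it is a dict) appears in candidates."""
    if isinstance(item, dict):
        return any(
            isinstance(c, dict) and not find_mismatches(item, c) for c in candidates
        )
    return item in candidates


expected = {
    "holder": "Giuseppe Verdi",
    "final_balance": 200.0,
    "transactions": [{"description": "BOLLETTINO 12345", "credits": 100.0}],
}
actual = {
    "holder": "Giuseppe Verdi",
    "currency": "EUR",
    "final_balance": 199.98,
    "transactions": [
        {"description": "BOLLETTINO 12345", "credits": 100.0, "debits": 0.0},
    ],
}
print(find_mismatches(expected, actual))
```

Because only the expected keys are walked, a minimal test file stays valid even when the parser later adds new fields to its output.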

8.2 Run the test suite

python -m unittest tests/test_PosteItalianeParser.py -v

If any field mismatches, the test runner prints the exact difference:

FAIL: test_my_case
AssertionError: final_balance expected 200.00, got 199.98

Fix the parser or correct the expected JSON until the test passes; your CI pipeline will then guard against regressions.


9. Common pitfalls and how to avoid them

| Symptom | Cause | Fix |
|---------|-------|-----|
| ValueError: Unknown document type | PDF is scanned image, not text | Re-download from Poste Italiane and choose “Testo” (Text) instead of “Immagine” (Image) |
| Balance check failed | Currency rounding or duplicate lines | Open an issue with the PDF (first 3 pages) attached |
| File not found on Windows | Spaces in path | Wrap the path in double quotes: --path "C:\My Files\poste" |
| CSV shows strange characters | Default encoding on Windows | Open the CSV in Excel → Data → From Text/CSV → choose UTF-8 |
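
For the CSV encoding problem, an alternative to the Excel import dialog is to prepend a UTF-8 byte order mark, which Excel uses to detect the encoding when a file is opened by double-click. This sketch assumes the parser writes its CSV as UTF-8 without a BOM; add_utf8_bom is a hypothetical helper, not part of the tool.

```python
import codecs
from pathlib import Path


def add_utf8_bom(csv_path):
    """Prepend a UTF-8 byte order mark unless the file already starts with one."""
    path = Path(csv_path)
    raw = path.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        # Working on raw bytes leaves line endings and quoting untouched
        path.write_bytes(codecs.BOM_UTF8 + raw)
    return path
```

The function is safe to run more than once on the same file. When reading the result back in Python, open it with encoding="utf-8-sig" so the mark is stripped again.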

10. Ideas for what to build next

  • Monthly cron job
    Use wget or Selenium to download the PDFs, then run the parser and upload the results to Google Drive.

  • Personal finance dashboard
    Feed the JSON into Grafana or Looker Studio (formerly Google Data Studio) for live spending charts.

  • Expense reimbursement bot
    Employees drop Postepay PDFs into a shared folder; the bot returns categorized CSV for the accounting team.

  • Banking bridge
    Use the library inside a Django REST API so other apps can query Italian account data without storing credentials.
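
Since the parser itself does not categorize merchants, a project like the reimbursement bot needs its own categorization step. A naive keyword lookup is a reasonable starting point; the sketch below is purely illustrative, and both the category names and the keywords are invented for the example.

```python
# Hypothetical keyword table; extend it with merchants from your own statements.
CATEGORY_KEYWORDS = {
    "shopping": ("AMAZON",),
    "salary": ("STIPENDIO",),
    "bills": ("BOLLETTINO",),
}


def categorize(description, default="other"):
    """Return the first category whose keyword appears in the description."""
    text = description.upper()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default


for description in ("POS 1234567890123456 AMAZON EU", "ACCREDITO STIPENDIO"):
    print(description, "->", categorize(description))
```

Applying categorize to the description field of each transaction yields an extra column you can write out alongside the parsed data.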



Closing thoughts

Data locked inside PDFs is productivity lost. The Poste Italiane Documents Parser gives you a friction-free bridge from raw statements to structured data, all while running offline and respecting your privacy.

Whether you are an expatriate building a personal budget, a fintech startup normalizing bank feeds, or an accountant reconciling hundreds of prepaid cards, this small open-source tool can save hours every month.

If you improve it—new document types, smarter validation, or better date parsing—the community welcomes your pull requests.


Useful links (no affiliation)