Stop Copy-Pasting: Turn Italian Post Office PDFs into Clean JSON or CSV in One Command

If you study, work, or simply live in Italy, you know the monthly ritual: log in to Poste Italiane, download a PDF statement, and then spend the better part of an afternoon copying numbers into a spreadsheet.
This post shows you how to replace that ritual with a single, repeatable command. We will use the open-source utility Poste Italiane Documents Parser to extract, validate, and export every balance, transaction, and personal detail into JSON or CSV—ready for Excel, Pandas, or any other tool you already use.

Why structured data beats PDF tables
What the parser does (and what it does not)
Supported document types in plain English
One-command installation on macOS, Windows, or Linux
From your first file to a whole folder: hands-on examples
Using the parser as a Python library in your own scripts
Anatomy of the output: every field explained
Writing reliable tests without leaking personal data
Common pitfalls and how to avoid them
Ideas for what to build next

1. Why structured data beats PDF tables

Pain point	Manual workflow	After the parser
Time	30–60 min per statement	2–3 s
Typos	Decimal errors, date mix-ups	Automatic validation
Re-use	Locked in PDF	Ready for Excel, Power BI, Google Sheets
Automation	Impossible	Cron job or GitHub Action

The Italian Post Office does not expose an official API; the PDF is your only data source. The parser fills that gap without storing or transmitting your data anywhere.

2. What the parser does (and what it does not)

What it does

Detects the document type automatically (current account, prepaid card, etc.)
Validates that opening balance + credits – debits = closing balance
Converts one file or an entire folder
Exports to JSON (default) or CSV with a flag
Runs offline—no cloud calls, no registration
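
The balance rule in the list above can be sketched in a few lines. This is a minimal illustration, not the parser's own code: the function names `to_cents` and `balance_is_consistent` are hypothetical, and the use of `Decimal` (to avoid float rounding surprises) is an assumption about how one might implement the check.

```python
from decimal import Decimal


def to_cents(amount):
    """Convert a float or string amount to an exact Decimal rounded to cents."""
    return Decimal(str(amount)).quantize(Decimal("0.01"))


def balance_is_consistent(opening, credits, debits, closing):
    """Return True when opening + sum(credits) - sum(debits) equals closing."""
    expected = to_cents(opening)
    expected += sum((to_cents(c) for c in credits), Decimal("0"))
    expected -= sum((to_cents(d) for d in debits), Decimal("0"))
    return expected == to_cents(closing)
```

For example, `balance_is_consistent(100.00, [100.00], [0.00], 200.00)` returns `True`, while changing the closing balance to `199.98` makes it return `False`.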

What it does not

Handle password-protected PDFs
Categorize merchants (e.g., groceries vs. utilities)
Support PosteMobile phone bills (yet)

3. Supported document types in plain English

Italian name	What it is	English equivalent
Estratto Conto BancoPosta	Monthly bank statement	Checking account statement
Rendiconto Postepay Evolution	Monthly prepaid card summary	Prepaid card statement
Lista Movimenti Postepay Evolution	Detailed transaction list	Transaction history

If you download statements from the Poste Italiane online archive, the filenames look like:

EstrattoConto_0001234567_20250731.pdf
Rendiconto_PP_1234567890_202507.pdf

The parser recognizes these patterns internally, so you never have to tell it “this is a Postepay statement.”
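
If you want to pre-sort downloads before parsing, a filename check along these lines may help. It is only a sketch: the two regular expressions are inferred from the two sample filenames above, the function name `guess_type_from_filename` is hypothetical, and the parser's real detection logic is not shown in this post and may work differently.

```python
import re

# Patterns inferred from the two sample filenames only (an assumption).
FILENAME_PATTERNS = {
    "ESTRATTO_CONTO": re.compile(r"^EstrattoConto_(\d+)_(\d{8})\.pdf$"),
    "RENDICONTO": re.compile(r"^Rendiconto_PP_(\d+)_(\d{6})\.pdf$"),
}


def guess_type_from_filename(name):
    """Return (document_type, number, date_part) or None if nothing matches."""
    for doc_type, pattern in FILENAME_PATTERNS.items():
        match = pattern.match(name)
        if match:
            return doc_type, match.group(1), match.group(2)
    return None
```

A renamed file such as `statement_july2025.pdf` yields `None` here, which is why a filename check can only be a convenience and never a replacement for inspecting the document itself.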

4. One-command installation on macOS, Windows, or Linux

Prerequisites

Python 3.8 or higher
pip (comes with most Python installers)

Step-by-step:

# 1. Clone the repository
git clone https://github.com/genbs/poste-italiane-parser.git
cd poste-italiane-parser

# 2. Install the dependencies
pip install -r requirements.txt

That is it.
The requirements.txt file lists only two main libraries—pdfplumber to read tables and pydantic to validate the extracted data—so installation usually finishes in under a minute even on modest hardware.
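
To confirm that the two libraries named above are importable in your environment, a quick check like the following may be useful. The helper name `missing_modules` is hypothetical; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names=("pdfplumber", "pydantic")):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, re-run the `pip install -r requirements.txt` step inside the same Python environment you use to launch the parser.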

5. From your first file to a whole folder: hands-on examples

5.1 Save your PDFs

Download statements from the Poste Italiane portal and save them in any folder, e.g.:

~/Documents/poste_pdfs/
├── statement_july2025.pdf
├── postepay_june2025.pdf
└── misc/
    └── older_statement.pdf

5.2 Convert a single file to JSON (default)

python main.py --path ~/Documents/poste_pdfs/statement_july2025.pdf

A file named statement_july2025.json appears next to the original PDF.
Open it in any editor; you will see a clear hierarchy without nested arrays of arrays.

5.3 Convert an entire folder to CSV

python main.py --path ~/Documents/poste_pdfs \
               --format csv \
               --output ~/Documents/poste_results \
               --verbose

--output can be a folder; the script writes one CSV per source PDF.
--verbose prints lines such as:

INFO: Detected ESTRATTO_CONTO
INFO: Balance check OK
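
The "one output file per source PDF" behaviour can be previewed before running a large batch. The sketch below only plans the output paths and writes nothing. The function name `plan_outputs` and the `recursive` switch are assumptions: this post does not say whether the script descends into subfolders such as misc/.

```python
from pathlib import Path


def plan_outputs(source_dir, output_dir, fmt="csv", recursive=False):
    """Pair every PDF in source_dir with the output path it would map to."""
    source = Path(source_dir).expanduser()
    target = Path(output_dir).expanduser()
    # rglob walks subfolders as well; glob looks at the top level only.
    pdfs = sorted(source.rglob("*.pdf") if recursive else source.glob("*.pdf"))
    return [(pdf, target / f"{pdf.stem}.{fmt}") for pdf in pdfs]
```

Note that with `recursive=True`, two PDFs with the same name in different subfolders would map to the same output path, so check the plan for duplicates before relying on it.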

6. Using the parser as a Python library in your own scripts

If you already have a Flask API, Jupyter notebook, or scheduled job, import the parser directly:

from poste_italiane_parser import PosteItalianeParser

file_path = "rendiconto_postepay.pdf"

try:
    data = PosteItalianeParser(file_path)
    print("Document type:", data['document_type'])
    print("Holder:", data['holder'])
    print("Closing balance:", data['final_balance'])
except ValueError as e:
    print("Validation failed:", e)
except FileNotFoundError:
    print("File not found:", file_path)

The returned data is a plain Python dictionary, so you can feed it straight into pandas:

import pandas as pd

df = pd.DataFrame(data['transactions'])
df['value_date'] = pd.to_datetime(df['value_date'])
monthly_net = df.groupby(df['value_date'].dt.to_period('M'))['value'].sum()
print(monthly_net)

7. Anatomy of the output: every field explained

Below is a complete example of the JSON output for an Estratto Conto BancoPosta.
The same keys appear for Postepay documents; missing fields are set to null.

{
  "generated_at": "2025-08-01 10:30:00",
  "document_type": "ESTRATTO_CONTO",
  "currency": "EUR",
  "initial_balance": 1234.56,
  "final_balance": 2684.57,
  "iban": "IT60X076011240000001234567890",
  "holder": "Mario Rossi",
  "card_number": null,
  "account_number": "000001234567",
  "period": {
    "start_date": "2025-07-01",
    "end_date": "2025-07-31"
  },
  "customer": {
    "name": "Mario Rossi",
    "street": "Via Roma 1",
    "city": "Milano"
  },
  "transactions": [
    {
      "accounting_date": "2025-07-05 00:00:00",
      "value_date": "2025-07-05 00:00:00",
      "description": "POS 1234567890123456 AMAZON EU",
      "debits": 49.99,
      "credits": 0.0,
      "value": -49.99
    },
    {
      "accounting_date": "2025-07-10 00:00:00",
      "value_date": "2025-07-10 00:00:00",
      "description": "ACCREDITO STIPENDIO",
      "debits": 0.0,
      "credits": 1500.0,
      "value": 1500.0
    }
  ]
}

Field-by-field glossary

Key	Meaning	Always present?
`generated_at`	Timestamp when the parser ran	Yes
`document_type`	ENUM: `ESTRATTO_CONTO`, `LISTA_MOVIMENTI`, or `RENDICONTO`	Yes
`currency`	Three-letter code, always `EUR` for Italian accounts	Yes
`initial_balance`	Opening balance for the period	Yes (null for some Postepay)
`final_balance`	Closing balance	Yes
`iban`	International Bank Account Number	Only for BancoPosta
`holder`	Legal account holder	Yes
`card_number`	Last 4 digits of Postepay card	Only for Postepay
`account_number`	Internal Poste Italiane account ID	Yes
`period`	Object with `start_date` and `end_date`	Yes
`customer`	Name and address as registered	Yes
`transactions`	Array of transaction objects	Yes

Each transaction object contains:

Sub-key	Type	Sign convention
`accounting_date`	datetime	—
`value_date`	datetime	—
`description`	string	Raw text from PDF
`debits`	float	Positive number, money leaving the account
`credits`	float	Positive number, money entering the account
`value`	float	Signed total: `credits - debits`
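
The sign convention in the table above can be checked mechanically on any parsed transaction. The following is a sketch under stated assumptions: the helper name `check_transaction` and the half-cent tolerance are illustrative choices, not part of the parser.

```python
def check_transaction(tx, tolerance=0.005):
    """Return a list of problems found in one transaction dict (empty if none)."""
    problems = []
    if tx["debits"] < 0:
        problems.append("debits must not be negative")
    if tx["credits"] < 0:
        problems.append("credits must not be negative")
    # value is the signed total: credits minus debits.
    if abs(tx["value"] - (tx["credits"] - tx["debits"])) > tolerance:
        problems.append("value does not equal credits - debits")
    return problems
```

Calling it on a debit of 49.99 with `value` set to -49.99 returns an empty list; flipping the sign of `value` produces one problem message.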

8. Writing reliable tests without leaking personal data

The repository does not include real PDFs for privacy reasons. Instead, tests rely on “golden” JSON files that describe the expected output.

8.1 Create a test file

Inside the tests/ folder, create a file named my_case.test.json:

{
  "path": "tests/sample_statement.pdf",
  "currency": "EUR",
  "holder": "Giuseppe Verdi",
  "initial_balance": 100.00,
  "final_balance": 200.00,
  "iban": "IT60X076011240000001234567890",
  "period_start_date": "2025-07-01",
  "period_end_date": "2025-07-31",
  "transactions": [
    {
      "accounting_date": "2025-07-03 00:00:00",
      "value_date": "2025-07-03 00:00:00",
      "description": "BOLLETTINO 12345",
      "credits": 100.00,
      "debits": 0.00
    }
  ]
}

You may include all or only some transactions.
Extra keys in actual output are ignored, so you can keep the test minimal.
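
The "subset" comparison described above might look roughly like this. It is a sketch, not the repository's test runner: the function names are hypothetical, and the mapping from the flat `period_start_date` / `period_end_date` keys to the nested `period` object is an assumption based on the two JSON examples in this post.

```python
def tx_matches(expected_tx, actual_tx):
    """True when every key in expected_tx has the same value in actual_tx."""
    return all(actual_tx.get(key) == value for key, value in expected_tx.items())


def find_mismatches(expected, actual):
    """Compare only the keys listed in expected; extra actual keys are ignored."""
    mismatches = []
    for key, want in expected.items():
        if key in ("path", "transactions"):
            continue  # "path" locates the PDF; transactions are handled below
        if key.startswith("period_"):
            got = (actual.get("period") or {}).get(key[len("period_"):])
        else:
            got = actual.get(key)
        if got != want:
            mismatches.append(f"{key} expected {want!r}, got {got!r}")
    # Each expected transaction must appear somewhere in the actual output.
    for tx in expected.get("transactions", []):
        if not any(tx_matches(tx, a) for a in actual.get("transactions", [])):
            mismatches.append(f"no transaction matching {tx!r}")
    return mismatches
```

An empty list means the parsed output satisfies the golden file, even if the output contains additional keys or transactions that the golden file does not mention.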

8.2 Run the test suite

python -m unittest tests/test_PosteItalianeParser.py -v

If any field mismatches, the test runner prints the exact difference:

FAIL: test_my_case
AssertionError: final_balance expected 200.00, got 199.98

If a test fails, fix the parser or correct the expected values in the JSON file until it passes; your CI pipeline will then guard against regressions.

9. Common pitfalls and how to avoid them

Symptom	Cause	Fix
`ValueError: Unknown document type`	PDF is scanned image, not text	Re-download from Poste Italiane and choose “Testo” (Text) instead of “Immagine” (Image)
`Balance check failed`	Currency rounding or duplicate lines	Open an issue with the PDF (first 3 pages) attached
File not found on Windows	Spaces in path	Wrap the path in double quotes: `--path "C:\My Files\poste"`
CSV shows strange characters	Default encoding on Windows	Open the CSV in Excel → Data → From Text/CSV → choose UTF-8
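
For the encoding problem in the last row, an alternative is to write your own CSV from the JSON output with a byte-order mark, which Excel on Windows generally uses to recognize UTF-8. This is a workaround sketch using only the standard library; the function name `write_transactions_csv` is hypothetical and this is not how the parser itself writes CSV.

```python
import csv

FIELDNAMES = ["accounting_date", "value_date", "description",
              "debits", "credits", "value"]


def write_transactions_csv(transactions, destination):
    """Write transaction dicts to a UTF-8 CSV that starts with a BOM."""
    # "utf-8-sig" prepends the BOM; newline="" lets the csv module manage line endings.
    with open(destination, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(transactions)
```

Pass it the `transactions` list from the parsed JSON and a destination path; accented characters in descriptions should then survive a double-click open in Excel.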

10. Ideas for what to build next

Monthly cron job
Use wget or Selenium to download the PDFs, then run the parser and upload the results to Google Drive.
Personal finance dashboard
Feed the JSON into Grafana or Google Data Studio for live spending charts.
Expense reimbursement bot
Employees drop Postepay PDFs into a shared folder; the bot returns a categorized CSV for the accounting team.
Banking bridge
Use the library inside a Django REST API so other apps can query Italian account data without storing credentials.

Closing thoughts

Data locked inside PDFs is productivity lost. The Poste Italiane Documents Parser gives you a friction-free bridge from raw statements to structured data, all while running offline and respecting your privacy.

Whether you are an expatriate building a personal budget, a fintech startup normalizing bank feeds, or an accountant reconciling hundreds of prepaid cards, this small open-source tool can save hours every month.

If you improve it—new document types, smarter validation, or better date parsing—the community welcomes your pull requests.

Useful links (no affiliation)

Source code: github.com/genbs/poste-italiane-parser
Poste Italiane document portal: comunicazionionline.poste.it
