Stop Copy-Pasting: Turn Italian Post Office PDFs into Clean JSON or CSV in One Command
If you study, work, or simply live in Italy, you know the monthly ritual: log in to Poste Italiane, download a PDF statement, and then spend the better part of an afternoon copying numbers into a spreadsheet.
This post shows you how to replace that ritual with a single, repeatable command. We will use the open-source utility Poste Italiane Documents Parser to extract, validate, and export every balance, transaction, and personal detail into JSON or CSV—ready for Excel, Pandas, or any other tool you already use.
Table of contents
- Why structured data beats PDF tables
- What the parser does (and what it does not)
- Supported document types in plain English
- One-command installation on macOS, Windows, or Linux
- From your first file to a whole folder: hands-on examples
- Using the parser as a Python library in your own scripts
- Anatomy of the output: every field explained
- Writing reliable tests without leaking personal data
- Common pitfalls and how to avoid them
- Ideas for what to build next
1. Why structured data beats PDF tables
Pain point | Manual workflow | After the parser |
---|---|---|
Time | 30–60 min per statement | 2–3 s |
Typos | Decimal errors, date mix-ups | Automatic validation |
Re-use | Locked in PDF | Ready for Excel, Power BI, Google Sheets |
Automation | Impossible | Cron job or GitHub Action |
The Italian Post Office does not expose an official API; the PDF is your only data source. The parser fills that gap without storing or transmitting your data anywhere.
2. What the parser does (and what it does not)
What it does
- Detects the document type automatically (current account, prepaid card, etc.)
- Validates that opening balance + credits – debits = closing balance
- Converts one file or an entire folder
- Exports to JSON (default) or CSV with a flag
- Runs offline: no cloud calls, no registration
What it does not
- Handle password-protected PDFs
- Categorize merchants (e.g., groceries vs. utilities)
- Support PosteMobile phone bills (yet)
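The balance rule in the first list is simple arithmetic, and a few lines of Python make it concrete. This is only a sketch of the idea, not the parser's own code: the function name `balance_is_consistent` and the one-cent tolerance are assumptions made for illustration.

```python
from decimal import Decimal


def balance_is_consistent(opening, credits, debits, closing,
                          tolerance=Decimal("0.01")):
    """Check that opening + credits - debits lands on the closing balance.

    Hypothetical helper. Amounts are passed as strings (or Decimals) so the
    sum is not affected by binary floating-point rounding.
    """
    total_credits = sum(map(Decimal, credits), Decimal("0"))
    total_debits = sum(map(Decimal, debits), Decimal("0"))
    expected = Decimal(opening) + total_credits - total_debits
    return abs(expected - Decimal(closing)) < tolerance


# Opening 100.00, one credit of 100.00, no debits, closing 200.00
print(balance_is_consistent("100.00", ["100.00"], [], "200.00"))
```

Using `Decimal` rather than `float` is a design choice for the sketch: currency amounts such as 0.10 have no exact binary representation, so summing many floats can drift by a fraction of a cent.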
3. Supported document types in plain English
Italian name | What it is | English equivalent |
---|---|---|
Estratto Conto BancoPosta | Monthly bank statement | Checking account statement |
Rendiconto Postepay Evolution | Monthly prepaid card summary | Prepaid card statement |
Lista Movimenti Postepay Evolution | Detailed transaction list | Transaction history |
If you download statements from the Poste Italiane online archive, the filenames look like:
EstrattoConto_0001234567_20250731.pdf
Rendiconto_PP_1234567890_202507.pdf
The parser recognizes these patterns internally, so you never have to tell it “this is a Postepay statement.”
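To see what filename-based recognition could look like, here is a minimal sketch. The two regular expressions are inferred only from the sample filenames above, and `guess_document_type` is a hypothetical helper; the parser's real detection logic may work differently (for example, by reading the PDF content).

```python
import re

# Hypothetical patterns, derived solely from the two example filenames.
FILENAME_PATTERNS = {
    "ESTRATTO_CONTO": re.compile(
        r"^EstrattoConto_(?P<number>\d+)_(?P<date>\d{8})\.pdf$"),
    "RENDICONTO": re.compile(
        r"^Rendiconto_PP_(?P<number>\d+)_(?P<date>\d{6})\.pdf$"),
}


def guess_document_type(filename):
    """Return (document_type, number, date) or None if nothing matches."""
    for doc_type, pattern in FILENAME_PATTERNS.items():
        match = pattern.match(filename)
        if match:
            return doc_type, match["number"], match["date"]
    return None


print(guess_document_type("EstrattoConto_0001234567_20250731.pdf"))
print(guess_document_type("statement_july2025.pdf"))  # renamed file: no match
```

A renamed file such as `statement_july2025.pdf` matches neither pattern, which is why a filename check alone would not be enough for the folder layout used later in this post.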
4. One-command installation on macOS, Windows, or Linux
Prerequisites
- Python 3.8 or higher
- pip (comes with most Python installers)
Step-by-step:
# 1. Clone the repository
git clone https://github.com/genbs/poste-italiane-parser.git
cd poste-italiane-parser
# 2. Install the dependencies
pip install -r requirements.txt
That is it.
The requirements.txt file lists only two main libraries, pdfplumber to read tables and pydantic to validate the extracted data, so installation usually finishes in under a minute even on modest hardware.
5. From your first file to a whole folder: hands-on examples
5.1 Save your PDFs
Download statements from the Poste Italiane portal and save them in any folder, e.g.:
~/Documents/poste_pdfs/
├── statement_july2025.pdf
├── postepay_june2025.pdf
└── misc/
└── older_statement.pdf
5.2 Convert a single file to JSON (default)
python main.py --path ~/Documents/poste_pdfs/statement_july2025.pdf
A file named statement_july2025.json appears next to the original PDF.
Open it in any editor; you will see a clear hierarchy without nested arrays of arrays.
5.3 Convert an entire folder to CSV
python main.py --path ~/Documents/poste_pdfs \
  --format csv \
  --output ~/Documents/poste_results \
  --verbose
- --output can be a folder; the script writes one CSV per source PDF.
- --verbose prints lines such as:
INFO: Detected ESTRATTO_CONTO
INFO: Balance check OK
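If you want to preview which output files a folder run would produce, the "one CSV per source PDF" rule can be sketched with pathlib. This is an illustration only: `plan_outputs` is a hypothetical helper, and the recursive search into subfolders such as misc/ is an assumption, since the post does not say whether the script itself recurses.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir, extension=".csv"):
    """Map every PDF under input_dir to a same-named file in output_dir.

    Hypothetical helper: it only computes paths and never touches the PDFs.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    plan = {}
    # rglob also descends into subfolders (an assumption for this sketch)
    for pdf in sorted(input_dir.rglob("*.pdf")):
        plan[pdf] = output_dir / (pdf.stem + extension)
    return plan
```

Calling `plan_outputs(Path.home() / "Documents" / "poste_pdfs", Path.home() / "Documents" / "poste_results")` returns a dictionary from source PDF to planned CSV path. Note that two PDFs with the same name in different subfolders would map to the same target in this sketch.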
6. Using the parser as a Python library in your own scripts
If you already have a Flask API, Jupyter notebook, or scheduled job, import the parser directly:
from poste_italiane_parser import PosteItalianeParser

file_path = "rendiconto_postepay.pdf"

try:
    # Parse the PDF; the result is read like a dictionary below
    data = PosteItalianeParser(file_path)
    print("Document type:", data['document_type'])
    print("Holder:", data['holder'])
    print("Closing balance:", data['final_balance'])
except ValueError as e:
    # Raised when the document cannot be recognized or validated
    print("Validation failed:", e)
except FileNotFoundError:
    print("File not found:", file_path)
The returned data is a plain Python dictionary. Feed it straight into pandas:
import pandas as pd
df = pd.DataFrame(data['transactions'])
df['value_date'] = pd.to_datetime(df['value_date'])
monthly_net = df.groupby(df['value_date'].dt.to_period('M'))['value'].sum()
print(monthly_net)
7. Anatomy of the output: every field explained
Below is a complete example of the JSON output for an Estratto Conto BancoPosta.
The same keys appear for Postepay documents; missing fields are set to null.
{
  "generated_at": "2025-07-24 10:30:00",
  "document_type": "ESTRATTO_CONTO",
  "currency": "EUR",
  "initial_balance": 1234.56,
  "final_balance": 987.65,
  "iban": "IT60X076011240000001234567890",
  "holder": "Mario Rossi",
  "card_number": null,
  "account_number": "000001234567",
  "period": {
    "start_date": "2025-07-01",
    "end_date": "2025-07-31"
  },
  "customer": {
    "name": "Mario Rossi",
    "street": "Via Roma 1",
    "city": "Milano"
  },
  "transactions": [
    {
      "accounting_date": "2025-07-05 00:00:00",
      "value_date": "2025-07-05 00:00:00",
      "description": "POS 1234567890123456 AMAZON EU",
      "debits": 49.99,
      "credits": 0.0,
      "value": -49.99
    },
    {
      "accounting_date": "2025-07-10 00:00:00",
      "value_date": "2025-07-10 00:00:00",
      "description": "ACCREDITO STIPENDIO",
      "debits": 0.0,
      "credits": 1500.0,
      "value": 1500.0
    }
  ]
}
Field-by-field glossary
Key | Meaning | Always present? |
---|---|---|
generated_at | Timestamp when the parser ran | Yes |
document_type | ENUM: ESTRATTO_CONTO, LISTA_MOVIMENTI, or RENDICONTO | Yes |
currency | Three-letter code, always EUR for Italian accounts | Yes |
initial_balance | Opening balance for the period | Yes (null for some Postepay) |
final_balance | Closing balance | Yes |
iban | International Bank Account Number | Only for BancoPosta |
holder | Legal account holder | Yes |
card_number | Last 4 digits of Postepay card | Only for Postepay |
account_number | Internal Poste Italiane account ID | Yes |
period | Object with start_date and end_date | Yes |
customer | Name and address as registered | Yes |
transactions | Array of transaction objects | Yes |
Each transaction object contains:
Sub-key | Type | Sign convention |
---|---|---|
accounting_date | datetime | n/a |
value_date | datetime | n/a |
description | string | Raw text from PDF |
debits | float | Positive number, money leaving the account |
credits | float | Positive number, money entering the account |
value | float | Signed total: credits - debits |
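The sign convention in the table is easy to verify on any exported file. The sketch below checks that `value` equals `credits - debits` for each row, using the two sample transactions from the JSON above. The helper name `check_sign_convention` and the half-cent tolerance are assumptions for illustration, not part of the parser.

```python
import json


def check_sign_convention(transactions, tolerance=0.005):
    """Return the transactions whose value differs from credits - debits."""
    return [
        t for t in transactions
        if abs((t["credits"] - t["debits"]) - t["value"]) > tolerance
    ]


# The two sample rows from the example output above
sample = json.loads("""
[
  {"description": "POS 1234567890123456 AMAZON EU",
   "debits": 49.99, "credits": 0.0, "value": -49.99},
  {"description": "ACCREDITO STIPENDIO",
   "debits": 0.0, "credits": 1500.0, "value": 1500.0}
]
""")

# An empty list means every row follows the convention
print(check_sign_convention(sample))
```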
8. Writing reliable tests without leaking personal data
The repository does not include real PDFs for privacy reasons. Instead, tests rely on “golden” JSON files that describe the expected output.
8.1 Create a test file
Inside the tests/ folder, create a file named my_case.test.json:
{
  "path": "tests/sample_statement.pdf",
  "currency": "EUR",
  "holder": "Giuseppe Verdi",
  "initial_balance": 100.00,
  "final_balance": 200.00,
  "iban": "IT60X076011240000001234567890",
  "period_start_date": "2025-07-01",
  "period_end_date": "2025-07-31",
  "transactions": [
    {
      "accounting_date": "2025-07-03 00:00:00",
      "value_date": "2025-07-03 00:00:00",
      "description": "BOLLETTINO 12345",
      "credits": 100.00,
      "debits": 0.00
    }
  ]
}
- You may include all or only some transactions.
- Extra keys in actual output are ignored, so you can keep the test minimal.
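The two rules above amount to a "subset" comparison: everything in the expected file must be found in the actual output, and anything extra is ignored. Here is a minimal sketch of that idea. It is not the repository's test code: `find_mismatches` is a hypothetical function, it assumes list items are dictionaries (as transactions are), and it does not model special keys such as path or the flattened period_start_date.

```python
def find_mismatches(expected, actual, prefix=""):
    """List the keys where `expected` disagrees with `actual`.

    Keys that exist only in `actual` are ignored. For lists, each expected
    item must be contained in at least one actual item.
    """
    problems = []
    for key, want in expected.items():
        label = prefix + key
        if key not in actual:
            problems.append(f"{label}: missing")
            continue
        got = actual[key]
        if isinstance(want, dict) and isinstance(got, dict):
            problems.extend(find_mismatches(want, got, label + "."))
        elif isinstance(want, list) and isinstance(got, list):
            for index, item in enumerate(want):
                # An expected transaction passes if any actual one contains it
                if not any(find_mismatches(item, c) == [] for c in got):
                    problems.append(f"{label}[{index}]: no matching item")
        elif want != got:
            problems.append(f"{label}: expected {want!r}, got {got!r}")
    return problems


expected = {
    "final_balance": 200.00,
    "transactions": [{"description": "BOLLETTINO 12345", "credits": 100.00}],
}
actual = {
    "final_balance": 199.98,
    "holder": "Giuseppe Verdi",  # extra key, ignored
    "transactions": [
        {"description": "BOLLETTINO 12345", "credits": 100.0, "debits": 0.0}
    ],
}
print(find_mismatches(expected, actual))
```

Returning a list of messages instead of stopping at the first difference is a deliberate choice in this sketch: a single run then reports every field that needs attention.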
8.2 Run the test suite
python -m unittest tests/test_PosteItalianeParser.py -v
If any field mismatches, the test runner prints the exact difference:
FAIL: test_my_case
AssertionError: final_balance expected 200.00, got 199.98
Update the PDF or the JSON until the test passes; your CI pipeline will now guard against regressions.
9. Common pitfalls and how to avoid them
Symptom | Cause | Fix |
---|---|---|
ValueError: Unknown document type | PDF is a scanned image, not text | Re-download from Poste Italiane and choose “Testo” (Text) instead of “Immagine” (Image) |
Balance check failed | Currency rounding or duplicate lines | Open an issue with the PDF (first 3 pages) attached |
File not found on Windows | Spaces in path | Wrap the path in double quotes: --path "C:\My Files\poste" |
CSV shows strange characters | Default encoding on Windows | Open the CSV in Excel → Data → From Text/CSV → choose UTF-8 |
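For the last row, an alternative to the import dialog is to re-save the CSV with a UTF-8 byte-order mark, which Excel uses to detect the encoding when you open the file directly. The sketch below assumes the parser writes plain UTF-8 (as the fix in the table implies); `resave_with_bom` is a hypothetical helper, not part of the tool.

```python
import codecs
from pathlib import Path


def resave_with_bom(source, target):
    """Copy a UTF-8 CSV to `target`, prefixing a byte-order mark if missing."""
    data = Path(source).read_bytes()
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(target).write_bytes(data)
```

Working on raw bytes keeps line endings and delimiters exactly as the parser wrote them; only the three-byte marker at the start of the file is added.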
10. Ideas for what to build next
- Monthly cron job: Combine wget or selenium to download PDFs, then run the parser and upload results to Google Drive.
- Personal finance dashboard: Feed the JSON into Grafana or Google Data Studio for live spending charts.
- Expense reimbursement bot: Employees drop Postepay PDFs into a shared folder; the bot returns categorized CSV for the accounting team.
- Banking bridge: Use the library inside a Django REST API so other apps can query Italian account data without storing credentials.
Closing thoughts
Data locked inside PDFs is productivity lost. The Poste Italiane Documents Parser gives you a friction-free bridge from raw statements to structured data, all while running offline and respecting your privacy.
Whether you are an expatriate building a personal budget, a fintech startup normalizing bank feeds, or an accountant reconciling hundreds of prepaid cards, this small open-source tool can save hours every month.
If you improve it—new document types, smarter validation, or better date parsing—the community welcomes your pull requests.
Useful links (no affiliation)
- Source code: github.com/genbs/poste-italiane-parser
- Poste Italiane document portal: comunicazionionline.poste.it