It Started with a Handwritten Form’s “Resurrection”
In early 2025, a medical records digitization team faced a daunting challenge: converting thousands of handwritten patient forms from the 1970s into structured data. Traditional OCR solutions struggled to decipher the faded ink and cursive script, with accuracy plummeting below 30%. Then they tried a model named Chandra, a tool the team lead described as “practically magic.” “Not only did it accurately read handwriting that even we found difficult,” the lead shared, “but it also correctly identified checkboxes and reconstructed the entire form into editable Markdown, perfectly preserving the original layout.” …
How a simple invoice exposed the real bottleneck in document understanding
I stared at the crumpled invoice photo on my screen and sighed. This was the fifth time today I was manually fixing OCR results: jumbled text order, missing table structures, QR codes and stamps mixed with regular text. As a developer dealing with countless documents daily, this routine made me wonder: when will AI truly understand documents? Last week, while browsing GitHub as usual, I came across Baidu’s newly open-sourced PaddleOCR-VL-0.9B. Honestly, when I saw “0.9B parameters,” my first thought was: “Another lightweight model jumping on the bandwagon?” But out …