Customer Story

Achieving 93% structured extraction accuracy for complex invoices

93% extraction accuracy.

1,100+ formats, including handwritten.

3x faster manual verification.

Background

The client was developing a proprietary SaaS platform to deliver enhanced financial services to their SME customers. A key success factor for the platform was the ability to accurately extract structured data from invoices. Since their clients’ invoices came in a wide variety of formats, from complex digital layouts to handwritten documents from cash-and-carry businesses, the extraction system needed to be highly robust and adaptable.

Business Challenges

Without a reliable automated extraction system, a substantial portion of invoices required manual annotation, resulting in significant inefficiencies and processing delays. This manual intervention also increased operational costs, as processing large volumes of invoices became both time-consuming and expensive.

Technical Challenges

The firm worked with thousands of suppliers, each using different invoice templates and formats. The presence of handwritten text and low-quality scanned copies further compounded the difficulty. Additionally, many invoices spanned multiple pages, adding another layer of complexity to the extraction process.

Solution Principles

To address these challenges, the solution was designed around four key principles:

Approach

We began by assembling a diverse dataset that covered both system-generated and handwritten invoices from thousands of suppliers. With this dataset, we systematically evaluated existing solutions, from traditional OCR models to vision-language models (VLMs), to establish a performance baseline.
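One way to make such a baseline comparable across tools is a field-level accuracy score. The sketch below is illustrative only: the story does not state how accuracy was measured, and the field names, normalization rule, and toy data are all assumptions.

```python
from typing import Dict, List

Invoice = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(value.lower().split())


def field_accuracy(predictions: List[Invoice], references: List[Invoice]) -> float:
    """Fraction of reference fields whose predicted value matches after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        for field, expected in ref.items():
            total += 1
            # A field missing from the prediction counts as an error.
            if normalize(pred.get(field, "")) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical toy data, not taken from the client's dataset.
references = [
    {"invoice_number": "INV-001", "total": "120.50", "supplier": "Acme Traders"},
    {"invoice_number": "INV-002", "total": "89.00", "supplier": "Corner Cash & Carry"},
]
predictions = [
    {"invoice_number": "INV-001", "total": "120.50", "supplier": "acme  traders"},
    {"invoice_number": "INV-002", "total": "39.00"},  # misread digit, missing supplier
]

score = field_accuracy(predictions, references)
print(f"field-level accuracy: {score:.2%}")
```

Running the same scoring function over every candidate tool keeps the comparison consistent, whatever the underlying model.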

Problems with Existing Tools

Although vision-language models offered the flexibility needed to handle diverse invoice formats, their accuracy in our benchmarks remained below production standards, even after few-shot tuning.

Our Solution

Our strategy focused on developing a bespoke, multi-stage AI pipeline that combined the strengths of OCR systems with fine-tuned vision-language models.

Step 1: Selecting the Right Base Model

We first evaluated several open-source vision-language models (VLMs) with fewer than 32B parameters to identify the optimal base for fine-tuning. The goal was to balance performance, adaptability, and cost efficiency.

Step 2: Fine-Tuning the Vision-Language Model

We then fine-tuned the top-performing VLMs on the client’s specific dataset. This improved the model’s ability to understand invoice structure and map relevant fields. However, text recognition errors from the visual input persisted, particularly in noisy or handwritten documents.

The Breakthrough: A Two-Stage OCR + Fine-Tuned SLM Pipeline

Recognizing the limitations of a single-model approach, we engineered a highly efficient two-stage pipeline that became the core of the client’s new feature.

Stage 1: High-Fidelity Text Extraction with OCR

We fine-tuned an OCR model using thousands of the client’s diverse documents, with particular emphasis on handwritten invoices. This model became exceptionally skilled at converting even the most complex and messy documents into clean text, forming a solid foundation for the next stage.

Stage 2: Structured Extraction with a Fine-Tuned Small Language Model (SLM)

The text output from the OCR was then passed into a fine-tuned 7-billion-parameter Small Language Model (SLM). Unlike a conventional text parser, this model was trained to understand context and structure, allowing it to accurately identify and extract only the required invoice fields.
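The two stages can be sketched as a simple composition of functions. This is a minimal illustration under stated assumptions, not the client's implementation: `run_ocr` and `run_slm` are hypothetical callables standing in for the fine-tuned OCR model and the fine-tuned SLM, and the field list and prompt wording are invented for the example.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical schema; the real set of invoice fields is not given in the story.
REQUIRED_FIELDS = ("invoice_number", "invoice_date", "total")


def build_prompt(ocr_text: str) -> str:
    """Ask the language model for the required fields only, as a single JSON object."""
    fields = ", ".join(REQUIRED_FIELDS)
    return (
        "Extract the following fields from the invoice text and reply with "
        f"a single JSON object with keys: {fields}. Use null for missing values.\n\n"
        f"Invoice text:\n{ocr_text}"
    )


def parse_fields(raw_output: str) -> Dict[str, Any]:
    """Keep only the required fields; malformed output yields None for every field."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {name: data.get(name) for name in REQUIRED_FIELDS}


def extract_invoice(
    pages: List[bytes],
    run_ocr: Callable[[bytes], str],
    run_slm: Callable[[str], str],
) -> Dict[str, Any]:
    # Stage 1: OCR each page and join the text so multi-page invoices are read as one document.
    text = "\n".join(run_ocr(page) for page in pages)
    # Stage 2: the language model maps the text to structured fields.
    return parse_fields(run_slm(build_prompt(text)))


# Stand-ins so the sketch runs without any model; replace with real model calls.
def fake_ocr(page: bytes) -> str:
    return page.decode("utf-8")


def fake_slm(prompt: str) -> str:
    return (
        '{"invoice_number": "INV-001", "invoice_date": "2024-01-15", '
        '"total": "120.50", "notes": "not requested"}'
    )


result = extract_invoice([b"Invoice INV-001", b"Total 120.50"], fake_ocr, fake_slm)
print(result)
```

Keeping the parsing step strict about which keys it returns means any extra text the model produces never reaches downstream systems.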

Combining the OCR’s precision with the SLM’s contextual reasoning delivered high extraction accuracy while keeping inference costs low. Self-hosting the pipeline also eliminated the variable, per-document fees associated with third-party APIs. The cost shifted from an unpredictable external expense to a predictable internal compute cost, so the unit economics held up at scale.
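The trade-off between per-document API fees and a fixed self-hosting cost can be expressed as a break-even volume. The sketch below is a back-of-the-envelope illustration; every number in it is a placeholder chosen for the example, not a figure from this project.

```python
def breakeven_volume(
    fixed_monthly_cost: float,
    api_fee_per_doc: float,
    self_hosted_cost_per_doc: float = 0.0,
) -> float:
    """Monthly document volume above which self-hosting is cheaper than a per-document API."""
    saving_per_doc = api_fee_per_doc - self_hosted_cost_per_doc
    if saving_per_doc <= 0:
        raise ValueError("self-hosting never breaks even if it costs more per document")
    return fixed_monthly_cost / saving_per_doc


# Placeholder inputs (assumptions): monthly compute cost, API fee, and
# marginal self-hosted cost per document.
volume = breakeven_volume(
    fixed_monthly_cost=2000.0,
    api_fee_per_doc=0.10,
    self_hosted_cost_per_doc=0.02,
)
print(f"break-even at about {volume:,.0f} documents per month")
```

Above the break-even volume, each additional document widens the saving, which is why a fixed compute cost becomes easier to plan around as invoice volume grows.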

[Figure: Results graph]

Results: A Reliable AI Invoice Extractor

Our custom-built AI engine became the foundation of the client’s invoice processing system, delivering 93% extraction accuracy across 1,100+ invoice formats, including handwritten ones, and making manual verification 3x faster.

Performance Highlights

If your organization faces similar challenges with complex document extraction, high operational costs, or the limitations of generic AI models, reach out to us today. Our team specializes in designing and building cost-efficient AI pipelines that deliver measurable results and a lasting competitive edge.
