Intelligent Document Processing (IDP) Models Benchmark

We continuously test large language models on business automation tasks. The benchmark is based on datasets of digital documents in various layouts and languages, representative of the documents processed in real projects.
We measure how well AI models extract data from complex documents by assessing the accuracy and completeness of data detection.

Testing Criteria

01

Recognition Accuracy

How accurately an AI model detects and extracts data from a document, like field titles and values, document layout, text and character blocks.

02

Processing Duration

How long it takes a model to process one document on average.

03

Cost

The processing cost per 1000 pages and any additional costs.
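
The accuracy and completeness criteria can be illustrated with a minimal sketch. The page does not publish its exact scoring rules, so the definitions here are assumptions: completeness is the share of expected fields for which the model returned any value, and accuracy is the share of expected fields whose value matches the ground truth after simple normalization. The function name `score_extraction` and the sample field names are hypothetical.

```python
from typing import Dict, Optional


def score_extraction(
    expected: Dict[str, str], extracted: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Score one document's extracted fields against its ground truth."""

    def norm(value: str) -> str:
        # Assumed normalization: ignore case and extra whitespace.
        return " ".join(value.split()).lower()

    # Fields the model returned a non-empty value for.
    detected = [k for k in expected if extracted.get(k) not in (None, "")]
    # Detected fields whose value matches the ground truth.
    correct = [k for k in detected if norm(extracted[k]) == norm(expected[k])]
    total = len(expected)
    return {
        "completeness": len(detected) / total if total else 0.0,
        "accuracy": len(correct) / total if total else 0.0,
    }


# Hypothetical ground truth and model output for a single invoice.
expected = {
    "invoice_number": "INV-001",
    "total": "120.50",
    "currency": "EUR",
    "vendor": "Acme GmbH",
}
extracted = {"invoice_number": "inv-001", "total": "120.00", "currency": "EUR"}

scores = score_extraction(expected, extracted)
print(scores)
```

In this example three of the four expected fields are detected and two of them match, so completeness and accuracy differ. A real benchmark would likely need field-specific comparison rules for dates and amounts.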

Reports

July 2025
Testing LLMs On Extracting Dimensional and Tolerance Data From Engineering Drawings: Gemini 2.5 Flash, Gemini 2.5 Pro, ChatGPT o4 mini, ChatGPT o3, Claude Opus 4, Qwen VL Plus
June 2025
Comparison Of AI Models For Table Extraction: Amazon Boto3 Textract, Azure Prebuilt Layout, GPT-4o API, Gemini 2.5 Pro, Grok 2 Vision, Pixtral Large, Google Layout Parser
March 2025
Expanded Comparison Of AI Models For Invoice Processing: Amazon Analyze Expense API, Azure AI Document Intelligence, Google Document AI, GPT-4o API, GPT-4o API with text input with 3rd party OCR, Gemini 2.0 Pro Experimental, Deepseek v3
February 2025
Best AI Services For Automatic Invoice Processing: Amazon Analyze Expense API, Azure AI Document Intelligence, Google Document AI, GPT-4o API, GPT-4o API with text input with 3rd party OCR.

Engineering Drawings Processing: Tabular Data

We tested 7 popular AI services on extracting tabular data from schedules in engineering drawings.

Service | Table Extraction Accuracy | Processing Duration per Page, s | Cost per 1000 Pages
Azure Prebuilt Layout | 81.5% | 4.3 ± 0.2 | $10
Amazon boto3 Textract | 82.1% | 2.9 ± 0.2 | $15
Gemini 2.5 pro preview 05-06 | 94.2% | 47.4 ± 15.7 | $58
GPT-4o API | 38.5% | 16.9 ± 1.9 | $19

Grok 2 vision 1212 Failed
Pixtral large latest Failed
Google Layout Parser Failed
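
The processing durations above are reported in the form "mean ± spread". The page does not say what the ± value stands for, so the sketch below assumes it is the sample standard deviation over per-page timings. `time_per_page`, `summarize`, and the sample timings are hypothetical and only show how such a figure could be produced.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_per_page(process_page: Callable[[str], object], pages: List[str]) -> List[float]:
    """Measure wall-clock seconds spent processing each page."""
    durations = []
    for page in pages:
        start = time.perf_counter()
        process_page(page)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the timings."""
    mean = statistics.mean(durations)
    # stdev needs at least two samples; report zero spread otherwise.
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread


# Hypothetical per-page timings in seconds.
sample = [4.1, 4.5, 4.3, 4.2, 4.4]
mean, spread = summarize(sample)
print(f"{mean:.1f} ± {spread:.1f} s per page")
```

A large spread relative to the mean, as in the Gemini row, suggests that latency varies a lot from page to page, which matters for near-real-time use.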

Engineering Drawings Processing: Dimensional & Tolerance Data

Testing 6 LLMs on extraction of dimensional and tolerance data from real-world mechanical engineering drawings.

Service | Data Extraction Efficiency | Processing Duration per Page, s | Cost per 1000 Pages
Gemini 2.5 Flash | 77.34% | 77.5 | $30.5
Gemini 2.5 Pro | 79.96% | 91.4 | $130.4
Gpt-o4 mini | 39.59% | 41.75 | $24.9
Gpt-o3 | 20.38% | 163 | $239.2
Claude Opus 4 | 40.49% | 64.8 | $312
Qwen VL Plus | 7.64% | 22 | $1.59
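
One way to read these results is to ask which models are not beaten on both accuracy and cost at once. The sketch below is an illustration added here, not part of the benchmark methodology: it copies the accuracy and cost figures from the table above (treating the Qwen VL Plus value as a percentage) and keeps only the models for which no other model is at least as accurate and at least as cheap.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    service: str
    accuracy: float  # percent, from the table above
    cost: float      # USD per 1000 pages, from the table above


RESULTS = [
    Result("Gemini 2.5 Flash", 77.34, 30.5),
    Result("Gemini 2.5 Pro", 79.96, 130.4),
    Result("Gpt-o4 mini", 39.59, 24.9),
    Result("Gpt-o3", 20.38, 239.2),
    Result("Claude Opus 4", 40.49, 312.0),
    Result("Qwen VL Plus", 7.64, 1.59),  # assumed to be a percentage
]


def dominated(candidate: Result, others: List[Result]) -> bool:
    """True if another model is no worse on both axes and better on one."""
    return any(
        o.accuracy >= candidate.accuracy
        and o.cost <= candidate.cost
        and (o.accuracy > candidate.accuracy or o.cost < candidate.cost)
        for o in others
    )


# Models that survive, listed from cheapest to most expensive.
frontier = sorted(
    (r for r in RESULTS if not dominated(r, RESULTS)), key=lambda r: r.cost
)
for r in frontier:
    print(f"{r.service}: {r.accuracy}% at ${r.cost} per 1000 pages")
```

This comparison ignores processing duration; adding it as a third axis would follow the same pattern.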

Invoice Processing

We analysed 7 of the most popular AI document detection models to test how well they work "out of the box" on a set of digital invoices with various layouts and languages.

Service | Invoice Detection Accuracy Without Items | Invoice Detection Accuracy With Items | Processing Duration per Page, s | Cost per 1000 Pages
Azure AI Document Intelligence | 85.8% | 85.7% | 4.3 ± 0.2 | $10
GPT-4o using 3rd-party OCR (Prebuilt Layout model by Azure AI) | 90.8% | 86.5% | 33.0 ± 2.3 | $8.8 (1)
GPT-4o only | 88.3% | 89.2% | 16.9 ± 1.9 | $8.8
Google Document AI | 83.8% | 68.1% | 3.8 ± 0.2 | $10
Amazon Analyze Expense API | 91.3% | 91.1% | 2.9 ± 0.2 | $10 (2)
Gemini 2.0 Pro | 90% | 90.2% | 8 ± 1.5 | $4.5 (3)
DeepSeek v3 API (Prebuilt Layout model by Azure AI) | 93.3% | 88.1% | 69 | $11
Unified Approach | ~99% | ~97% | ~15 | ~$30

(1) Additional $10 per 1000 pages for the text recognition model
(2) Additional $0.008 per page after one million
(3) Token pricing: $1.25 (input, prompts ≤ 128k tokens), $2.50 (input, prompts > 128k tokens), $5.00 (output, prompts ≤ 128k tokens), $10.00 (output, prompts > 128k tokens)
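
Footnote 1 changes the effective cost of the GPT-4o pipeline that relies on OCR. As a worked example, the sketch below adds the $10 per 1000 pages surcharge to the listed rate and estimates the cost of a batch. The batch size and the names `BASE_RATE`, `OCR_SURCHARGE`, and `batch_cost` are hypothetical; footnotes 2 and 3 are not modelled.

```python
from typing import Dict

# USD per 1000 pages, copied from the invoice table above.
BASE_RATE: Dict[str, float] = {
    "GPT-4o only": 8.8,
    "GPT-4o using 3rd-party OCR": 8.8,
}

# Footnote 1: the text recognition model adds $10 per 1000 pages.
OCR_SURCHARGE = 10.0


def batch_cost(rate_per_1000: float, pages: int, surcharge_per_1000: float = 0.0) -> float:
    """Estimated cost in USD for a batch, rounded to cents."""
    return round((rate_per_1000 + surcharge_per_1000) * pages / 1000, 2)


pages = 25_000  # hypothetical batch size
direct = batch_cost(BASE_RATE["GPT-4o only"], pages)
with_ocr = batch_cost(BASE_RATE["GPT-4o using 3rd-party OCR"], pages, OCR_SURCHARGE)
print(f"GPT-4o only: ${direct}, GPT-4o with OCR: ${with_ocr}")
```

With the surcharge included, the OCR-assisted pipeline costs $18.8 per 1000 pages, more than twice the direct GPT-4o rate.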

Unified Intelligence: Enhance Invoice Data Extraction Up to 97%

To achieve exceptional accuracy in extracting data from invoices, we combined the power of multiple large language models (LLMs). We use advanced matching algorithms to compare the outputs of each model and select the final results using a majority-vote principle.

This ensemble approach allows us to leverage the unique strengths of each LLM, providing robust and scalable invoice data extraction for real-world business needs.

As a result, we have drastically increased the average extraction accuracy from 85% to 97%.
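
The majority-vote principle described above can be sketched as follows. This is a simplified illustration, not the production implementation: the "advanced matching algorithms" are replaced here by a plain case-and-whitespace normalization, a field is accepted only when more than half of the models agree, and fields without a majority are returned as `None` so they can be reviewed. The function names and sample outputs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

ModelOutput = Dict[str, Optional[str]]


def normalize(value: str) -> str:
    # Stand-in for real value matching: ignore case and extra whitespace.
    return " ".join(value.split()).lower()


def majority_vote(outputs: List[ModelOutput]) -> ModelOutput:
    """Pick, per field, the value that a strict majority of models agree on."""
    fields = sorted({name for output in outputs for name in output})
    result: ModelOutput = {}
    for name in fields:
        # Collect the non-empty answers given for this field.
        values = [o[name] for o in outputs if o.get(name)]
        counts = Counter(normalize(v) for v in values)
        if not counts:
            result[name] = None
            continue
        winner, votes = counts.most_common(1)[0]
        if votes * 2 > len(outputs):
            # Return the first original spelling of the winning value.
            result[name] = next(v for v in values if normalize(v) == winner)
        else:
            result[name] = None  # no majority: flag for review
    return result


# Hypothetical outputs of three models for the same invoice.
model_a = {"invoice_number": "INV-001", "total": "120.50", "currency": "EUR"}
model_b = {"invoice_number": "inv-001", "total": "120.50", "currency": "USD"}
model_c = {"invoice_number": "INV-001", "total": "120.00", "currency": None}

final = majority_vote([model_a, model_b, model_c])
print(final)
```

Here the invoice number and total are settled by two or more matching answers, while the currency has no majority and is left unresolved.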

Looking for the best AI model for your project?

Contact us to get a consultation and model recommendations

FAQs

Yes, we offer custom benchmarking services tailored to specific business requirements. Contact us to discuss your needs.

So far we have tested Amazon Analyze Expense API, Azure AI Document Intelligence, Google Document AI, GPT-4o API, and GPT-4o API with text input from 3rd party OCR. We constantly research new AI models to evaluate and test.

We use a variety of digital documents, including invoices, receipts, contracts, and forms, with different layouts and languages to ensure comprehensive testing.

We update our benchmark monthly to ensure that our evaluations reflect the latest advancements and updates in AI models.

Yes, our monthly reports include recommendations on the best AI models for specific tasks, such as invoice processing, based on our comprehensive evaluations.

Yes, we evaluate the processing duration of AI models to assess their suitability for real-time document processing tasks.

We continuously monitor updates and new versions of AI models. When a significant update is released, we retest the model to ensure our benchmark remains accurate and up-to-date.

The GPT-4o API processes documents directly, while the GPT-4o API - text input with 3rd party OCR uses a third-party Optical Character Recognition (OCR) service to convert documents to text before processing.

Our Services

Proof of Concept Services

Our Proof of Concept services provide the essential first step towards transforming your concepts into reality

Dedicated Software Team

A skilled team focused on delivering high-quality, efficient software solutions that precisely meet your project's unique needs.

MVP Development Services

Our MVP services help validate your ideas, reduce time-to-market, and increase your chances of success

Offshore Development Center

A global network of skilled developers, designers, and IT professionals who are ready to tackle your project

Urgent Development Services

We specialize in rescuing stalled projects, addressing critical issues, and accelerating development under tight deadlines

Staff Augmentation Services

Boost your team's capacity with our expert professionals, perfectly suited to your project's unique requirements, ensuring efficiency and flexibility

Contact Us

Let's Work Together!

Do you want to know the total cost of developing and delivering your project? Tell us about your requirements, and our specialists will contact you as soon as possible.
