Intelligent Document Processing (IDP) Models Benchmark

We continuously test large language models on business automation tasks. Our AI model benchmark is based on datasets of digital documents with varied layouts and languages, representative of the documents processed in real projects.
We test how well AI models extract data from complex documents by assessing the accuracy and completeness of data detection.

Testing Criteria

Recognition Accuracy

How accurately an AI model detects and extracts data from a document, such as field titles and values, document layout, and text and character blocks.
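As an illustration of this criterion, here is a minimal sketch of a field-level accuracy score. The benchmark does not publish its exact scoring formula, so the function name `field_accuracy`, the whitespace and case normalization, and the sample field names are all assumptions made for this example.

```python
def field_accuracy(expected: dict[str, str], extracted: dict[str, str]) -> float:
    """Share of expected fields whose extracted value matches after normalization."""
    if not expected:
        raise ValueError("expected must contain at least one field")

    def normalize(value: str) -> str:
        # Collapse whitespace and ignore case so trivial formatting
        # differences are not counted as extraction errors (an assumption).
        return " ".join(value.split()).casefold()

    correct = sum(
        1
        for name, value in expected.items()
        if name in extracted and normalize(extracted[name]) == normalize(value)
    )
    return correct / len(expected)


# Hypothetical ground truth and model output for one invoice.
ground_truth = {
    "invoice_number": "INV-001",
    "total": "1 250.00",
    "currency": "EUR",
    "vendor": "Acme GmbH",
}
model_output = {"invoice_number": "inv-001", "total": "1 250.00", "currency": "USD"}

score = field_accuracy(ground_truth, model_output)
print(f"{score:.0%}")  # 2 of 4 expected fields match -> 50%
```

A missing field and a wrong value both count as errors here, so the score reflects completeness as well as correctness.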

Processing Duration

How long it takes a model to process one document on average.
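The tables further down report durations in the form "4.3 ± 0.2". A minimal sketch of how such a figure could be produced from per-document timings follows. It assumes the "±" part is a sample standard deviation, which the benchmark does not state, and the timing values are invented for illustration.

```python
from statistics import mean, stdev


def summarize_durations(seconds: list[float]) -> tuple[float, float]:
    """Return (average, sample standard deviation) of per-document timings."""
    if len(seconds) < 2:
        raise ValueError("at least two timings are needed for a standard deviation")
    return mean(seconds), stdev(seconds)


# Hypothetical timings, in seconds, for five single-page documents.
timings = [4.1, 4.5, 4.3, 4.2, 4.4]
average, spread = summarize_durations(timings)
print(f"{average:.1f} ± {spread:.1f} s")  # 4.3 ± 0.2 s
```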

Cost

The processing cost per 1,000 pages, plus any additional costs.

Reports

July 2025

Testing LLMs On Extracting Dimensional and Tolerance Data From Engineering Drawings: Gemini 2.5 Flash, Gemini 2.5 Pro, ChatGPT o4 mini, ChatGPT o3, Claude Opus 4, Qwen VL Plus

June 2025

Comparison Of AI Models For Table Extraction: Amazon Boto3 Textract, Azure Prebuilt Layout, GPT-4o API, Gemini 2.5 Pro, Grok 2 Vision, Pixtral Large, Google Layout Parser

March 2025

Expanded Comparison Of AI Models For Invoice Processing: Amazon Analyze Expense API, Azure AI Document Intelligence, Google Document AI, GPT-4o API, GPT-4o API with text input with 3rd party OCR, Gemini 2.0 Pro Experimental, DeepSeek v3

February 2025

Best AI Services For Automatic Invoice Processing: Amazon Analyze Expense API, Azure AI Document Intelligence, Google Document AI, GPT-4o API, GPT-4o API with text input with 3rd party OCR

Engineering Drawings Processing: Tabular Data

We tested 7 popular AI services on extracting tabular data from schedules in engineering drawings.

Service	Table Extraction Accuracy	Processing Duration per Page, s	Cost per 1,000 Pages
Azure Prebuilt Layout	81.5%	4.3 ± 0.2	$10
Amazon Boto3 Textract	82.1%	2.9 ± 0.2	$15
Gemini 2.5 Pro Preview 05-06	94.2%	47.4 ± 15.7	$58
GPT-4o API	38.5%	16.9 ± 1.9	$19
Grok 2 Vision 1212	Failed	n/a	n/a
Pixtral Large Latest	Failed	n/a	n/a
Google Layout Parser	Failed	n/a	n/a

See Full Report June 2025

Engineering Drawings Processing: Dimensional & Tolerance Data

We tested 6 LLMs on extracting dimensional and tolerance data from real-world mechanical engineering drawings.

Service	Data Extraction Efficiency	Processing Duration per Page, s	Cost per 1,000 Pages
Gemini 2.5 Flash	77.34%	77.5	$30.5
Gemini 2.5 Pro	79.96%	91.4	$130.4
ChatGPT o4 mini	39.59%	41.75	$24.9
ChatGPT o3	20.38%	163	$239.2
Claude Opus 4	40.49%	64.8	$312
Qwen VL Plus	7.64%	22	$1.59

See Full Report July 2025

Invoice Processing

We analysed 7 of the most popular AI document detection models to test how well they work out of the box on a set of digital invoices with various layouts and languages.

Service	Invoice Detection Accuracy Without Items	Invoice Detection Accuracy With Items	Processing Duration per Page, s	Cost per 1,000 Pages
Azure AI Document Intelligence	85.8%	85.7%	4.3 ± 0.2	$10
GPT-4o with 3rd-party OCR (Prebuilt Layout model by Azure AI)	90.8%	86.5%	33.0 ± 2.3	$8.8 ¹
GPT-4o only	88.3%	89.2%	16.9 ± 1.9	$8.8
Google Document AI	83.8%	68.1%	3.8 ± 0.2	$10
Amazon Analyze Expense API	91.3%	91.1%	2.9 ± 0.2	$10 ²
Gemini 2.0 Pro	90%	90.2%	8 ± 1.5	$4.5 ³
DeepSeek v3 API (Prebuilt Layout model by Azure AI)	93.3%	88.1%	69	$11
Unified Approach	~99%	~97%	~15	~$30

¹ Additional $10 per 1,000 pages for the text recognition model.

² Additional $0.008 per page after one million pages.

³ Token pricing: $1.25 for input with prompts ≤ 128k tokens; $2.50 for input with prompts > 128k tokens; $5.00 for output with prompts ≤ 128k tokens; $10.00 for output with prompts > 128k tokens.
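To make the footnotes concrete, the sketch below shows how a per-1,000-pages cost could be derived from tiered token prices and how the OCR surcharge from footnote 1 adds on top. It rests on several assumptions that the page does not confirm: that the footnote 3 prices are per 1 million tokens, that each page is sent as one request, and that the token counts used here (which are invented) are typical. The function name `llm_cost_per_1000_pages` is hypothetical.

```python
TIER_LIMIT_TOKENS = 128_000  # prompt-length threshold from footnote 3


def llm_cost_per_1000_pages(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of 1,000 single-page requests under the footnote 3 tiers.

    Prices are assumed to be per 1 million tokens.
    """
    long_prompt = input_tokens > TIER_LIMIT_TOKENS
    input_price = 2.50 if long_prompt else 1.25
    output_price = 10.00 if long_prompt else 5.00
    cost_per_page_times_million = input_tokens * input_price + output_tokens * output_price
    return cost_per_page_times_million * 1000 / 1_000_000


# Hypothetical page: 3,000 input tokens and 500 output tokens.
example_cost = llm_cost_per_1000_pages(input_tokens=3_000, output_tokens=500)
print(f"${example_cost:.2f} per 1,000 pages")  # $6.25 per 1,000 pages

# Footnote 1: the OCR step adds $10 per 1,000 pages to the listed GPT-4o price.
gpt4o_listed = 8.8
ocr_surcharge = 10.0
total_with_ocr = gpt4o_listed + ocr_surcharge
print(f"${total_with_ocr:.2f} per 1,000 pages with OCR")  # $18.80 per 1,000 pages with OCR
```

Actual costs depend on real token counts per page, so the first figure is only an example of the arithmetic, not a reproduction of the table.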

See Full Report March 2025

Unified Intelligence: Improve Invoice Data Extraction Accuracy Up to 97%

To achieve exceptional accuracy in extracting data from invoices, we combined the power of multiple large language models (LLMs). We use advanced matching algorithms to compare the outputs of each model and select the final results using a majority-vote principle.

This ensemble approach allows us to leverage the unique strengths of each LLM, providing robust and scalable invoice data extraction for real-world business needs.

As a result, average extraction accuracy rose from 85% to 97%.
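The majority-vote principle described above can be sketched as follows. This is a simplified illustration, not the production pipeline: the matching step here is plain exact matching after whitespace and case normalization, standing in for the more advanced matching algorithms mentioned in the text, and the names `normalize` and `majority_vote` as well as the sample outputs are hypothetical.

```python
from collections import Counter
from typing import Optional


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case before comparing model outputs."""
    return " ".join(value.split()).casefold()


def majority_vote(predictions: list[dict[str, str]]) -> dict[str, Optional[str]]:
    """Keep, per field, the value returned by more than half of the models.

    Fields without a strict majority are set to None so they can be
    routed to manual review (an assumption about the fallback).
    """
    field_names = sorted({name for prediction in predictions for name in prediction})
    merged: dict[str, Optional[str]] = {}
    for name in field_names:
        votes = Counter(normalize(p[name]) for p in predictions if p.get(name))
        top = votes.most_common(1)
        if top and top[0][1] * 2 > len(predictions):
            merged[name] = top[0][0]
        else:
            merged[name] = None
    return merged


# Hypothetical outputs of three models for the same invoice.
model_outputs = [
    {"invoice_number": "INV-001", "total": "1250.00"},
    {"invoice_number": "inv-001", "total": "1250.00"},
    {"invoice_number": "INV-007", "total": "1280.00", "due_date": "2025-03-01"},
]

result = majority_vote(model_outputs)
print(result)
# {'due_date': None, 'invoice_number': 'inv-001', 'total': '1250.00'}
```

Note that the returned values are the normalized forms, so a real pipeline would likely map each winner back to one of the original strings.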

Looking for the best AI model for your project?

FAQs

Do you provide custom benchmarks for specific business needs?

Yes, we offer custom benchmarking services tailored to specific business requirements. Contact us to discuss your needs.

What models do you test?

Currently we have tested Amazon Analyze Expense API, Azure AI Document Intelligence, Google Document AI, GPT-4o API, and GPT-4o API with text input from 3rd-party OCR. We constantly research new AI models to evaluate and test.

What types of documents do you use for testing?

We use a variety of digital documents, including invoices, receipts, contracts, and forms, with different layouts and languages to ensure comprehensive testing.

How often do you update your benchmark?

We update our benchmark monthly to ensure that our evaluations reflect the latest advancements and updates in AI models.

Do you provide recommendations based on your benchmark?

Yes, our monthly reports include recommendations on the best AI models for specific tasks, such as invoice processing, based on our comprehensive evaluations.

Do you test AI models for real-time processing?

Yes, we evaluate the processing duration of AI models to assess their suitability for real-time document processing tasks.

How do you handle updates and new versions of AI models?

We continuously monitor updates and new versions of AI models. When a significant update is released, we retest the model to ensure our benchmark remains accurate and up to date.

What is the difference between GPT-4o API and GPT-4o API with text input from 3rd-party OCR?

The GPT-4o API processes documents directly, while the GPT-4o API with text input from 3rd-party OCR uses a third-party Optical Character Recognition (OCR) service to convert documents to text before processing.

Our Services

Proof of Concept Services

Our Proof of Concept services provide the essential first step towards transforming your concepts into reality.

Dedicated Software Team

A skilled team focused on delivering high-quality, efficient software solutions that precisely meet your project's unique needs.

MVP Development Services

Our MVP services help validate your ideas, reduce time-to-market, and increase your chances of success.

Offshore Development Center

A global network of skilled developers, designers, and IT professionals who are ready to tackle your project.

Urgent Development Services

We specialize in rescuing stalled projects, addressing critical issues, and accelerating development under tight deadlines.

Staff Augmentation Services

Boost your team's capacity with our expert professionals, perfectly suited to your project's unique requirements, ensuring efficiency and flexibility.

Contact Us

Let's Work Together!

Want to know the total cost of developing and delivering your project? Tell us about your requirements, and our specialists will contact you as soon as possible.
