Case Studies

AI Module For A Legal Document Processing System

3 months
Documents processed monthly
Signature verification
AI Module For A Legal Document Processing System

Project Summary

A data extraction module powered by GPT-3.5 for a legal document processing system


AI Verification Service
AI App Development


1 Project manager
2 Machine learning developers

Target Audience

Law Firms


Our client is a SaaS company working on an in-cloud legal document management system to help law firms reduce paperwork and manage document flows better. The system handles over 40.000 documents every month, organizes and stores document and law firm data, as well as provides a task manager system.

Looking to gain competitive advantage on the market, the company approached us to create an AI document analyzing module with text extraction capabilities to automate document data processing.


We have developed a module for intelligent document processing: a system powered by GPT-3.5 capable of extracting relevant data from legal documents in a matter of seconds.

Text Detection

The AI module is capable of processing dozens of different document types by analyzing their layout. After the document’s type is detected, we use paddleOCR to extract the text layer and pass it onto further processing.

We use GPT-3.5 to extract relevant information, e.g. dates of legal proceedings, information on the legal process, like date and location of a forensic examination or a rehearing.

A standard GPT-3.5 module has a text length limitation, which prevents us from implementing it into legal document processing. Using an optimized GPT-3.5 32k, we’ve bypassed the limitation.

Signature Detection And Document Verification

Legal documents are often signed by judges to certify them, and a document missing a signature, or signed by a different person, is not legally binding. One of the most important aspects of the AI module is signature detection: unsigned documents need to be filtered out for further investigation.

We have utilized YOLO 5 to detect signatures and determine their author using a dataset of judges’ signatures. The client’s system uses this information to filter out legal documents that are either missing signatures or are signed by someone else, alerting the user of an unverified document.

AI Module Integration

Our client's system is run on an Azure Cloud, so our AI module is developed to be cloud-based as well to ensure seamless integration. We have configured access to OpenAI API in the cloud for continuous system operation.

The data extracted from legal documents is imported into the client's system in a JSON format for easy data integration.

Cost Optimization

Using GPT for document recognition can quickly add up: the longer the document, the more tokens it represents, the more it costs to process it. We have implemented multiple techniques to reduce costs for our client:

  • While GPT-4 is more advanced, it is less cost-effective compared to GPT-3.5. Our system analyzes documents before passing them onto further processing, assessing the document size by counting tokens, to pick the most optimal language model, thus reducing processing costs.
  • After testing multiple computer vision models, including the new GPT-4 with Vision, we have decided to implement YOLO 5 to locate and detect signatures as YOLO 5 has the best cost-to-accuracy ratio, especially compared to GPT-4.


Our client's legal document processing app is now processing documents automatically, using GPT-3.5 to extract relevant data and filter out unverified documents. The AI system was implemented seamlessly and works in the cloud to ensure stable and continuous operation.

Using the power of modern AI, our client has enhanced their product and gained an advantage over competition in the field of automatic document processing.

Let's Work Together!

Do you want to know the total cost of development and realization of the project? Tell us about your requirements, our specialists will contact you as soon as possible.