PDF files recognition system for an architectural bureau | Businessware Technologies



PDF files recognition system for an architectural bureau

Recognition accuracy
Increase of document analysis speed

Our client evaluates the construction of buildings and prepares bills of quantities. Buildings can be very different: a private house, an apartment building, or an office building.

Case Study


To accurately prepare a bill of quantities and provide a price estimation, the following information needs to be extracted from a floor plan:

  • Building type - office building, shopping mall, apartment building
  • Floor plan type - electrical, plumbing, etc.
  • Scale ratio
  • Special symbols such as doors, windows, and bathroom fixtures

The project turned out to be a complex one as the documentation is not standardised:

  • All PDF documents are formatted differently
  • Some floor plans are drawn by hand
  • A large variety of fonts and special symbols is used

Ready-made solutions for optical character recognition (OCR) could not handle the task with enough accuracy or could not handle it at all due to the special characters used in the floor plans.

Floor Plan Recognition

The first step in analysing any PDF file with a floor plan is to detect whether one page contains a single floor plan or several. This step allows the system to break the PDF file into sections, each of which can be analysed accurately.
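As an illustration of the sectioning idea only, here is a minimal sketch that splits a page wherever a wide run of empty columns separates two drawings. It assumes the page has already been rasterised into rows of `#` (ink) and `.` (blank) characters. The function name `split_on_blank_columns` and the `min_gap` threshold are hypothetical and do not describe the production system.

```python
def split_on_blank_columns(grid, min_gap=3):
    """Return (start, end) column ranges, end exclusive, for regions of ink
    that are separated by at least min_gap completely blank columns."""
    width = len(grid[0])
    # A column "has ink" if any row contains a '#' in that column.
    has_ink = [any(row[c] == "#" for row in grid) for c in range(width)]

    regions, start, gap = [], None, 0
    for c, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = c          # a new region begins here
            gap = 0                # narrow gaps inside a drawing are ignored
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # gap is wide enough: close the region
                regions.append((start, c - gap + 1))
                start, gap = None, 0
    if start is not None:          # region that runs to the page edge
        regions.append((start, width - gap))
    return regions


page = [
    "###....##.",
    "#.#....#..",
]
print(split_on_blank_columns(page))
```

A real page would also need the same check along rows, and hand-drawn plans rarely have perfectly blank gutters, so the threshold would have to be tuned.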

Detecting the type of building and type of the floor plan comes down to locating special character sequences. As some PDF files are not searchable, we had to use an OCR approach to detect the symbols. We had to take into consideration the different fonts and font sizes used in the floor plans when looking for the symbols. As a result, the system quickly detects both the building type and floor plan type regardless of the position of the markup, its font and size.
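Once OCR has produced text, locating the character sequences can be as simple as keyword matching. The sketch below assumes plain OCR output as a string. The keyword lists are hypothetical examples, not the vocabulary used in the project.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
BUILDING_KEYWORDS = {
    "office building": ["office"],
    "shopping mall": ["shopping mall", "retail"],
    "apartment building": ["apartment", "residential"],
}
PLAN_KEYWORDS = {
    "electrical": ["electrical", "lighting", "power"],
    "plumbing": ["plumbing", "sanitary", "drainage"],
}


def normalise(text):
    """Lowercase and collapse whitespace so OCR line breaks do not split phrases."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def classify(text, keyword_map):
    """Return the label whose keywords occur most often in text, or None."""
    clean = normalise(text)
    scores = {
        label: sum(
            len(re.findall(r"\b" + re.escape(keyword) + r"\b", clean))
            for keyword in keywords
        )
        for label, keywords in keyword_map.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


ocr_text = "GROUND FLOOR\nELECTRICAL  LAYOUT\nProposed Office Building, Block A"
print(classify(ocr_text, BUILDING_KEYWORDS))  # office building
print(classify(ocr_text, PLAN_KEYWORDS))      # electrical
```

Because the match works on text rather than position, it is independent of where the markup sits on the page. Font and size differences have to be handled earlier, at the OCR stage.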

To determine the scale of the floor plan, we used OCR Space and IText, augmented with a system of balances that chooses the optimal OCR Space settings and compares the results with those of IText. Coupled with Tesseract, this approach reached up to 98% accuracy.
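One way to compare the results of several engines is a simple majority vote over the scale values each one reports. The sketch below is an assumption about how such a comparison could look, not the project's actual balancing logic. It takes raw text per engine and makes no calls to OCR Space, IText or Tesseract. The engine names in the example are placeholders.

```python
import re
from collections import Counter

# Matches scale notations such as "1:100", "1 : 50" or "1/200".
SCALE_PATTERN = re.compile(r"\b1\s*[:/]\s*(\d{1,4})\b")


def find_scales(text):
    """Return all scale denominators found in text, e.g. 100 for '1:100'."""
    return [int(match.group(1)) for match in SCALE_PATTERN.finditer(text)]


def vote_on_scale(engine_outputs):
    """Pick the denominator reported by the most engines.

    engine_outputs maps an engine name to its raw text output. Each engine
    contributes at most one vote per distinct value. Returns a
    (denominator, votes) tuple, or None if no engine found a scale.
    """
    votes = Counter()
    for text in engine_outputs.values():
        for denominator in set(find_scales(text)):
            votes[denominator] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


outputs = {
    "engine_a": "SCALE 1:100",
    "engine_b": "Scale 1 : 100 @ A1",
    "engine_c": "SCALE 1:1OO",  # letter O misread for zero, so no match
}
print(vote_on_scale(outputs))  # (100, 2)
```

A low vote count is a useful signal that the detected scale should be shown to the user for confirmation.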

Object Recognition

Another important objective was to detect the various objects present in the floor plans, such as doors and power outlets. These objects are marked by special symbols that are difficult to distinguish from the rest of the floor plan because of their simple geometric shapes. Moreover, the symbols may differ depending on who compiled the floor plan.

As OpenCV algorithms are not well suited for analysing simple black and white geometric shapes, we have incorporated deep learning to increase the accuracy and reject false positives.

To solve the problem of different symbols being used to mark the same objects, the system requires the user to highlight the symbols that need to be detected and counted.
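The highlight-then-count workflow can be illustrated with a deliberately naive stand-in: exact template matching on a tiny binary grid. This is not the deep learning approach described above, and it would fail on rotated, scaled or hand-drawn symbols. The helper names `crop` and `find_matches` are hypothetical.

```python
def crop(grid, top, left, height, width):
    """Cut the user's highlighted rectangle out of the plan."""
    return [row[left:left + width] for row in grid[top:top + height]]


def find_matches(grid, template):
    """Return the (row, column) of every exact occurrence of template in grid.

    Overlapping occurrences are all reported.
    """
    t_height, t_width = len(template), len(template[0])
    hits = []
    for r in range(len(grid) - t_height + 1):
        for c in range(len(grid[0]) - t_width + 1):
            if all(
                grid[r + i][c:c + t_width] == template[i]
                for i in range(t_height)
            ):
                hits.append((r, c))
    return hits


plan = [
    "..........",
    ".##....##.",
    ".#.....#..",
    "..........",
    "....##....",
    "....#.....",
]
# The user highlights the 2x2 symbol whose top-left corner is at row 1, column 1.
symbol = crop(plan, top=1, left=1, height=2, width=2)
print(len(find_matches(plan, symbol)))  # 3
```

Taking the template from the user's selection is what removes the dependency on any fixed symbol library: whatever the draughtsman drew becomes the pattern to count.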

PDF table to an Excel table

Floor plans sometimes come with a bill of quantities that needs to be evaluated in terms of cost. The PDF tables are not an ideal way to handle large amounts of data since they cannot be edited and the data cannot be sorted or filtered.

There are a number of ready-made tools and solutions that can turn a PDF table into an Excel one, though they work poorly with large, complex tables that include merged cells and span multiple PDF pages:

  • Ready-made solutions do not handle merged cells well and often split them incorrectly
  • When PDF tables span multiple pages, their columns often don’t line up, which causes existing tools to process the data incorrectly
  • If the text goes outside of its cell, ready-made solutions split the text into multiple cells

We have developed a subsystem that scans the PDF tables and turns them into Excel tables while preserving both the original table structure and data integrity.
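The column misalignment and text overflow problems can be sketched as follows. The example assumes each text fragment has already been extracted with its x-coordinate, and that the left edges of the columns are known from the first page. The function name `snap_row` and the `tolerance` value are hypothetical, and this is an illustration rather than the subsystem itself.

```python
from bisect import bisect_right


def snap_row(cells, column_lefts, tolerance=10.0):
    """Place (x, text) fragments into reference columns.

    column_lefts holds the sorted left edges of the reference columns.
    A fragment belongs to the last column whose left edge is at most
    x + tolerance, so small shifts between pages are absorbed. Fragments
    that land in the same column are joined with a space, not split.
    """
    row = [""] * len(column_lefts)
    for x, text in sorted(cells):
        index = max(bisect_right(column_lefts, x + tolerance) - 1, 0)
        row[index] = f"{row[index]} {text}".strip()
    return row


column_lefts = [50.0, 200.0, 350.0]

# A row from a later page where every column is shifted by a few points.
print(snap_row([(58.0, "Door D1"), (194.0, "12"), (356.0, "pcs")], column_lefts))
# ['Door D1', '12', 'pcs']

# A row where the description overflows and was extracted as two fragments.
print(snap_row(
    [(52.0, "Washbasin, wall"), (120.0, "mounted"), (205.0, "4"), (349.0, "pcs")],
    column_lefts,
))
# ['Washbasin, wall mounted', '4', 'pcs']
```

Merged cells need extra information, such as the ruling lines of the table, and are not covered by this sketch.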


  1. A user uploads a PDF file with a floor plan
  2. The system sections off the floor plan and highlights the building type, the floor plan type and scale
  3. The user corrects any mistakes, highlights the objects that need to be counted and clicks “Process”
  4. The system then recognises building walls, counts the objects of interest and provides an Excel file with a bill of quantities and a price estimation.
  5. If a bill of quantities is already present in the PDF, the table is converted into an Excel table for easy processing.
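The final arithmetic in step 4 amounts to multiplying each counted quantity by a unit price and summing the lines. A minimal sketch follows. The item names and prices are made-up illustrative values, and `build_bill` is a hypothetical helper.

```python
from decimal import Decimal


def build_bill(counts, unit_prices):
    """Return (lines, total), where each line is (item, quantity, unit price, cost).

    Raises KeyError if a counted item has no unit price, so that a missing
    price is noticed instead of silently costing nothing.
    """
    lines = []
    for item in sorted(counts):
        quantity = counts[item]
        price = unit_prices[item]
        lines.append((item, quantity, price, quantity * price))
    total = sum((line[3] for line in lines), Decimal("0"))
    return lines, total


# Illustrative inputs only.
counts = {"door": 12, "window": 20}
unit_prices = {"door": Decimal("150.00"), "window": Decimal("90.50")}

lines, total = build_bill(counts, unit_prices)
for item, quantity, price, cost in lines:
    print(f"{item:<10}{quantity:>5}{price:>10}{cost:>12}")
print(f"Total: {total}")  # Total: 3610.00
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors.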


The resulting system is a full-fledged tool for working with complex floor plans and accompanying tables. It reduces manual labor and greatly speeds up price estimation. The system is highly flexible and can be adjusted to analyse any PDF document and extract relevant information.
