AI Translation of Technical Documentation Into Chinese

Client
Confidential
Platform
Cloud
AI Translation of Technical Documentation Into Chinese
GPT-4
Integration

An AI-based solution for translating technical documentation for a major Chinese company that manufactures power plant equipment. The translation is performed from Chinese to English while preserving the structure of the documents.

Services
AI prototype development
AI system development
Team
1 Project manager
3 AI developers
1 QA engineer
Target Audience
R&D departments
Manufacturing and engineering companies
Translation and localization service providers

Challenge

Our client is a major Chinese company specializing in the production of power plant equipment. While the manufacturing is based in China, the equipment is supplied worldwide. As a result, the client faced the challenge of translating technical documentation into English. Relying solely on human translators proved to be expensive and time-consuming, especially given the extensive product range.

The primary goal of the project was to automate the translation process from Chinese to English while preserving the document structure and processing text within diagrams and images. This solution significantly reduces the time and costs associated with translation.

Solution

This project has one key distinguishing feature. The client required a confidential solution that would not send sensitive data to third parties or use it to train language models on open datasets.

To meet this requirement, we utilized a combination of two solutions: Azure Intelligence and GPT-4. Both Azure and GPT-4 offer flexible security settings, providing corporate clients with special conditions that guarantee data will not be used to improve or train models or shared with third parties.

  • The workflow of the AI solution includes the following steps:
  • Extracting the document structure.
  • Translating texts using GPT-4.
  • Processing images to extract text.
  • Translating text within images and replacing it with translated versions.
  • Final assembly and saving in PDF format.

Text Translation

Texts in the documentation appear in various formats: paragraphs, headers, footers, and tables. GPT-4 handles all these elements.

We applied the following algorithm for working with them:

  • Extract texts while preserving the original structure.
  • Use GPT-4 to translate the text.
  • Remove the original texts and replace them with translated versions, maintaining the original formatting using python-docx.

A particular challenge was managing text volume, as Chinese and English differ significantly, and sentence lengths can vary greatly. To address this, we adapted the document's structure based on the length of the translated text.

Additionally, the client provided a glossary, which significantly improved the accuracy of the translations.

Image Processing

Since images and diagrams also contain textual data, they cannot be overlooked when creating new documentation for a different market. We used Azure Intelligence to recognize text within images.

The workflow for image processing was as follows:

  • Extract texts while preserving the original structure.
  • Extract images from the document.
  • Send images to Azure Intelligence for text recognition.
  • Send the recognized text to GPT-4 for translation.
  • Replace the text in the images with translated versions, considering the available space for text. Azure Document Intelligence not only recognizes text but also returns the coordinates of the text within the image, allowing us to place the translated text within the same bounding box.

Once the images and text were ready, we saved the new version of the documentation in PDF format.

The issue of text volume was particularly challenging for images, as we could not simply adapt the document structure. To ensure the images looked correct, we modified the text itself (including fonts, font sizes, and text positioning). This approach allowed us to achieve high quality without altering the structure of the images.

Results

Even at the prototype testing stage, we achieved impressive results. The developed prototype was presented to the client, who noted the high quality of translation and the preservation of the documentation's structure.

The cost of translation amounted to just $0.03 per page, making the solution economically efficient and freeing up a significant portion of the client's budget originally allocated for translation.

Contact Us

Let's Work Together!

Do you want to know the total cost of development and realization of the project? Tell us about your requirements, our specialists will contact you as soon as possible.

Please fill in the 'Name'
Please fill in the 'Phone'
Please fill in the 'Email'
Please fill in the 'Message'
BWT Chatbot