Since the first computer vision experiments in the 1950s, the field has come a long way, and the number of its applications has grown rapidly.
Computer vision is not only one of the hottest research fields in computer science, but also a significant part of our everyday lives. The tech we use every day often has computer vision algorithms running in the background.
In general, computer vision is an area of artificial intelligence whose main goal is to teach computers how to ‘see’, that is, to analyse and interpret various visual data and then make decisions based on it. The amount of visual data that we generate daily - images and video - is one of the driving factors behind the growth of computer vision, as this data is then used to train computer vision algorithms and make them better.
With advancements in hardware, the computing power required to analyse visual data is now more accessible and affordable, which has contributed to an increase in object identification accuracy from 50% to 99% in less than a decade.
Computer vision algorithms have many uses, yet they may seem out of reach: too complex, ‘too expensive’, ‘too ambitious’, and generally ‘too much’ for a regular business owner. We are sure that computer vision is a lot more accessible than it may seem at first. Having worked on a number of computer vision projects, we have seen first-hand how powerful it can be.
Recognition of text characters, known as optical character recognition (OCR), is one of the simplest yet most widely used areas of computer vision. OCR is often used to detect text and convert it into an editable format, like turning a scanned document into a .docx document or turning PDF tables into Excel ones for editing. The technology has found applications across a large spectrum of industries, wherever it delivers an immediate saving of labour that would otherwise be lost in retyping typewritten or handwritten text.
One of the first applications that turned PDF text into Word and Excel documents is ABBYY FineReader. Nowadays, OCR is built into many different applications, both desktop and mobile, and is used to analyse not only typewritten text, but also handwriting.
The technology has grown so reliable that establishments that need a high level of data security, like banks and hospitals, use text recognition software for analysing important documents, like handwritten cheques, invoices, legal paperwork, hospital records, etc.
We have implemented various text recognition tools and approaches in our clients' systems for faster document processing and editing. Usually, these systems utilise ready-made solutions that only require proper implementation and tuning, making them fast and rather inexpensive to develop.
Certain businesses, like architectural bureaus, often work with specialised symbols that are not easily recognised by standard text recognition tools. These tasks - recognition of complex objects - are where computer vision shines. Computer vision algorithms can be trained to accurately and reliably detect whatever symbol or character you need, no matter its position or size on the page.
We have had a chance to utilise the full potential of computer vision when working on a blueprint recognition system which reduces the time needed to compile a bill of quantities - a full list of objects present in a blueprint, like doors, windows, outlets and plumbing, together with a price estimate. The system detects symbols specified by the user, like doors, electrical outlets and toilets, counts them and compiles a list of detected objects along with their price.
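The counting and pricing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the system itself: it assumes a symbol detector has already produced a list of (label, confidence) pairs, and the `detections` data, the `unit_prices` table and the `min_confidence` cut-off are all hypothetical.

```python
from collections import Counter

# Hypothetical detector output: one (label, confidence) pair per detected symbol.
detections = [
    ("door", 0.97), ("door", 0.91), ("window", 0.88),
    ("outlet", 0.95), ("outlet", 0.42), ("toilet", 0.93),
]

# Hypothetical unit prices supplied by the user.
unit_prices = {"door": 120.0, "window": 200.0, "outlet": 15.0, "toilet": 180.0}


def bill_of_quantities(detections, unit_prices, min_confidence=0.5):
    """Count confident detections per label and price each line."""
    counts = Counter(
        label for label, confidence in detections if confidence >= min_confidence
    )
    lines = []
    for label, quantity in sorted(counts.items()):
        price = unit_prices[label]
        lines.append(
            {"item": label, "quantity": quantity,
             "unit_price": price, "total": quantity * price}
        )
    return lines


def grand_total(lines):
    """Sum the line totals of a bill of quantities."""
    return sum(line["total"] for line in lines)


bill = bill_of_quantities(detections, unit_prices)
for line in bill:
    print(f'{line["item"]:<8} x{line["quantity"]}  {line["total"]:.2f}')
print("Total:", grand_total(bill))
```

Dropping low-confidence detections before counting is one simple way to keep false positives out of the estimate; a real system would likely let the user review borderline symbols instead.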
Things like PDF tables, diagrams and various charts consist of a combination of text, special symbols and geometrical shapes, and can also be transformed from a PDF to an editable format in a matter of seconds. But the complex nature of these objects means that standard OCR tools, of which there are many, often can’t handle the task or do so with very low accuracy. Custom OCR systems are the way to go in these cases.
One example from our work is the transformation of complex PDF spreadsheets into Excel ones for editing. Ready-made solutions cannot handle such spreadsheets, as they often contain merged cells, data spread across multiple PDF pages, etc. With the power of computer vision we have developed a system which accurately transforms such spreadsheets into an editable format in a matter of seconds.
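To make the two difficulties mentioned above concrete, here is a minimal Python sketch of how merged cells and multi-page tables might be handled once the cells have been recognised. The cell format (dictionaries with `row`, `col`, `rowspan`, `colspan` and `text`) and the function names are assumptions made for illustration, and the output is CSV rather than a native Excel file to keep the example dependency-free.

```python
import csv
import io


def cells_to_grid(cells, fill_merged=True):
    """Expand recognised cells (possibly merged) into a rectangular grid."""
    n_rows = max(c["row"] + c.get("rowspan", 1) for c in cells)
    n_cols = max(c["col"] + c.get("colspan", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        for r in range(c["row"], c["row"] + c.get("rowspan", 1)):
            for k in range(c["col"], c["col"] + c.get("colspan", 1)):
                is_anchor = r == c["row"] and k == c["col"]
                # Either repeat the merged value in every covered cell,
                # or keep it only in the top-left (anchor) cell.
                if is_anchor or fill_merged:
                    grid[r][k] = c["text"]
    return grid


def merge_pages(page_grids):
    """Join grids from consecutive pages, dropping a repeated header row."""
    merged = []
    for grid in page_grids:
        rows = grid[1:] if merged and grid and grid[0] == merged[0] else grid
        merged.extend(rows)
    return merged


def to_csv(grid):
    """Serialise a grid as CSV text that a spreadsheet application can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


# Hypothetical recognition output for a table split across two PDF pages.
page_1 = [
    {"row": 0, "col": 0, "text": "Item"}, {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "rowspan": 2, "text": "Doors"},
    {"row": 1, "col": 1, "text": "100"}, {"row": 2, "col": 1, "text": "120"},
]
page_2 = [
    {"row": 0, "col": 0, "text": "Item"}, {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "text": "Windows"}, {"row": 1, "col": 1, "text": "200"},
]

table = merge_pages([cells_to_grid(page_1), cells_to_grid(page_2)])
print(to_csv(table))
```

Repeating the merged value in every covered cell keeps each row self-describing for later filtering; passing `fill_merged=False` leaves the other cells empty, which is closer to how the table looks on the page.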
Computer vision has become reliable enough to be used in the medical field. While the use of image recognition for detecting cancer or classifying cells may still be in its early stages, computer vision is already widely used for more mundane, yet equally important tasks.
Image recognition, along with text recognition, is used to augment traditional medical devices and interpret data received from them. For example, most glucometers - the devices which measure blood sugar levels - only provide a couple of numbers as output. Users don’t get the insights important for interpreting the results correctly, like their average blood sugar or the normal blood sugar levels to compare their results to.
Instead of buying an entirely new and more expensive device, users can get the necessary insights from a simple app powered by computer vision, which analyses a photo of the screen of the medical device in question. We have worked on an app which analyses photos of a glucometer’s screen and have seen just how powerful image recognition can be.
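Once the digits on the screen have been recognised, turning them into the kind of insight described above is straightforward. The sketch below assumes the recognition step has already produced a text string per photo; the function names, the sample strings and the target range are hypothetical, and the range in particular is a placeholder parameter rather than clinical guidance.

```python
import re
from statistics import mean

# Matches readings such as "5.6 mmol/L" or "108 mg/dL" in recognised screen text.
READING_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mmol/l|mg/dl)", re.IGNORECASE)
MGDL_PER_MMOL = 18.0  # approximate conversion factor for glucose


def parse_reading(text):
    """Return the reading in mmol/L, or None if no reading is found."""
    match = READING_RE.search(text)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return value / MGDL_PER_MMOL if unit == "mg/dl" else value


def summarise(texts, target_range):
    """Average the parsed readings and list those outside target_range."""
    low, high = target_range
    values = [v for v in map(parse_reading, texts) if v is not None]
    return {
        "count": len(values),
        "average": round(mean(values), 2) if values else None,
        "out_of_range": [v for v in values if not low <= v <= high],
    }


# Hypothetical recognised text from four photos; the last one has no reading.
screens = ["5.0 mmol/L", "108 mg/dL", "9.0 mmol/L", "Lo batt"]
print(summarise(screens, target_range=(4.0, 7.0)))
```

Normalising every reading to one unit before averaging matters because glucometers differ in the unit they display.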
Computer vision makes it possible to extract a multitude of attributes from a photo or a video, including dominant colors, objects present in the photo or video, etc. Paired with text generation, computer vision-powered systems can be a powerful marketing tool, and the data extracted can be used for further analysis and aid in decision making and building a marketing strategy.
Computer vision can be a part of a larger marketing system or can be used on its own to extract relevant data for further interpretation. Automatic caption generation systems, like the one we have developed for Instagram, use computer vision to detect the objects present in the photo, which are then used as a basis for the caption. Coupled with text generation algorithms, these systems can mimic human-written text to the point where it will be very hard to distinguish between the two. This greatly reduces marketing efforts and increases content production.
However, computer vision really shines when it comes to extracting data which is not obvious to the human eye, like dominant colors in a photo or video, for classification of visual data. Extracting such data makes it easier - or even possible - to classify and rank visual data which otherwise may seem completely inhomogeneous.
One of our most interesting projects is the development of a video analysis system which determines the dominant colors in mobile game footage and compares them to common color schemes to determine how visually pleasing the game will be to players.
Ranking different games according to how visually pleasant they are helps to determine how successful they will be commercially and which ones are worth investing money in. These conclusions are hard, if not impossible, to reach when looking at footage with the naked eye, when all you can rely on is a ‘gut feeling’. Formalising tasks like these is impossible without computer vision.
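The idea of extracting dominant colors and comparing them to a color scheme can be illustrated with a small Python sketch. It is a simplified stand-in, not the system described above: pixels are assumed to arrive as plain (R, G, B) tuples sampled from video frames, dominance is estimated by bucketing colors rather than by clustering, and the scoring function and sample schemes are invented for the example.

```python
from collections import Counter


def quantize(pixel, step=64):
    """Snap each channel to the centre of its bucket so similar colors merge."""
    return tuple((channel // step) * step + step // 2 for channel in pixel)


def dominant_colors(pixels, k=3, step=64):
    """Return the k most frequent quantized colors with their share of pixels."""
    counts = Counter(quantize(p, step) for p in pixels)
    total = sum(counts.values())
    return [(color, n / total) for color, n in counts.most_common(k)]


def color_distance(a, b):
    """Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def scheme_score(dominant, scheme):
    """Share-weighted mean distance to the nearest scheme color (lower is closer)."""
    weight = sum(share for _, share in dominant)
    return sum(
        share * min(color_distance(color, s) for s in scheme)
        for color, share in dominant
    ) / weight


# Hypothetical pixels sampled from a frame: mostly red, some blue, a little grey-green.
pixels = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 3 + [(120, 130, 125)]
top = dominant_colors(pixels, k=2)

# Hypothetical color schemes to compare against.
red_blue = [(224, 32, 32), (32, 32, 224)]
pastel = [(255, 220, 220), (220, 255, 220)]
print(top)
print(scheme_score(top, red_blue), scheme_score(top, pastel))
```

Ranking several videos then amounts to computing this score per video for each scheme of interest and sorting the results.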
Image recognition is widely used when working with documents, even in the most conservative establishments that require a high level of document security and accuracy. Banks and hospitals, for instance, have built computer vision-based systems into their everyday document processing routines. These systems are often used to reduce manual data entry and to verify documents, for example, to verify a signature on a cheque.
It may seem that computer vision is only accessible to large, well-established companies, like large banks or private hospitals, but that couldn’t be further from the truth. Implementing computer vision in document processing routines is now more accessible and affordable than ever, and it brings substantial benefits in the form of labour saved on manual data entry, lower costs and higher accuracy.
Computer vision algorithms can analyse scanned documents and extract any data you may need. This makes computer vision an excellent tool for document verification. We have seen this first-hand when working on a checkpoint system for a company whose offices are located across multiple buildings. In order to check whether a person has the required level of access to a particular building, their ID is scanned to verify that their data is present in a database. Previously, an employee had to type in the data manually, which took much longer and required more effort.
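The lookup that follows the scan can be sketched as below. This assumes the ID number has already been read from the scanned document as a text string; the in-memory `ACCESS_DB` dictionary, the ID format and the building names are hypothetical stand-ins for a real database.

```python
import re

# Hypothetical access database: ID number -> buildings the holder may enter.
ACCESS_DB = {
    "AB123456": {"A", "B"},
    "CD987654": {"A"},
}


def normalise_id(raw):
    """Uppercase the recognised text and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def has_access(raw_id, building, db=ACCESS_DB):
    """Check whether the scanned ID is known and allowed into the building."""
    return building in db.get(normalise_id(raw_id), set())


# Recognised text often carries stray spaces, dashes or lowercase letters.
print(has_access(" ab-123 456 ", "B"))
print(has_access("CD987654", "B"))
print(has_access("ZZ000000", "A"))
```

Normalising the recognised text before the lookup is a cheap way to absorb small formatting differences between scans without loosening the match itself.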
Detection of objects on video, be it counting how many people have entered a shopping center or spotting when an industrial process has gone wrong, is still widely done by humans. This approach, while being the most straightforward one, is definitely not the most effective, cost-efficient or reliable. Computer vision makes it possible to automate this task at a relatively low upfront cost, while greatly improving the detection accuracy and eliminating downtime caused by lunch breaks and sick leave.
Object detection systems are now accessible to anyone, and their use is no longer confined to sci-fi movies. They are now used everywhere, from government surveillance and self-driving vehicles to monitoring simple processes like at-home 3D printing.
We have had an opportunity to develop a system that monitors the position of a safe waterway in ice-covered waters to help ships navigate them without the excessive fuel losses which happen when a ship goes outside the safe waterway. The system receives live video from one of the ship’s cameras as input, outputs the position of the waterway and plots a safe course.
Computer vision can be used to track more mundane processes than ships navigating through sea ice, but ones that still require constant supervision. One of them is at-home 3D printing, where prints can take up to several days, and an average print takes at least a couple of hours. Besides taking a long time to complete, the 3D printing process is very prone to going wrong or failing completely as there are a multitude of factors influencing the outcome of the print, like how high or low the ambient temperature is, the presence of drafts, the quality of the filament and many more. If something goes wrong, the printing process will result in a pile of randomly extruded plastic filament, also known as ‘spaghetti’.
3D printers are not equipped with tools that would allow them to detect when something goes wrong, making human supervision the only way to prevent waste of plastic filament. But as prints take hours to complete, monitoring them yourself is not feasible, and that’s where computer vision comes into play.
We have participated in the development of a ‘spaghetti’ detection system that monitors a video stream of the printing process and alerts the user when something goes wrong.
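The alerting side of such a monitor can be sketched independently of the detection model. The example below assumes a model has already produced a ‘spaghetti’ score between 0 and 1 for each frame; the function name, the threshold and the number of consecutive frames required are hypothetical choices for illustration, not details of the system mentioned above.

```python
def first_alert_frame(scores, threshold=0.8, min_consecutive=3):
    """Return the index of the frame at which an alert fires, or None.

    An alert fires only after min_consecutive frames in a row score at or
    above the threshold, so a single noisy frame does not stop the print.
    """
    streak = 0
    for index, score in enumerate(scores):
        streak = streak + 1 if score >= threshold else 0
        if streak >= min_consecutive:
            return index
    return None


# Hypothetical per-frame scores: one isolated spike, then a sustained failure.
scores = [0.1, 0.9, 0.2, 0.85, 0.9, 0.95, 0.3]
frame = first_alert_frame(scores)
if frame is not None:
    print(f"Possible failed print detected at frame {frame}")
```

Requiring several consecutive high-scoring frames trades a short delay for fewer false alarms, which matters when an alert may pause a print that has been running for hours.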
While there are expensive, complex visual data recognition systems out there, computer vision as a whole should not be treated as such. It is a tool that can both automate small everyday tasks, like text recognition, and be a part of complex systems used by the largest corporations. Gone are the days when machine learning in general, and computer vision in particular, were a rarity available only to a few - these days, anyone can take advantage of these technologies and use them to their benefit.
Having worked on numerous computer vision projects, we have seen just how powerful it can be and how it can transform businesses at a minimal cost. Whether using standard computer vision tools or taking a custom approach and developing a system from scratch, our team of machine learning experts is always looking for ways to use computer vision to its full potential and bring the most benefit to our clients.
If you are interested in developing your own computer vision system, drop us a line at firstname.lastname@example.org and we will get back to you.