Mass gatherings of people create problems in a variety of areas (retail, civil services, banks, developers). Long queues lead to long waiting times, which dramatically lowers service quality, undermines customer loyalty, and, in the case of retail, cuts into revenue.
Facial recognition technologies have come a long way and are now used to solve tasks like queue recognition quickly and conveniently, without a large investment. Queue recognition is most commonly implemented with video surveillance cameras that have built-in AI. Facial detection runs on the camera chipset, which makes recognition quick and highly effective.
Although the AI camera market has evolved and costs have dropped substantially, this approach may still not be the best, or even possible, in many instances. Many facilities already have hundreds of surveillance cameras installed, operated through software that staff are already very familiar with. Installing a new set of cameras would not only cost a pretty penny but also mean changing the operational software, inconveniencing the employees responsible for security. It makes much more sense to build on an established, well-functioning network of security cameras than to implement an entirely new solution.
In some cases, using cameras with built-in queue recognition is not feasible due to the specificity of the facility itself, the geographical area it is situated in, etc. For example, for areas with harsh winters AI cameras are often of no use due to the clothes people wear, like big hats and scarves, to protect themselves from the low temperatures. Protective equipment can also interfere with recognition.
Other limiting factors may include oddly shaped rooms with lots of obstacles blocking the camera's view, low light conditions, the presence of objects which are often mistaken for humans (like mannequins or posters with people on them), etc.
In all of these cases, the most convenient way to implement queue recognition is to develop custom software that takes advantage of the cameras already installed in the facility and keeps the UI that security personnel are familiar with. For facilities with specific conditions, a custom AI solution is the only way to get accurate recognition results.
There are two approaches one may choose from when developing a queue detection system:
In both cases, there are a number of pre-trained models that can be used as a base for further improvement and adjusted depending on the conditions they will be used in.
The best architectures for the task of silhouette detection are YOLO v3 and Faster R-CNN. Tests have shown that Faster R-CNN has a slight edge over YOLO v3 for the task of recognizing people in queues.
Faster R-CNN model:
The dataset is one of the main components of an effective object recognition and classification system. The quality of object recognition is tightly linked to how balanced and well-curated the dataset is and how well it represents the objects you need to recognize. The amount of data is equally important: the more images of an object a recognition model 'sees', the higher the detection accuracy will be.
With pre-trained models like Faster R-CNN, there is no need to create your own dataset, as the model is already trained to detect human silhouettes. In ideal conditions the model is used as-is, and no additional training (i.e. introducing new types of objects to the model) is needed.
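As a minimal sketch of using a pre-trained detector as-is, the function below counts the people in one frame's detections. It assumes the detector returns parallel lists of class labels and confidence scores for each frame, and that class id 1 means "person", as is typical for COCO-trained models. The names `count_people` and `PERSON_LABEL` and the 0.6 threshold are illustrative and not taken from the system described here.

```python
from typing import Dict

PERSON_LABEL = 1  # assumption: COCO-style class id for "person"


def count_people(detections: Dict[str, list], score_threshold: float = 0.6) -> int:
    """Count detections labelled as a person with enough confidence."""
    labels = detections["labels"]
    scores = detections["scores"]
    return sum(
        1
        for label, score in zip(labels, scores)
        if label == PERSON_LABEL and score >= score_threshold
    )


# Hypothetical detector output for one frame: three "person" boxes and one other object.
frame = {"labels": [1, 1, 3, 1], "scores": [0.95, 0.40, 0.88, 0.71]}
print(count_people(frame))
```

The threshold trades missed people against false positives such as mannequins or posters, so it is worth tuning per camera rather than fixing globally.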
However, there are many cases in which an object recognition model needs additional training for human detection:
Cases like these require additional training to force the model to learn what people look like or typically wear in this particular location.
Before creating a dataset, it is very important to understand not only the general task at hand (human silhouette recognition) but also the specifics of the site, how the cameras are installed, and even what people usually wear during each season. This information will help in collecting a well-balanced dataset.
A good dataset used for object detection purposes typically includes at least 5000 marked-up images. Collecting over 5000 images of people specific to your location may be troublesome due to the sheer volume of photos one will have to find. There are multiple ways of going about this task:
After initial training, it is essential to evaluate the dataset to find its weak points, for example, a low number of photos of people in helmets.
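One simple way to look for such weak points is to check how often each condition you care about appears in the dataset. The sketch below assumes every annotated image carries a list of free-form tags (for example "helmet" or "winter_hat"). The tag names, the `underrepresented_tags` helper, and the 20% threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def underrepresented_tags(
    image_tags: Iterable[List[str]],
    expected_tags: Iterable[str],
    min_share: float = 0.1,
) -> Dict[str, float]:
    """Return expected tags that appear in fewer than min_share of images."""
    images = list(image_tags)
    if not images:
        # An empty dataset covers nothing.
        return {tag: 0.0 for tag in expected_tags}
    # Count each tag at most once per image.
    counts = Counter(tag for tags in images for tag in set(tags))
    shares = {tag: counts[tag] / len(images) for tag in expected_tags}
    return {tag: share for tag, share in shares.items() if share < min_share}


# Hypothetical dataset of 10 images: helmets appear in only one of them.
dataset = [["helmet"]] + [["winter_hat"]] * 5 + [["bare_head"]] * 4
print(underrepresented_tags(dataset, ["helmet", "winter_hat", "bare_head"], min_share=0.2))
```

Any tag the function reports is a candidate for collecting more photos before the next training round.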
Even after retraining the model on your custom dataset, recognition results may still fall short of industry standards. At that point it is usually time to turn to methods that are not purely ML-based.
A model, even one trained on a well-balanced dataset, can only do so much, so when the results are not satisfactory, the entire approach needs to change. So far, we have talked about detecting silhouettes to count how many people are in a queue. But in real-life queues, people stand close together, covering each other and obstructing the view, so recognition quality is often too low to rely on silhouette detection alone.
Great improvements in detection accuracy can be achieved by changing the object we look for. Instead of looking for human silhouettes, it is better to look for people's heads. The best way of going about this is to train a Faster R-CNN model, or any model of your choice, on a dataset fit for this task. A number of available datasets include pictures of large crowds, and these are the best fit.
Depending on the dataset you have chosen, however, additional augmentation may be needed. Special attention needs to be given to how people in a given facility usually dress, as well as what they wear in different seasons. People in hats may be ignored by the model: a person's head looks quite different with and without a hat, especially to a model that has only 'seen' hatless people before.
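One common and cheap form of augmentation is mirroring each image and moving its bounding boxes to match. The sketch below is illustrative only: it assumes an image stored as a list of pixel rows and boxes given as `(x_min, y_min, x_max, y_max)` with `x_max` exclusive, and the `flip_horizontally` helper is a hypothetical name. Mirroring does not replace collecting photos of people in hats; it only stretches the images you already have.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels, x_max exclusive


def flip_horizontally(
    image: List[List[int]], boxes: List[Box]
) -> Tuple[List[List[int]], List[Box]]:
    """Mirror an image left-to-right and move its boxes accordingly."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box's left edge becomes its right edge, measured from the other side.
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped_image, flipped_boxes


# Tiny 2x4 "image" with one head box in the top-left corner.
image = [[1, 2, 3, 4], [5, 6, 7, 8]]
flipped_image, flipped_boxes = flip_horizontally(image, [(0, 0, 1, 2)])
print(flipped_image, flipped_boxes)
```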
It's worth noting that when you need to detect a small number of people, silhouette detection is preferable, since it is much harder for an entire human body to be blocked than for just the head. However, in a crowded space, head detection is the only way to ensure the results are as accurate as possible. In this particular case, both silhouette and head recognition were used.
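The text does not say how the two detectors were combined, so the following is only one possible rule, written under that assumption: trust the silhouette count while the scene is sparse and switch to the head count once either detector reports a crowd. The function name and the threshold of 5 are hypothetical.

```python
def estimate_queue_length(
    silhouette_count: int, head_count: int, crowd_threshold: int = 5
) -> int:
    """Pick the count from the detector better suited to the crowd density."""
    # Treat the scene as crowded if either detector sees many people.
    if max(silhouette_count, head_count) >= crowd_threshold:
        return head_count
    return silhouette_count


print(estimate_queue_length(3, 2))   # sparse scene: silhouette count is used
print(estimate_queue_length(6, 11))  # crowded scene: head count is used
```

A production system would more likely match head boxes to body boxes before counting, but the switch above captures the trade-off described in the paragraph.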
A queue detection system can be very useful for a business or a public institution, as it makes it possible to determine the busiest hours and allocate more staff and other resources at these times. The results can be presented as ready-made reports with average and maximum occupancy for various time intervals. They should also be presented as graphs and diagrams that show how the number of people is distributed over time.
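A report of this kind can be sketched as a simple aggregation over timestamped counts. The example below assumes the detector emits `(timestamp, people_count)` samples and groups them into hourly buckets. The `hourly_occupancy` name, the one-hour interval, and the sample values are illustrative.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def hourly_occupancy(
    samples: Iterable[Tuple[datetime, int]]
) -> Dict[datetime, Dict[str, float]]:
    """Group people counts by hour and report the average and maximum."""
    buckets = defaultdict(list)
    for timestamp, count in samples:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(count)
    return {
        hour: {"average": mean(counts), "maximum": max(counts)}
        for hour, counts in sorted(buckets.items())
    }


# Hypothetical samples from one morning.
samples = [
    (datetime(2024, 1, 8, 9, 5), 4),
    (datetime(2024, 1, 8, 9, 35), 8),
    (datetime(2024, 1, 8, 10, 10), 3),
]
report = hourly_occupancy(samples)
for hour, stats in report.items():
    print(hour.strftime("%H:%M"), stats)
```

The resulting dictionary is ordered by hour, so it can be passed straight to a plotting library to draw occupancy over time.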
As a custom software development company, we have implemented numerous automation systems for many businesses, improving their operations and reducing their financial losses. AI is a powerful tool for any business and can drastically change the way day-to-day tasks are solved.