
Monitoring social distancing under various low light conditions with deep learning and a single motionless time of flight camera

Abstract

The purpose of this work is to provide an effective social distance monitoring solution for low light environments during a pandemic. The raging coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, has brought a global crisis with its deadly spread all over the world. In the absence of an effective treatment and vaccine, efforts to control this pandemic rely strictly on personal preventive actions, e.g., handwashing, face mask usage, environmental cleaning, and, most importantly, social distancing, which is the only expedient approach to cope with this situation. Low light environments can contribute to the spread of the disease because of people's night gatherings, especially in summer, when the global temperature is at its peak and the situation can become more critical, particularly in cities where people live in congested homes without proper cross-ventilation and therefore go out with their families at night for fresh air. In such a situation, it is necessary to take effective measures to monitor the safety distance criteria in order to avoid more positive cases and to control the death toll. In this paper, a deep learning-based solution is proposed for the above-stated problem. The proposed framework utilizes the you only look once v4 (YOLO v4) model for real-time object detection, and a social distance measuring approach is introduced with a single motionless time of flight (ToF) camera. The risk factor is indicated based on the calculated distance, and safety distance violations are highlighted. Experimental results show that the proposed model exhibits good performance with a 97.84% mean average precision (mAP) score, and the observed mean absolute error (MAE) between actual and measured social distance values is 1.01 cm.

Introduction

COVID-19 belongs to the family of coronavirus-caused diseases and was first reported in Wuhan, China at the end of December 2019. China announced its first death from the virus, a 61-year-old man, on January 11, 2020. On March 11, the World Health Organization (WHO) [1, 2] declared it a pandemic due to its spread over 114 countries, with a death toll of 4,000 and 118,000 active cases [3]. Data from Johns Hopkins University showed that, by June 8, more than seven million people were confirmed to have the coronavirus, with at least 406,900 having died from the disease. Several health organizations, scientists, and doctors tried to develop a vaccine, but no success had been observed at the time of writing. This situation forced the world to find an alternative solution to avoid drastic results. Lockdowns were imposed globally, and maintaining a safe social distance is reported to be the alternative way to cope with this drastic situation. Social distancing is central to the efforts made to minimize the spread of COVID-19 [4]. The basic objective is to reduce physical contact between infected and healthy people. As prescribed by the WHO, people should maintain a distance of at least 1 meter (m) from each other to control the spread of this disease [1, 5, 6].

This paper aims, first, to mitigate the effects of the coronavirus disease with minimal loss of resources, as this disease has badly impacted the global economy. Second, it aims to provide a highly accurate people-detection solution to help monitor social distancing during the night. Especially in summer, when the heat is at its peak, people living in congested homes find ways to get out of their homes during the night with their families for fresh air, and during this serious situation it is necessary to take proper action. Recently, Eksin et al. [7] evaluated the susceptible-infected-recovered (SIR) model with an added social distancing term. They showed that the spread of the disease depends on people's social behavior. They assessed the results of the SIR model with and without the behavior change factor and found that a plain SIR model did not perform well even after many repeated observations, whereas their updated SIR model with the behavior change factor showed good results and corrected the initial error rate. In a similar context, the company Landing AI [8] announced the development of an AI tool for monitoring social distance in work areas. In a short report [8], the firm claimed that the prospective tool would be able to observe whether people are following the safety distance criteria by examining real-time video streams captured by a camera. They affirmed that this tool can easily be combined with existing security cameras in different work areas to ensure a safe distance between workers. The world-leading research company Gartner Inc. [9] named Landing AI a Cool Vendor in AI Core Technologies to acknowledge their timely initiative to support the fight against the deadly situation of COVID-19 [10].

In this article, a deep learning-based solution is proposed for the automatic detection of people and the monitoring of social distance in low light environments. The first contribution of this article is the performance evaluation of YOLO v4 under low light conditions without applying any image cleansing approaches. Low light environments have received little attention in the past; the few works that address them do so only in the context of enhancing low light scenes and improving visibility [11–14]. In real-time object detection and monitoring, this approach is not feasible, because enhancing the low light scene first and then applying object detection techniques takes more time, whereas a real-time application must give a timely response with high accuracy. Secondly, a social distance monitoring solution is proposed that considers a precise speed-accuracy tradeoff and is evaluated on our custom dataset. From the experimental results, it is observed that the model exhibited good performance, with a balanced mAP score and an MAE [15] of 1.01 cm.

Related work

In this section, we briefly introduce previous work on social distancing in the context of the 2019 novel coronavirus disease. As the disease began to spread at the end of December 2019, researchers started contributing to efforts against the deadly situation, and social distancing was suggested as the alternative solution. Different research studies were conducted to provide an effective social distancing solution. In this context, Prem et al. [16] studied the consequences of social distancing measures on the progression of the COVID-19 epidemic in Wuhan, China. They used synthetic location-specific contact patterns to simulate an ongoing outbreak trajectory using age-structured susceptible-exposed-infected-removed (SEIR) models for several social distancing measures. They concluded that a sudden lifting of interventions could lead to an earlier secondary peak, which could be flattened by relaxing the interventions gradually over time. Social distancing is clearly important to cope with the current situation, but economically it is a drastic measure to flatten the curve against infectious diseases. Adolph et al. [17] emphasized the situation in the USA, where they gathered state-level responses regarding social distancing and found contradictions in the decisions of policymakers and politicians, which caused delays in imposing social distancing strategies and resulted in ongoing harm to public health. On the brighter side, social distancing helped a lot to control the spread of the disease, but it has also affected economic productivity. In the same context, Ainslie et al. [18] studied the association between transmissibility and social distancing and found that the association decreases as transmissibility decreases within different provinces of China. According to the study, an intermediate level of activity could be allowed while avoiding an immense outbreak.

Since the COVID-19 pandemic began, many countries have been seeking technology-oriented solutions. Asian countries have used a range of technologies to fight COVID-19. The most used technology is location tracking by phone, where the data of COVID-19-positive people are saved and, based on these data, healthy people near them are monitored. Germany and Italy are using anonymized location data to monitor lockdowns. The UK has launched an application (app) named the C9 corona symptom tracker [19] that helps people report their symptoms. Similarly, South Korea launched an app named Corona 100m [19] that stores the locations of infected people and generates an alert for healthy people when they come within 100 m of a corona patient. India has developed an app that helps people maintain a specific distance from a person who has tested corona positive. Besides this, India, South Korea, and Singapore are making use of CCTV footage [19] to monitor the recently visited places of COVID-19 patients and track down infected people. China is utilizing AI-powered thermal cameras [19] to identify people in a crowd who have an elevated temperature. Such inventions in this drastic situation might help to flatten the curve, but at the same time they pose a threat to personal information.

Object detection has helped a lot in this deadly situation. Many researchers have investigated the situation [20–23] by detecting various types of objects to help in the scenario. Human detection [24–27] is an established area of research, and recent advancements in this field [28, 29] have created demand for intelligent systems to monitor unusual human activities. Human detection nevertheless remains a challenging field for many reasons, such as faint video, diverse articulated poses, background complexity, and limited machine learning capabilities; hence, existing knowledge can boost detection performance [20]. Punn et al. [21], motivated by the notion of social distancing, proposed a deep learning-based framework to automate the task of monitoring social distance using surveillance video [22]. They used the YOLO v3 [30] algorithm with the Deep SORT technique for the separation of people from the background and the tracking of detected people with the help of bounding boxes. Cobb et al. [23] investigated the relation of COVID-19 growth rates in the US with shelter-in-place (SIP) orders. They presented a random forest machine learning model for their predictions and found the SIP orders very effective. Their study showed that SIP orders will not only be helpful for the US but will also help highly populated countries reduce the COVID-19 growth rate. Deep learning is a popular approach to object detection and has gained huge interest in modern research. Deep learning techniques have been successfully applied in the drastic situation of COVID-19 by automating the tasks of face mask detection [31], detection of COVID-19 cases from X-ray images [32], lung infection measurement in CT images [33], COVID-19 patient monitoring [34], and, most importantly, monitoring social distancing [20–23].

Different research studies have been conducted to provide better and more effective social distance monitoring solutions, as discussed above, but none has focused on low light environments. Besides this, we have not found any real-world unit distance mapping solution. To fill this research gap, this article focuses on low light conditions and introduces a real-world unit distance mapping strategy that simplifies the social distance monitoring task and helps in this deadly situation.

Background of deep learning models

Several deep learning algorithms are available, and every newly developed algorithm has resolved the problems of the previous one in some way. Conventional object detection algorithms use a classifier-based procedure, where the classifier runs on slices of the image in a sliding-window fashion; this is how the Deformable Parts Model (DPM) [35] works. In the R-CNN family (R-CNN [36], Fast R-CNN [37], and Faster R-CNN [38]) the classifier runs on region proposals that are treated as bounding boxes. These algorithms exhibit good performance, especially Faster R-CNN with an accuracy of 73.2% mAP, but because of their intricate pipeline they show poor performance in terms of speed, with 7 frames per second (FPS), which limits them for real-time object detection.

This is where YOLO fits in: a real-time object detection system with the creative perspective of treating object detection as a regression problem, introduced in 2016 by Redmon et al. [39]. YOLO exhibits good performance compared to previous region-based algorithms in terms of speed, reaching 45 FPS while maintaining a good detection accuracy of 63.4% mAP. Despite its good speed and performance, YOLO makes notable localization errors and has low recall. To resolve these shortcomings, the authors of YOLO released a second version in the same year, focusing mainly on recall and localization without affecting classification accuracy. YOLO v2 [40] reached a speed of 67 FPS and an mAP of 76.8%. YOLO v2 is also called YOLO9000 because of its ability to detect objects of more than 9,000 classes by jointly optimizing classification and detection. YOLO v3 [30], developed in 2018, brought new improvements in speed and accuracy, but the main idea remained the same.

Recently, YOLO v4 was released by Bochkovskiy et al. [41]. In comparison with its direct predecessor YOLO v3, average precision (AP) and FPS increased by 10 and 12 percent, respectively. In experiments on the MS COCO [42] dataset, it obtained a 43.5% AP score and achieved a real-time speed of approximately 65 FPS on a Tesla V100, outperforming the most accurate and fastest detectors in terms of both accuracy and speed. Most detectors require multiple GPUs for training with a large batch size, and training them on a single GPU makes the training process very slow. YOLO v4 resolved this issue by presenting a fast and accurate object detector that can be trained with a smaller batch size on a single GPU. Below we briefly describe the architecture of general object detectors and the newly introduced YOLO v4 model.

General architecture of object detector

Ordinary object detectors like R-CNN, Fast R-CNN, and Faster R-CNN are two-stage detectors made up of three parts: backbone, neck, and head.

  • Backbone: Models like VGG [43], DenseNet [44], and ResNet [45] are used as feature extractors, first trained on an image classification dataset and then fine-tuned on a detection dataset. These networks construct features at different levels; a deeper network yields richer features that are useful for the later parts of the object detection network.
  • Neck: Extra layers that lie between the backbone and the head and help extract feature maps from the preceding backbone stages. Different feature map extraction techniques are used; e.g., YOLO v3 uses the Feature Pyramid Network (FPN) [46] to extract feature maps of different scales from the backbone, where each subsequent layer takes as input the merged results of the previous layers and produces a different level of the pyramid. Classification/regression (the head) is applied at every pyramid level, which helps in detecting objects of different sizes.
  • Head: This part is responsible for assigning a class to each object and generating a bounding box around it (classification and regression). One-stage detectors like YOLO apply classification/regression to each anchor box.

YOLO v4 architecture

In this section, we discuss YOLO v4. Fig 1 shows a diagrammatic representation of YOLO v4 architecture.

  • Backbone: YOLO v4 employs CSPDarknet53 as a feature extractor on a graphics processing unit (GPU). Some backbones are more appropriate for classification than for detection; for example, CSPResNext50 is better than CSPDarknet53 for image classification, whereas CSPDarknet53 proves better for object detection. For better detection of small objects, the backbone needs a larger network input size, and for larger receptive fields more layers are required.
  • Neck: For feature map extraction, YOLO v4 uses a Path Aggregation Network (PAN) and Spatial Pyramid Pooling (SPP). The PAN used in YOLO v4 is a modified version of the original PAN in which addition is replaced with concatenation. In the original version, N4 is downsized to the same spatial size as P5 and then summed with P5; this is repeated for all layers Pi+1 to create Ni+1. In YOLO v4, rather than adding Ni to each Pi+1, they are concatenated. Glancing over SPP, it mainly performs max-pooling over the 19 × 19 × 512 feature map with distinct kernel sizes k = 5, 9, 13 and the same padding to keep the spatial size unchanged; the four resulting feature maps are concatenated to form a 19 × 19 × 2048 map (see the sketch after this list). This increases the neck's receptive field and improves the model's accuracy with a minimal rise in inference time.
  • Head: YOLO v4 utilizes the same head as YOLO v3 with the anchor-based detection steps.
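To make the SPP block concrete, the following is a minimal PyTorch-style sketch of our own (an illustration, not the authors' code) of max-pooling a 19 × 19 × 512 map with kernels 5, 9, and 13 and concatenating the results into a 19 × 19 × 2048 map:

import torch
import torch.nn.functional as F

def spp_block(x):
    # Minimal SPP sketch: pool with k = 5, 9, 13 (stride 1, "same" padding)
    # and concatenate the pooled maps with the input along the channel axis.
    pooled = [F.max_pool2d(x, kernel_size=k, stride=1, padding=k // 2)
              for k in (5, 9, 13)]
    return torch.cat([x] + pooled, dim=1)   # 512 -> 2048 channels

x = torch.randn(1, 512, 19, 19)              # backbone output (batch, C, H, W)
print(spp_block(x).shape)                    # torch.Size([1, 2048, 19, 19])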

YOLO v4 performance optimization.

The authors of YOLO v4 differentiated between two types of methods used to improve an object detector's accuracy. They examined both types to obtain fast operating speed with high accuracy. The two types are as follows:

  • Bag of Freebies (BoF): Methods that yield an object detector with better accuracy without increasing the inference cost. One example is data augmentation: a model trained on a small dataset has poor generalization ability, which leads it towards overfitting. Overfitting usually arises when a deep neural network learns only the most frequently occurring patterns. Several methods have been proposed to counter overfitting, and data augmentation [47] is one of them; by utilizing it we can reduce overfitting of the models. Many data augmentation techniques are available, such as brightness, contrast, noise, and saturation alteration, as well as geometric distortions like cropping, rotating, and flipping. Other bags of freebies include regularization approaches for bounding box regression. The conventional regression loss is the mean squared error (MSE) [48], the mean of the sum of squared differences between observed and true values, as described in Eq (1).
\[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{1} \]
MSE treats the box variables as independent rather than as a unit. To surpass this, the IoU [49] loss was proposed, which takes into account the areas of the ground truth bounding box and the predicted bounding box (BBox). This notion is further enhanced by the GIoU [50] loss, which adds the orientation and shape of the object to the area term. Besides GIoU, CIoU was introduced, which takes into account the overlapping area, the aspect ratio, and the distance between center points. YOLO v4 uses the CIoU loss for bounding boxes because of its good performance and faster convergence.
  • Bag of Specials (BoS): Elements and post-processing techniques that add only a small amount of inference cost but bring a notable improvement in object detection accuracy. YOLO v4 considers a modified Spatial Attention Module (SAM) [51]: instead of using max and average pooling, the feature map is passed through a convolutional layer with a sigmoid activation function and then multiplied with the original feature map. YOLO v4 uses the Mish activation function in the backbone, as described in Eq (2). For example, using Mish with a Squeeze-and-Excitation network [52] on the CIFAR-100 dataset improves accuracy by 0.494% and 1.671% in comparison with the same network using ReLU and Swish, respectively [53]. Both Mish and a plain IoU computation are sketched after this list.
\[ \mathrm{Mish}(x) = x \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{2} \]
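As a concrete illustration of two of these components, the following Python snippet is a minimal sketch of our own (under the standard definitions, not the authors' implementation) of the Mish activation from Eq (2) and of plain IoU between two axis-aligned boxes given as (xmin, ymin, xmax, ymax); the CIoU loss used by YOLO v4 adds center-distance and aspect-ratio penalty terms on top of this IoU.

import math

def mish(x):
    # Mish activation, Eq (2): x * tanh(ln(1 + exp(x)))
    return x * math.tanh(math.log(1.0 + math.exp(x)))

def iou(box_a, box_b):
    # Plain IoU between two (xmin, ymin, xmax, ymax) boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(round(mish(1.0), 4))                  # ~0.8651
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142857...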

Materials and methods

Training dataset

In this paper, to tune the object detection model for human detection under various low light conditions, the recently released ExDARK dataset [54] is considered, which specifically focuses on low-light environments. In this dataset, 12 different classes of objects are labeled, from which we fetched the data of our desired class for training. The dataset contains different indoor and outdoor low light images; furthermore, the data are subdivided by low light condition into 10 classes: ambient, object, strong, twilight, low, weak, screen, window, shadow, and single. Sample images of various indoor and outdoor low-light environments from the dataset are shown in Fig 2.

Fig 2. Example of low-light image types in the ExDARK dataset.

https://doi.org/10.1371/journal.pone.0247440.g002

Testing dataset

A custom dataset is used for the evaluation of the proposed model. The dataset was collected in a market in Rawalpindi, Pakistan at night during the days of COVID-19. Pakistan is one of the most urbanized countries in South Asia, with a 3% yearly urban population growth rate. The large population and congested streets make it a riskier place for the growth of COVID-19, and it is very difficult to maintain a safe distance in such narrow places; hence, the monitoring system needs high accuracy in detecting and locating people. Evaluating the proposed framework in such a highly populated area helps us better analyze the performance of the model. The test dataset is a collection of 346 RGB frames. Frames were collected with the motionless ToF camera of a Samsung Galaxy Note 10+ installed 4.5 feet above the ground, where a 0° regular camera view calibration is adopted. Sample images of low-light conditions from the custom dataset are shown in Fig 3.

Monitoring social distancing with deep learning and a single motionless time of flight (ToF) camera

The emergence of deep learning has caught much attention, and it has become a dominant technology that introduced a variety of techniques to solve different challenges including self-driving [55], fraud detection [56–58], robotics [59], language translation [60], medical diagnosis [61], and many more [62]. Most of these challenges revolve around object detection, classification, segmentation, recognition, tracking, etc.

In this research article, a deep learning-based solution is proposed that uses an object detection model to automate the task of social distance monitoring at a fixed camera distance (Cd) under various low light environments. To monitor social distance at Cd, a motionless ToF [63] camera is utilized along with the YOLO v4 algorithm to maintain a speed-accuracy tradeoff.

ToF cameras give real-time distance images, which simplify human monitoring tasks. These cameras utilize light pulses: the camera's light source is switched on for a short time interval, and the resulting light pulse illuminates the scene and returns after striking an object. The reflected light experiences a delay that depends on the distance of the object. The camera lens gathers the incoming light and forms an image on the sensor. The ToF camera-to-object distance is calculated by Eq (3):
\[ D = \frac{1}{2}\, SL \cdot L_p \cdot \frac{S_2}{S_1 + S_2} \tag{3} \]
where SL is the speed of light, Lp is the length of the pulse, S1 is the charge gathered while light is emitted, and S2 represents the charge gathered when there is no light emission. The view V captured by the ToF camera is the three-tuple V = (F, TD, Cp), where F is an RGB frame with a height and width, TD is the safe distance threshold value, and Cp is the camera position in the real-world environment. In a given V we want to find the number of people po = (p1, p2, p3, …, pn), where pn is the total number of people detected in one frame, and their mutual distances PD ∈ ℝ+. We are also keen to find the value of the safety threshold TD to monitor safety distance violations (PD < TD | PD = TD | PD > TD).
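As an illustration of Eq (3), the following is a minimal sketch of our own (with hypothetical readings, not camera firmware) of how the pulsed ToF distance is computed from the two accumulated charges:

SPEED_OF_LIGHT = 3.0e8   # SL, in metres per second

def tof_distance(s1, s2, pulse_length):
    # Eq (3): D = 1/2 * SL * Lp * S2 / (S1 + S2)
    # s1: charge accumulated while the light pulse is emitted
    # s2: charge accumulated after the pulse is switched off
    # pulse_length: Lp, pulse duration in seconds
    return 0.5 * SPEED_OF_LIGHT * pulse_length * s2 / (s1 + s2)

# Hypothetical readings: a 50 ns pulse with 3/4 of the charge arriving
# during emission places the object at about 1.9 m.
print(tof_distance(s1=0.75, s2=0.25, pulse_length=50e-9))   # 1.875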

People detection in F by deep learning

For the detection of objects, the YOLO v4 model is trained on the ExDARK dataset. We trained the model at two different network sizes (320 × 320 and 416 × 416) and evaluated the performance in both cases; the model trained at the 416 × 416 network size shows the highest mAP value, as shown in Table 1. The trained model Tm = (BBi, CLi, CSi) is a tuple of three values, where BBi gives the bounding box coordinates of the detected po in F, BBi = (Xmini, Ymini, Xmaxi, Ymaxi), CLi the class labels, and CSi the confidence scores, ∀i ∈ {1, 2, 3, …, n}. We create a list of the center points CPi of all detected BBi in F, CPi = {(x1, y1), (x2, y2), …, (xn, yn)}.
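A minimal sketch of this step (our own illustration; detections are assumed to come as (xmin, ymin, xmax, ymax) boxes with a class label and confidence score, as described above):

def center_points(detections):
    # Return the bounding-box centers CP_i for all detected people.
    # detections: list of (bbox, class_label, confidence) tuples, where
    # bbox = (xmin, ymin, xmax, ymax) in pixel coordinates.
    centers = []
    for (xmin, ymin, xmax, ymax), label, score in detections:
        if label == "person":                        # keep only the person class
            centers.append(((xmin + xmax) / 2.0, (ymin + ymax) / 2.0))
    return centers

# Hypothetical YOLO v4 output for one frame F:
frame_detections = [((120, 200, 180, 380), "person", 0.97),
                    ((400, 210, 470, 390), "person", 0.95)]
print(center_points(frame_detections))   # [(150.0, 290.0), (435.0, 300.0)]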

Table 1. Model's performance evaluation using COCO detection metrics at different IoU thresholds.

https://doi.org/10.1371/journal.pone.0247440.t001

Specifying TD in F

The safety threshold value considered to control the spread of the disease is 100 cm, as specified by the WHO [1]. To initialize the monitoring process, we placed two temporary targets (T1, T2) in the real-world environment with an actual mutual distance DT1T2 of 100 cm at Cd and captured an image. The captured image is passed to Tm, and the Euclidean distance Ed between the CPi of the detected bounding boxes is calculated by Eq (4):
\[ E_d = \sqrt{\left(x_2 - x_1\right)^2 + \left(y_2 - y_1\right)^2} \tag{4} \]
The calculated Ed gives us the distance between T1 and T2 in F in pixels, which is equivalent to the real-world unit distance DT1T2. This Ed is used as the threshold value to filter newly appearing people in V. The environmental arrangement of the ToF camera with the target objects T1, T2 and the safety threshold distance DT1T2 is shown in Fig 4.
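The calibration step can be sketched as follows (a minimal illustration of our own, assuming the two targets are the only detections in the calibration frame; the pixel values are hypothetical):

import math

def euclidean(p1, p2):
    # Eq (4): pixel distance between two center points.
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

# Center points of the calibration targets T1 and T2 detected by Tm:
cp_t1, cp_t2 = (310.0, 420.0), (505.0, 418.0)

D_T1T2 = 100.0                    # actual distance between T1 and T2, in cm
T_D = euclidean(cp_t1, cp_t2)     # the same distance in pixels (threshold)
print(round(T_D, 1))              # ~195.0 pixels correspond to 100 cm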

Fig 4. Environmental setup of motionless ToF camera based social distance monitoring at fixed camera distance Cd where T1 and T2 are target objects placed in environment to initialize monitoring process.

https://doi.org/10.1371/journal.pone.0247440.g004

Pixels to real-world unit distance mapping

To convert TD from a pixel distance to a unit distance (cm), we note that TD is directly proportional to DT1T2, as described in Eq (5):
\[ T_D \propto D_{T_1T_2} \quad\Rightarrow\quad D_{T_1T_2} = k \, T_D \tag{5} \]
Here k is the constant that gives the number of units equivalent to one pixel. We convert the distance between the center points of newly appearing objects at Cd in V into units by Eq (6):
\[ D_{u_i} = k \, P_{D_i} \tag{6} \]
where Dui is the measured distance in units, k is the constant that stores the pixel-to-unit equivalent value, and PD is the Euclidean distance between the CPi of all detected persons in F. The workflow of the proposed model is shown in Fig 5.
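Putting the pieces together, the following is a minimal sketch of our own (continuing the hypothetical calibration values above) of the pixel-to-centimetre mapping of Eq (6) and the violation check:

import math

def euclidean(p1, p2):
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def check_violations(centers, k, safe_cm=100.0):
    # Map pairwise pixel distances to cm via Eq (6) and flag pairs that fall
    # below the WHO safety threshold of 100 cm.
    violations = []
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            pd_pixels = euclidean(centers[i], centers[j])   # P_D in pixels
            du_cm = k * pd_pixels                           # Eq (6)
            if du_cm < safe_cm:
                violations.append((i, j, round(du_cm, 1)))
    return violations

k = 100.0 / 195.0                     # Eq (5): cm per pixel from calibration
people = [(150.0, 290.0), (240.0, 300.0), (600.0, 310.0)]
print(check_violations(people, k))    # [(0, 1, 46.4)] -> persons 0 and 1 too close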

Experiments & results

Experimental setup

In the ExDARK training experiment, the hyperparameters are selected as follows: the training steps are 35,000 and 50,000 for the two network sizes 320 and 416; the batch size and subdivisions are 64 and 16; a polynomial decay learning rate schedule is adopted with an initial learning rate of 0.001; the warm-up steps are 1,000; momentum and weight decay are 0.949 and 0.0005, respectively. From the bag of freebies (BoF), the mosaic data augmentation technique is utilized. From the bag of specials (BoS), the Mish and leaky-ReLU [64] activation functions are used. The network sizes are 320 × 320 and 416 × 416 with 3 channels, and the initialized IoU threshold for ground truth allocation is 0.213. The IoU normalizer is 0.07, and the CIoU loss is used for bounding boxes. To prune the large number of candidate boxes and choose the best one, greedy non-maximum suppression (NMS) is used. The experiments are run on a Tesla T4 GPU with 16 GB memory, CUDA v10010, and cuDNN v7.6.5.
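For reference, these hyperparameters correspond roughly to the following excerpt of a Darknet-style configuration (a sketch of the 416 × 416 run based only on the values listed above, not the authors' actual file; key names follow the public AlexeyAB Darknet format):

# [net] section: training hyperparameters for the 416 x 416 run
[net]
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.949
decay=0.0005
learning_rate=0.001
# warm-up steps and total training steps
burn_in=1000
max_batches=50000
# polynomial learning-rate decay and mosaic augmentation (BoF)
policy=poly
mosaic=1

# [yolo] head: ground-truth IoU threshold, CIoU loss, and greedy NMS
[yolo]
iou_thresh=0.213
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms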

Evaluation standards

Common evaluation indicators for object detectors are precision, recall, and AP. The following explains the purpose of these indicators in the context of person detection under various low light conditions. Precision shows how accurately the model predicts people. Recall is the number of truly detected people over the sum of truly detected and undetected people in the image. AP is the mean of the precision scores taken after each true object is detected, as shown in Eq (7); it summarizes the performance of object detection algorithms. Because of its extensive assessment ability, AP is used as the assessment indicator in this research, which is equivalent to mAP in the COCO detection metrics [42].
\[ AP = \frac{1}{N}\sum_{k=1}^{N} P(k) \tag{7} \]
where N is the number of true objects in the image and P(k) is the precision measured after the k-th true object is detected.

Performance evaluation

By performing a series of experiments, we evaluate the performance of the trained model with the COCO detection metrics. Table 1 shows precision (Prec), recall (Rec), F1-score, false positives (FP), true positives (TP), false negatives (FN), and mAP at two different network sizes (320, 416) with IoU thresholds of 0.5, 0.75, and 0.5:0.95. Precision and recall are calculated from TP, FP, and FN as shown in Eqs (8) and (9), whereas the F1-score is calculated from the resulting precision and recall values as described in Eq (10). Summarizing the evaluation results based on mAP, the model exhibited good overall performance; network size 416 with IoU threshold 0.5 has the highest mAP value of 97.84%. The precision-recall curve (PR-curve) of the COCO evaluation with the IoU threshold ranging from 0.5 to 0.95 at the two network sizes is shown in Fig 6.
\[ Precision = \frac{TP}{TP + FP} \tag{8} \]
\[ Recall = \frac{TP}{TP + FN} \tag{9} \]
\[ F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{10} \]
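A minimal sketch of these metrics (our own, applying Eqs (8)-(10) to hypothetical per-frame counts):

def detection_metrics(tp, fp, fn):
    # Eqs (8)-(10): precision, recall, and F1-score from raw counts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical frame with 14 correctly detected people, no false positives,
# and 1 missed person:
print(detection_metrics(tp=14, fp=0, fn=1))   # (1.0, 0.9333..., 0.9655...)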

Fig 6. COCO evaluation, the IoU threshold ranges from 0.5 to 0.95 with a step size of 0.05.

https://doi.org/10.1371/journal.pone.0247440.g006

Detection results

We tested our trained model on the custom dataset. Detection results per frame extracted from the video are shown in Fig 7, and Table 2 shows the TP, FP, FN, precision, and recall values for the detected objects per frame. The model exhibited good overall performance in low light environments: from Table 2 it can be observed that no false positive is detected in any of the frames, and the number of false negatives is also low. The PR-curve from the precision-recall values of Table 2 is shown in Fig 8; we noticed that the precision values remained constant from Frame 1 to Frame 15.

Fig 7. Visualization of classification and localization results of YOLO v4.

https://doi.org/10.1371/journal.pone.0247440.g007

Table 2. YOLO v4 performance evaluation results towards real-time person detection under various low light conditions from Fig 7.

https://doi.org/10.1371/journal.pone.0247440.t002

Experimental results

To evaluate the performance of our social distance monitoring solution, we performed a few tests at three different fixed camera distances: 400 cm, 500 cm, and 600 cm. Test frames were collected from the motionless ToF camera of a Samsung Galaxy Note 10+ placed 4.5 feet above the ground, where Cp is 0° (a regular camera view). At each fixed camera distance, we tested two scenarios: one above the specified safety threshold (100 cm) at 140 cm and one below it at 52 cm. Qualitative results are shown in Fig 9, whereas Table 3 shows the quantitative results in terms of the distance between objects in pixels and cm, the actual known distance in cm, and the per-test error rate. We can see that the model exhibited good overall performance. People violating the safety distance are highlighted by red bounding boxes, whereas green bounding boxes show people following the safety distance criteria. The absolute error (AE) is calculated for all tests between the actual distance in units (Ad) and the measured distance in units (Du) using Eq (11), and based on the AE the mean absolute error (MAE) is calculated by Eq (12). The Ad and Du plot is shown in Fig 10, where the blue color shows the actual known distance in cm and the red line shows the measured distance in cm.
\[ AE_i = \left| A_{d_i} - D_{u_i} \right| \tag{11} \]
\[ MAE = \frac{1}{n}\sum_{i=1}^{n} AE_i \tag{12} \]
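The error computation can be sketched as follows (our own minimal illustration with hypothetical per-test values, not the values of Table 3):

def mean_absolute_error(actual_cm, measured_cm):
    # Eqs (11) and (12): per-test absolute errors and their mean.
    abs_errors = [abs(a - m) for a, m in zip(actual_cm, measured_cm)]
    return abs_errors, sum(abs_errors) / len(abs_errors)

# Hypothetical actual vs. measured distances (cm) over six tests:
actual = [140, 52, 140, 52, 140, 52]
measured = [141.2, 51.0, 139.1, 52.8, 141.5, 51.3]
errors, mae = mean_absolute_error(actual, measured)
print([round(e, 2) for e in errors])   # [1.2, 1.0, 0.9, 0.8, 1.5, 0.7]
print(round(mae, 2))                   # 1.02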

Fig 9. Test visualizations of our social distance monitoring approach at various Cd values.

(a) Cd = 400 cm (b) Test 1 (c) Test 2 (d) Cd = 500 cm (e) Test 3 (f) Test 4 (g) Cd = 600 cm (h) Test 5 (i) Test 6.

https://doi.org/10.1371/journal.pone.0247440.g009

Fig 10. Graph plot of measured vs actual object distance values from Table 3 to highlight monitored error rate.

https://doi.org/10.1371/journal.pone.0247440.g010

Table 3. Social distancing measure tests at different Cd values as shown in Fig 10.

where PD is the calculated distance in pixels, Du is the measured distance in cm, and Ad is the actual distance in cm.

https://doi.org/10.1371/journal.pone.0247440.t003

Limitations and discussion

This application is meant to be used in a real-time environment, so high precision and accuracy are required to serve its purpose. The proposed model shows efficient results in the evaluation of YOLO v4 under low light conditions, where not a single FP is detected; the accuracy and reliability of the model are highly dependent on FPs. To evaluate the performance of the social distance monitoring strategy, a few tests were performed, as shown in Table 3. The proposed deep learning and motionless ToF camera-based social distance monitoring technique at Cd shows a good speed-accuracy tradeoff in monitoring social distancing during the night. The technique is limited in a few respects: social distance among people can only be monitored at fixed Cd values, and, in order to initialize the monitoring process, two temporary target objects have to be placed in the environment.

Conclusion

This article proposes an efficient solution for real-time social distance monitoring in low light environments. For real-time person detection, the YOLO v4 algorithm is trained on the ExDARK dataset. For monitoring social distance, a motionless ToF camera is used to observe people at a fixed camera distance and report the resulting distances in real-world units. Safety distance violations are highlighted. The proposed YOLO v4-based real-time social distance monitoring solution is evaluated with the COCO detection metrics. Experimental analysis shows that the YOLO v4 algorithm achieved the best results in different low light environments with a 97.84% mAP score, and the observed MAE value during the tests of our social distance monitoring approach is 1.01 cm. The FPS score can be further enhanced by fine-tuning the same approach on GPUs like Volta, Tesla V100, or Titan Volta.

The proposed technique can be easily applied in real-world scenarios because of its high precision and low error rate, e.g., in banks to help the cashier monitor people standing in front of the counter, in shops to help shopkeepers observe customers, and in train stations to help ticket sellers keep track of people violating the safe distance. In the future, we will extend our system to monitor social distance at varying camera distances and varying camera angles.

References

  1. WHO. Who director-generals opening remarks at the media briefing on covid-19-11 march 2020;. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports.
  2. Organization WH. WHO corona-viruses (COVID-19);. https://www.who.int/health-topics/coronavirus.
  3. Roosa K, Lee Y, Luo R, Kirpich A, Rothenberg R, Hyman JM, et al. Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. Journal of clinical medicine. 2020;9(2):596. pmid:32098289
  4. McAleer M. Is One Diagnostic Test for COVID-19 Enough?; 2020.
  5. Setti L, Passarini F, De Gennaro G, Barbieri P, Perrone MG, Borelli M, et al. Airborne transmission route of COVID-19: why 2 meters/6 feet of inter-personal distance could not Be enough; 2020. pmid:32340347
  6. da Cunha de Sá-Caputo D, Taiar R, Seixas A, Sanudo B, Sonza A, Bernardo-Filho M. A Proposal of Physical Performance Tests Adapted as Home Workout Options during the COVID-19 Pandemic. Applied Sciences. 2020;10(14):4755.
  7. Eksin C, Paarporn K, Weitz JS. Systematic biases in disease forecasting–The role of behavior change. Epidemics. 2019;27:96–105. pmid:30922858
  8. ALTO P. Landing AI Named an April 2020 Cool Vendor in the Gartner Cool Vendors in AI Core Technologies;. https://www.yahoo.com/lifestyle/landing-ai-named-april-2020-152100532.html.
  9. Hall EA. Gartner;. https://www.gartner.com/en.
  10. AI L. Landing AI Named an April 2020 Cool Vendor in the Gartner Cool Vendors in AI Core Technologies;. https://www.prnewswire.com/news-releases/.
  11. Guo X, Li Y, Ling H. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on image processing. 2016;26(2):982–993.
  12. Lore KG, Akintayo A, Sarkar S. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition. 2017;61:650–662.
  13. Li M, Liu J, Yang W, Sun X, Guo Z. Structure-revealing low-light image enhancement via robust retinex model. IEEE Transactions on Image Processing. 2018;27(6):2828–2841. pmid:29570085
  14. Ren W, Liu S, Ma L, Xu Q, Xu X, Cao X, et al. Low-light image enhancement via a deep hybrid network. IEEE Transactions on Image Processing. 2019;28(9):4364–4375. pmid:30998467
  15. De Myttenaere A, Golden B, Le Grand B, Rossi F. Mean absolute percentage error for regression models. Neurocomputing. 2016;192:38–48.
  16. Prem K, Liu Y, Russell TW, Kucharski AJ, Eggo RM, Davies N, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health. 2020;. pmid:32220655
  17. Adolph C, Amano K, Bang-Jensen B, Fullman N, Wilkerson J. Pandemic politics: Timing state-level social distancing responses to COVID-19. medRxiv. 2020.
  18. Ainslie KE, Walters CE, Fu H, Bhatia S, Wang H, Xi X, et al. Evidence of initial success for China exiting COVID-19 social distancing policy after achieving containment. Wellcome Open Research. 2020;5. pmid:32500100
  19. G Seetharaman EB. How countries are using technology to fight coronavirus;. https://economictimes.indiatimes.com/tech/software/how-countries-are-using-technology-to-fight-coronavirus/articleshow/74867177.cms.
  20. Wang X. Intelligent multi-camera video surveillance: A review. Pattern recognition letters. 2013;34(1):3–19.
  21. Punn NS, Sonbhadra SK, Agarwal S. Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and Deepsort techniques. arXiv preprint arXiv:200501385. 2020.
  22. Sulman N, Sanocki T, Goldgof D, Kasturi R. How effective is human video surveillance performance? In: 2008 19th International Conference on Pattern Recognition. IEEE; 2008. p. 1–3.
  23. Cobb J, Seale M. Examining the effect of social distancing on the compound growth rate of SARS-CoV-2 at the county level (United States) using statistical analyses and a random forest machine learning model. Public Health. 2020;. pmid:32526559
  24. Ko BC, Jeong M, Nam J. Fast human detection for intelligent monitoring using surveillance visible sensors. Sensors. 2014;14(11):21247–21257. pmid:25393782
  25. Kim JH, Hong HG, Park KR. Convolutional neural network-based human detection in nighttime images using visible light camera sensors. Sensors. 2017;17(5):1065. pmid:28481301
  26. Dalal N, Triggs B, Schmid C. Human detection using oriented histograms of flow and appearance. In: European conference on computer vision. Springer; 2006. p. 428–441.
  27. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05). vol. 1. IEEE; 2005. p. 886–893.
  28. Yin F, Li X, Peng H, Li F, Yang K, Yuan W. A highly sensitive, multifunctional, and wearable mechanical sensor based on RGO/synergetic fiber bundles for monitoring human actions and physiological signals. Sensors and Actuators B: Chemical. 2019;285:179–185.
  29. Chaaraoui AA, Padilla-López JR, Ferrández-Pastor FJ, Nieto-Hidalgo M, Flórez-Revuelta F. A vision-based system for intelligent monitoring: human behaviour analysis and privacy by context. Sensors. 2014;14(5):8895–8925. pmid:24854209
  30. Redmon J, Farhadi A. Yolov3: An incremental improvement. arXiv preprint arXiv:180402767. 2018.
  31. Rosebrock A. COVID-19: Face Mask Detector with OpenCV, Keras/TensorFlow, and Deep Learning;. https://www.pyimagesearch.com/2020/05/04/covid-19-face-mask-detector-with-opencv-keras-tensorflow-and-deep-learning/.
  32. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine. 2020; p. 103792. pmid:32568675
  33. Shan F, Gao Y, Wang J, Shi W, Shi N, Han M, et al. Lung infection quantification of covid-19 in ct images with deep learning. arXiv preprint arXiv:200304655. 2020.
  34. Gozes O, Frid-Adar M, Greenspan H, Browning PD, Zhang H, Ji W, et al. Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis. arXiv preprint arXiv:200305037. 2020.
  35. Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE; 2008. p. 1–8.
  36. Girshick R, Donahue J, Darrell T, Malik J. Region-based convolutional networks for accurate object detection and segmentation. IEEE transactions on pattern analysis and machine intelligence. 2015;38(1):142–158.
  37. Girshick R. Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision; 2015. p. 1440–1448.
  38. Ren S, He K, Girshick R, Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems; 2015. p. 91–99.
  39. Redmon J, Divvala SK, Girshick RB, Farhadi A. You Only Look Once: Unified, Real-Time Object Detection. CoRR. 2015;abs/1506.02640.
  40. Redmon J, Farhadi A. YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 7263–7271.
  41. Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:200410934. 2020.
  42. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft coco: Common objects in context. In: European conference on computer vision. Springer; 2014. p. 740–755.
  43. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:14091556. 2014.
  44. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 4700–4708.
  45. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 770–778.
  46. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2117–2125.
  47. Wang J, Perez L, et al. The effectiveness of data augmentation in image classification using deep learning. Convolutional Neural Networks Vis Recognit. 2017;11.
  48. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate research. 2005;30(1):79–82.
  49. Yu J, Jiang Y, Wang Z, Cao Z, Huang T. Unitbox: An advanced object detection network. In: Proceedings of the 24th ACM international conference on Multimedia; 2016. p. 516–520.
  50. Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. p. 658–666.
  51. Woo S, Park J, Lee JY, So Kweon I. Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV); 2018. p. 3–19.
  52. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 7132–7141.
  53. Misra D. Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:190808681. 2019.
  54. Loh YP, Chan CS. Getting to know low-light images with the exclusively dark dataset. Computer Vision and Image Understanding. 2019;178:30–42.
  55. Rao Q, Frtunikj J. Deep learning for self-driving cars: chances and challenges. In: Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems; 2018. p. 35–38.
  56. Roy A, Sun J, Mahoney R, Alonzi L, Adams S, Beling P. Deep learning detecting fraud in credit card transactions. In: 2018 Systems and Information Engineering Design Symposium (SIEDS). IEEE; 2018. p. 129–134.
  57. Pumsirirat A, Yan L. Credit card fraud detection using deep learning based on auto-encoder and restricted boltzmann machine. International Journal of advanced computer science and applications. 2018;9(1):18–25.
  58. Wang Y, Xu W. Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decision Support Systems. 2018;105:87–95.
  59. Pierson HA, Gashler MS. Deep learning in robotics: a review of recent research. Advanced Robotics. 2017;31(16):821–835.
  60. Dong D, Wu H, He W, Yu D, Wang H. Multi-task learning for multiple language translation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers); 2015. p. 1723–1732.
  61. Bakator M, Radosav D. Deep learning and medical diagnosis: A review of literature. Multimodal Technologies and Interaction. 2018;2(3):47.
  62. Deng L. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing. 2014;3.
  63. Li L. Time-of-flight camera–an introduction. Technical white paper. 2014;(SLOA190B).
  64. Xu B, Wang N, Chen T, Li M. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:150500853. 2015.