| CPC G06V 20/54 (2022.01) [G06V 10/751 (2022.01); G06V 10/764 (2022.01); G06V 10/766 (2022.01); G06V 10/7715 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/44 (2022.01); G06V 20/53 (2022.01); G06V 40/172 (2022.01)] | 20 Claims |

1. A method, comprising:
receiving, by a device, surveillance video data captured by a surveillance system in a geographic location;
processing, by the device, the surveillance video data, with a deep learning model, to identify objects in the geographic location;
utilizing, by the device, a segmentation guided attention network model with the objects to determine traffic density count data in the geographic location;
processing, by the device, segmented video frame data, generated by the segmentation guided attention network model, with a regression analysis model, to derive traffic signal timing in the geographic location;
utilizing, by the device, a curriculum loss model with the objects to determine crowd count data in the geographic location;
processing, by the device, the surveillance video data, with a deep learning video analytics model, to identify first events associated with emergency vehicles and accidents in the geographic location;
processing, by the device, the surveillance video data, with a classifier model and a deep network model, to identify second events associated with facial recognition in the geographic location;
processing, by the device, one or more of the objects, the traffic density count data, the crowd count data, the first events, or the second events, with a dynamic text-based explanation model, to generate a layer-wise explanation and a text-based explanation and/or a failure prediction for one or more of the regression analysis model, the deep learning model, the segmentation guided attention network model, the curriculum loss model, the deep learning video analytics model, the classifier model, or the deep network model; and
performing, by the device, one or more actions based on the layer-wise explanation and the text-based explanation and/or the failure prediction.
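For illustration only, the segmentation guided attention step recited in the claim can be sketched as a small PyTorch module in which a predicted foreground mask gates backbone features before a density map is regressed and summed into a count. The layer sizes, backbone depth, and interfaces below are assumptions for exposition, not the claimed implementation.

```python
# Minimal sketch (not the claimed model) of segmentation-guided attention for
# density-based counting: a predicted foreground mask gates backbone features
# before density regression. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SegGuidedAttentionCounter(nn.Module):
    def __init__(self, in_channels: int = 3, feat_channels: int = 32):
        super().__init__()
        # Shared convolutional backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
        )
        # Segmentation head: per-pixel foreground probability used as attention.
        self.seg_head = nn.Sequential(nn.Conv2d(feat_channels, 1, 1), nn.Sigmoid())
        # Density head: non-negative per-pixel density values.
        self.density_head = nn.Sequential(nn.Conv2d(feat_channels, 1, 1), nn.ReLU())

    def forward(self, frames: torch.Tensor):
        feats = self.backbone(frames)
        attention = self.seg_head(feats)      # (N, 1, H, W) in [0, 1]
        gated = feats * attention             # segmentation-guided attention
        density = self.density_head(gated)    # (N, 1, H, W)
        counts = density.sum(dim=(1, 2, 3))   # estimated count per frame
        return counts, density, attention

if __name__ == "__main__":
    model = SegGuidedAttentionCounter()
    frames = torch.rand(2, 3, 64, 64)          # two dummy RGB frames
    counts, density, attention = model(frames)
    print(counts.shape, density.shape, attention.shape)
```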
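The regression analysis step that derives traffic signal timing can be illustrated, under the assumption of a simple linear relationship between density counts and green-phase duration, with ordinary least squares. The observations below are synthetic placeholders, not data from the disclosure.

```python
# Minimal sketch of the regression step: fit a linear model mapping traffic
# density counts to a green-phase duration. The observations are synthetic
# placeholders; the real features and mapping are not specified here.
import numpy as np

# Hypothetical observations: (vehicles counted at approach, green seconds used).
density_counts = np.array([5, 12, 20, 28, 35, 44], dtype=float)
green_seconds = np.array([12, 18, 25, 33, 40, 48], dtype=float)

# Ordinary least squares: green = a * count + b.
A = np.vstack([density_counts, np.ones_like(density_counts)]).T
(a, b), *_ = np.linalg.lstsq(A, green_seconds, rcond=None)

def suggested_green_time(count: float) -> float:
    """Predict a green-phase duration (seconds) from a density count."""
    return a * count + b

print(f"green = {a:.2f} * count + {b:.2f}")
print(suggested_green_time(30.0))
```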
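The curriculum loss step for crowd counting can be sketched as a density-map loss whose tolerance for hard pixels grows over training, so that difficult pixels are down-weighted early and included gradually. The schedule, tolerance, and weighting below are assumptions, not the claimed formulation.

```python
# Minimal sketch of a curriculum-style loss for density-map crowd counting:
# pixels whose squared error exceeds an epoch-dependent tolerance are
# down-weighted early in training and included gradually. The schedule and
# weights are illustrative assumptions.
import torch

def curriculum_density_loss(pred: torch.Tensor,
                            target: torch.Tensor,
                            epoch: int,
                            max_epochs: int,
                            base_tolerance: float = 0.01,
                            hard_pixel_weight: float = 0.1) -> torch.Tensor:
    """Mean squared error with a curriculum weight on hard pixels."""
    per_pixel_err = (pred - target) ** 2
    # Tolerance grows with training progress, so harder pixels join later.
    progress = min(epoch / max_epochs, 1.0)
    tolerance = base_tolerance * (1.0 + 9.0 * progress)
    hard = (per_pixel_err > tolerance).float()
    # Hard pixels get a small weight early on, full weight near the end.
    weight = 1.0 - hard * (1.0 - hard_pixel_weight) * (1.0 - progress)
    return (weight * per_pixel_err).mean()

if __name__ == "__main__":
    pred = torch.rand(2, 1, 64, 64, requires_grad=True)
    target = torch.rand(2, 1, 64, 64)
    loss = curriculum_density_loss(pred, target, epoch=3, max_epochs=30)
    loss.backward()
    print(float(loss))
```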
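The dynamic text-based explanation and failure prediction step can be illustrated with forward hooks that collect per-layer activation statistics (a crude layer-wise explanation), a templated sentence as the text-based explanation, and a softmax-confidence threshold as a simple failure-prediction signal. The tiny classifier, class names, and 0.6 threshold are illustrative assumptions rather than the claimed explanation model.

```python
# Minimal sketch of layer-wise and text-based explanation with a simple
# confidence-based failure prediction. The classifier, class names, and the
# 0.6 threshold are illustrative assumptions.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 3),                      # e.g., {vehicle, pedestrian, other}
)
class_names = ["vehicle", "pedestrian", "other"]

layer_stats = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Record mean absolute activation per layer as a layer-wise summary.
        layer_stats[name] = float(output.detach().abs().mean())
    return hook

for name, module in classifier.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        module.register_forward_hook(make_hook(name))

frame = torch.rand(1, 3, 64, 64)          # dummy surveillance frame
probs = torch.softmax(classifier(frame), dim=1)[0]
confidence, index = probs.max(dim=0)
confidence, label = float(confidence), class_names[int(index)]

layer_text = "; ".join(f"layer {n}: mean activation {v:.3f}"
                       for n, v in layer_stats.items())
explanation = (f"Predicted '{label}' with confidence {confidence:.2f}. "
               f"Layer-wise summary: {layer_text}.")
failure_predicted = confidence < 0.6      # assumed confidence threshold

print(explanation)
print("failure predicted:", failure_predicted)
```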