US 12,260,649 B2
Determining incorrect predictions by, and generating explanations for, machine learning models
Subramaniaprabhu Jagadeesan, Chennai (IN); and Bikash Chandra Mahato, Bangalore (IN)
Assigned to Accenture Global Solutions Limited, Dublin (IE)
Filed by Accenture Global Solutions Limited, Dublin (IE)
Filed on Nov. 4, 2022, as Appl. No. 17/980,898.
Prior Publication US 2024/0153275 A1, May 9, 2024
Int. Cl. G06V 20/54 (2022.01); G06V 10/75 (2022.01); G06V 10/764 (2022.01); G06V 10/766 (2022.01); G06V 10/77 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 20/52 (2022.01); G06V 40/16 (2022.01)
CPC G06V 20/54 (2022.01) [G06V 10/751 (2022.01); G06V 10/764 (2022.01); G06V 10/766 (2022.01); G06V 10/7715 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/44 (2022.01); G06V 20/53 (2022.01); G06V 40/172 (2022.01)]
20 Claims
 
1. A method, comprising:
receiving, by a device, surveillance video data captured by a surveillance system in a geographic location;
processing, by the device, the surveillance video data, with a deep learning model, to identify objects in the geographic location;
utilizing, by the device, a segmentation guided attention network model with the objects to determine traffic density count data in the geographic location;
processing, by the device, the segmented video frame data, with a regression analysis model, to derive traffic signal timing in the geographic location;
utilizing, by the device, a curriculum loss model with the objects to determine crowd count data in the geographic location;
processing, by the device, the surveillance video data, with a deep learning video analytics model, to identify first events associated with emergency vehicles and accidents in the geographic location;
processing, by the device, the surveillance video data, with a classifier model and a deep network model, to identify second events associated with facial recognition in the geographical location;
processing, by the device, one or more of the objects, the traffic density count data, the crowd count data, the first events, or the second events, with a dynamic text-based explanation model, to generate a layer-wise explanation and a text-based explanation and/or a failure prediction for one or more of the regression analysis model, the deep learning model, the segmentation guided attention network model, the curriculum loss model, the deep learning video analytics model, the classifier model or the deep network model; and
performing, by the device, one or more actions based on the layer-wise explanation and the text-based explanation and/or the failure prediction.
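The data flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in and none of it comes from the patent: each "frame" is assumed to be a list of labels already produced by an object detector, the counting steps are simple tallies in place of the segmentation guided attention network and curriculum loss models, the signal timing uses a made-up linear formula driven by the density count (the claim instead feeds segmented video frame data to a regression analysis model), and the explanation is a plain summary string in place of a layer-wise or text-based model explanation. It shows only the order of the steps and how their outputs feed the explanation and action stages.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label sets; the claim does not enumerate object classes.
VEHICLE_LABELS = {"car", "bus", "truck", "ambulance"}
EMERGENCY_LABELS = {"ambulance", "accident"}


@dataclass
class Report:
    objects: List[str]
    traffic_density: int
    signal_timing_s: float
    crowd_count: int
    first_events: List[str]
    second_events: List[str]
    explanation: str
    actions: List[str]


def run_pipeline(frames: List[List[str]]) -> Report:
    # Stand-in for the deep learning model: flatten pre-labelled frames.
    objects = [label for frame in frames for label in frame]
    # Stand-in for the segmentation guided attention network: count vehicles.
    density = sum(label in VEHICLE_LABELS for label in objects)
    # Stand-in for the regression analysis model: assumed linear rule,
    # 20 s base plus 1.5 s per vehicle, capped at 90 s.
    timing = min(20.0 + 1.5 * density, 90.0)
    # Stand-in for the curriculum loss model: count people.
    crowd = objects.count("person")
    # Stand-in for the video analytics model: emergency vehicles and accidents.
    first = [label for label in objects if label in EMERGENCY_LABELS]
    # Stand-in for the classifier and deep network models: recognized faces,
    # encoded here as labels with an assumed "face:" prefix.
    second = [label for label in objects if label.startswith("face:")]
    # Stand-in for the dynamic text-based explanation model.
    explanation = (
        f"{density} vehicles and {crowd} people detected; "
        f"{len(first)} emergency events; {len(second)} recognized faces"
    )
    # One or more actions chosen from the upstream results.
    actions = []
    if first:
        actions.append("prioritize emergency response")
    if second:
        actions.append("flag recognized faces for review")
    return Report(objects, density, timing, crowd, first, second,
                  explanation, actions)


if __name__ == "__main__":
    sample = [["car", "car", "person"], ["ambulance", "person", "face:alice"]]
    print(run_pipeline(sample))
```

In a real system each stub would be replaced by a trained model, and the explanation stage would inspect those models rather than summarize their outputs.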