US 12,462,561 B2
Relationship modeling and key feature detection based on video data
Michael Griffin, Wayland, MA (US)
Assigned to Insight Direct USA, Inc., Chandler, AZ (US)
Filed by Insight Direct USA, Inc., Tempe, AZ (US)
Filed on Sep. 23, 2022, as Appl. No. 17/952,002.
Claims priority of provisional application 63/405,721, filed on Sep. 12, 2022.
Claims priority of provisional application 63/405,716, filed on Sep. 12, 2022.
Claims priority of provisional application 63/405,722, filed on Sep. 12, 2022.
Claims priority of provisional application 63/405,719, filed on Sep. 12, 2022.
Claims priority of provisional application 63/286,844, filed on Dec. 7, 2021.
Prior Publication US 2023/0177835 A1, Jun. 8, 2023
Int. Cl. G06V 20/40 (2022.01); G06F 18/22 (2023.01); G06F 40/30 (2020.01); G06V 10/426 (2022.01); G06V 10/44 (2022.01); G06V 10/70 (2022.01); G06V 10/74 (2022.01); G06V 10/774 (2022.01); G06V 10/86 (2022.01); G06V 20/70 (2022.01); G06V 30/262 (2022.01); G06V 40/20 (2022.01); G10L 25/57 (2013.01)
CPC G06V 20/41 (2022.01) [G06F 18/22 (2023.01); G06F 40/30 (2020.01); G06V 10/426 (2022.01); G06V 10/44 (2022.01); G06V 10/70 (2022.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 10/7753 (2022.01); G06V 10/86 (2022.01); G06V 20/46 (2022.01); G06V 20/70 (2022.01); G06V 30/274 (2022.01); G06V 40/20 (2022.01); G10L 25/57 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method comprising:
acquiring training video data that portrays a plurality of training interacting events;
labeling each training interacting event of the plurality of training interacting events as positive or negative to create a plurality of positive interacting events and a plurality of negative interacting events;
creating a plurality of positive training relationship graphs by, for each positive interacting event of the plurality of positive interacting events:
extracting positive training image data, positive training audio data, and positive training semantic text data from the training video data;
analyzing, by a first computer-implemented machine learning model, at least one of the positive training image data, the positive training audio data, and the positive training semantic text data to identify a plurality of positive training video features; and
analyzing the plurality of positive training video features to create a positive training relationship graph, wherein the positive training relationship graph includes a plurality of positive training nodes and a plurality of positive training edges extending between nodes of the plurality of positive training nodes;
creating a plurality of negative training relationship graphs by, for each negative interacting event of the plurality of negative interacting events:
extracting negative training image data, negative training audio data, and negative training semantic text data from the training video data;
analyzing, by the first computer-implemented machine learning model, at least one of the negative training image data, the negative training audio data, and the negative training semantic text data to identify a plurality of negative training video features; and
analyzing the plurality of negative training video features to create a negative training relationship graph, wherein the negative training relationship graph includes a plurality of negative training nodes and a plurality of negative training edges extending between nodes of the plurality of negative training nodes;
analyzing the plurality of positive training relationship graphs to identify a plurality of positive graph features;
analyzing the plurality of negative training relationship graphs to identify a plurality of negative graph features;
training a second computer-implemented machine learning model to identify positive and negative interacting events using the plurality of positive graph features and the plurality of negative graph features;
identifying a first key feature using the trained second computer-implemented machine learning model;
acquiring digital video data that portrays an interacting event, the interacting event comprising a plurality of interactions between a plurality of individuals;
extracting image data, audio data, and semantic text data from the digital video data;
analyzing, by the first computer-implemented machine learning model, at least one of the image data, the audio data, and the semantic text data to identify a plurality of video features;
analyzing the plurality of video features to create a relationship graph, wherein:
the relationship graph comprises a plurality of nodes and a plurality of edges;
each node of the plurality of nodes represents an individual of the plurality of individuals;
each edge of the plurality of edges extends between two nodes of the plurality of nodes; and
the plurality of edges represents the plurality of interactions;
determining whether the first key feature is present in the relationship graph, wherein presence of the first key feature is predictive of a positive outcome of the interacting event; and
outputting, by a user interface, an indication whether the first key feature is present in the relationship graph.
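The claimed pipeline can be summarized as: build relationship graphs (nodes = individuals, edges = interactions) for labeled training events, compare graph features across positive and negative events to identify a key feature, then test a new event's graph for that feature. The following is a minimal stdlib-only sketch of that flow, not the patented embodiment: the class and function names (`RelationshipGraph`, `identify_key_feature`, `key_feature_present`) and the interaction labels are hypothetical, the "graph features" are reduced to a bag of edge types, and the trained second machine learning model is replaced by a simple frequency-difference score.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class RelationshipGraph:
    """Nodes are individuals; each edge is (person_a, person_b, interaction_type)."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_interaction(self, a: str, b: str, kind: str) -> None:
        self.nodes.update((a, b))
        self.edges.append((a, b, kind))

def graph_features(graph: RelationshipGraph) -> Counter:
    # Toy feature set: the multiset of interaction types on the graph's edges.
    return Counter(kind for _, _, kind in graph.edges)

def identify_key_feature(positive_graphs, negative_graphs) -> str:
    # Stand-in for the trained second model: score each interaction type by
    # how much more often it appears in positive events than negative ones;
    # the top-scoring type is treated as the "key feature" whose presence
    # is predictive of a positive outcome.
    pos, neg = Counter(), Counter()
    for g in positive_graphs:
        pos.update(graph_features(g))
    for g in negative_graphs:
        neg.update(graph_features(g))
    scores = {kind: pos[kind] - neg.get(kind, 0) for kind in pos}
    return max(scores, key=scores.get)

def key_feature_present(graph: RelationshipGraph, key_feature: str) -> bool:
    # The final claimed step: report whether the key feature appears
    # in the relationship graph built from the new video.
    return key_feature in graph_features(graph)

# Hypothetical labeled training events.
good = RelationshipGraph()
good.add_interaction("alice", "bob", "handshake")
good.add_interaction("bob", "carol", "handshake")
bad = RelationshipGraph()
bad.add_interaction("alice", "bob", "argument")

key = identify_key_feature([good], [bad])

# A new interacting event to evaluate.
new_event = RelationshipGraph()
new_event.add_interaction("dave", "erin", "handshake")
print(key, key_feature_present(new_event, key))
```

In the full method, `graph_features` would be replaced by features extracted from image, audio, and semantic text data by the first machine learning model, and the frequency-difference score by the trained second model.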