US 12,105,755 B1
Automated content filtering using image retrieval models
Mohamed Kamal Omar, Seattle, WA (US); Xiaohang Sun, Bellevue, WA (US); Han-Kai Hsu, Seattle, WA (US); Ashutosh Sanan, Seattle, WA (US); and Wentao Zhu, Redmond, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 28, 2022, as Appl. No. 17/852,063.
Int. Cl. G06F 16/71 (2019.01); G06N 20/00 (2019.01)

CPC G06F 16/71 (2019.01) [G06N 20/00 (2019.01)]

20 Claims

1. A method for determining a training dataset for a machine learning model comprising:

receiving a user input associated with a label to filter one or more videos from a database storing a plurality of videos;

segmenting, using a boundary detection algorithm, the plurality of videos into a plurality of segments representing a plurality of video clips of the plurality of videos;

determining one or more representative frames from the plurality of segments, the one or more representative frames comprising one or more images useable in place of the plurality of segments;

determining, using an image classification model configured to detect pre-defined categories of objects within the one or more representative frames, one or more image classification labels for the one or more representative frames that are associated with particular content of interest as identified by the label;

determining, using the image classification model and based on the one or more image classification labels, a plurality of image classification scores associated with probabilities that the one or more representative frames are associated with the label;

determining a first ranking of the one or more representative frames based on the plurality of image classification scores;

selecting a first subset of the plurality of videos based on the first ranking;

determining, using a media retrieval model comprising a media retrieval encoder and a similarity calculator, a first embedding based on query images and query text related to the label;

determining, using the media retrieval model, a second embedding based on the one or more representative frames of the plurality of videos;

determining similarity embedding scores between the label and the plurality of videos by calculating an inner product of the first embedding and the second embedding;

determining a second ranking of the plurality of videos based on the similarity embedding scores;

selecting a second subset of the plurality of videos based on the second ranking;

selecting a first validation set of videos by selecting from the first subset of the plurality of videos based on the first ranking;

selecting a second validation set of videos by selecting from the second subset of the plurality of videos based on the second ranking;

receiving validation results for the first validation set and the second validation set describing an accuracy of the first validation set and the second validation set with the label; and

determining a video dataset associated with the label from the first subset or the second subset based on the validation results.
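The claim recites segmenting videos "using a boundary detection algorithm" and then choosing representative frames, but it does not name a specific algorithm. The following is a minimal sketch that swaps in simple frame differencing. It assumes each frame is a flat list of grayscale intensities and that the middle frame of each segment stands in for the clip. The names `segment_video`, `representative_frames`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame representation: a flat list of grayscale pixel intensities.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def segment_video(frames: List[Frame], threshold: float) -> List[Tuple[int, int]]:
    """Split frames into half-open [start, end) segments at sharp frame-to-frame changes.

    Frame differencing is an assumed stand-in for the unspecified
    boundary detection algorithm in the claim.
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))  # close the current clip at the cut
            start = i
    segments.append((start, len(frames)))  # close the final clip
    return segments


def representative_frames(segments: List[Tuple[int, int]]) -> List[int]:
    """Pick the middle frame index of each segment as its stand-in image (an assumption)."""
    return [(start + end - 1) // 2 for start, end in segments]


frames = [[0, 0], [0, 0], [0, 0], [200, 200], [200, 200]]
segs = segment_video(frames, threshold=50.0)
print(segs, representative_frames(segs))
```

In this example, the jump from intensity 0 to 200 produces one cut. The result is two clips, (0, 3) and (3, 5), represented by frames 1 and 3.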
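The first ranking orders representative frames by image classification scores, and the first subset of videos is selected from that ranking. The following is a minimal sketch of that step. It assumes the classifier has already produced one probability per representative frame for the target label, and it takes each video's score as the maximum over its frames. The claim does not specify an aggregation rule, so that choice is an assumption. The function names and the cutoff `k` are hypothetical.

```python
from typing import Dict, List


def rank_videos_by_classifier(frame_scores: Dict[str, List[float]]) -> List[str]:
    """Rank video IDs by their highest per-frame probability for the label.

    Max-over-frames aggregation is an assumed choice, not taken from the claim.
    """
    video_scores = {vid: max(s) for vid, s in frame_scores.items() if s}
    return sorted(video_scores, key=lambda vid: video_scores[vid], reverse=True)


def top_k(ranking: List[str], k: int) -> List[str]:
    """Select the first subset as the k highest-ranked videos (hypothetical cutoff)."""
    return ranking[:k]


scores = {"a": [0.1, 0.9], "b": [0.4, 0.5], "c": [0.95]}
first_ranking = rank_videos_by_classifier(scores)
print(first_ranking, top_k(first_ranking, 2))
```

A max aggregation lets a single strongly matching shot surface a long video. A mean would instead favor videos in which the labeled content is present throughout.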
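The second ranking compares a query embedding, built from query images and query text, against per-video embeddings built from representative frames, using an inner product. The following is a minimal sketch of the similarity calculator under several stated assumptions:

- The media retrieval encoder maps images and text into one shared vector space, and its outputs are given here as plain lists of floats.
- The query embedding is the L2-normalized mean of the query image and text embeddings.
- Each video embedding is the L2-normalized mean of its frame embeddings.

With normalized vectors, the inner product equals cosine similarity. The helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(
    query_embeddings: Sequence[Sequence[float]],
    video_frame_embeddings: Dict[str, Sequence[Sequence[float]]],
) -> Tuple[List[str], Dict[str, float]]:
    """Score each video by the inner product of pooled query and video embeddings."""
    query = normalize(mean_vector(query_embeddings))  # "first embedding" (assumed pooling)
    scores = {
        vid: inner_product(query, normalize(mean_vector(frames)))  # "second embedding"
        for vid, frames in video_frame_embeddings.items()
    }
    ranking = sorted(scores, key=lambda vid: scores[vid], reverse=True)
    return ranking, scores


query = [[1.0, 0.0], [1.0, 0.0]]  # one image embedding, one text embedding (toy values)
videos = {"x": [[1.0, 0.0]], "y": [[0.0, 1.0]], "z": [[1.0, 1.0]]}
second_ranking, sim = rank_by_similarity(query, videos)
print(second_ranking, sim)
```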
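The final steps draw a validation set from each subset, collect validation results, and keep whichever subset validates better. The following is a minimal sketch of that step under several stated assumptions:

- Each validation set is the top `n` videos of its ranked subset.
- The validation results are reviewer judgments of whether each video matches the label (`True`/`False`), and accuracy is the fraction judged correct.
- A tie keeps the first subset.

None of these specifics appear in the claim, and the function names are hypothetical.

```python
from typing import Dict, List


def validation_set(ranked_subset: List[str], n: int) -> List[str]:
    """Take the n highest-ranked videos from a subset (assumed sampling rule)."""
    return ranked_subset[:n]


def accuracy(val_set: List[str], judgments: Dict[str, bool]) -> float:
    """Fraction of validation videos that reviewers judged to match the label."""
    if not val_set:
        return 0.0
    return sum(1 for vid in val_set if judgments.get(vid, False)) / len(val_set)


def choose_dataset(
    first_subset: List[str],
    second_subset: List[str],
    judgments: Dict[str, bool],
    n: int,
) -> List[str]:
    """Keep the subset whose validation set scores higher; ties favor the first."""
    acc1 = accuracy(validation_set(first_subset, n), judgments)
    acc2 = accuracy(validation_set(second_subset, n), judgments)
    return first_subset if acc1 >= acc2 else second_subset


first = ["a", "b", "c"]
second = ["d", "e", "f"]
judged = {"a": True, "b": False, "d": True, "e": True}
print(choose_dataset(first, second, judged, n=2))
```

Here the first validation set scores 0.5 and the second scores 1.0, so the second subset becomes the video dataset for the label.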