US 12,450,904 B2
User-customized computer vision event detection
Dmytro Likhomanov, Kyivska oblast (UA); Ashwini Vijaykumar Karappa, Sunnyvale, CA (US); Apurva Mohan Paralkar, Fremont, CA (US); Oleksandr Onbysh, London (GB); Alexander Lazarev, Seattle, WA (US); Pylyp Kofman, Amsterdam (NL); Yevhen Diachenko, Zaporizhzhya (UA); Kostiantyn Shysh, Pidhorodne (UA); Denys Drabchuck, Utrecht (NL); Serhii Kupriienko, Kyiv (UA); Mykola Zekter, Kyiv (UA); Vasyl Zacheshyhryva, Haarlem (NL); Bohdan Bobyl, Kyiv (UA); Kateryna Voroniuk, Seattle, WA (US); and Andrii Kozak, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2022, as Appl. No. 17/855,337.
Prior Publication US 2024/0005661 A1, Jan. 4, 2024
Int. Cl. G06V 20/40 (2022.01); G06V 10/774 (2022.01); G06V 20/52 (2022.01)

CPC G06V 20/44 (2022.01) [G06V 10/774 (2022.01); G06V 20/52 (2022.01)]

19 Claims

1. A computer-implemented method comprising:

receiving, from a user device associated with a user, first label data indicating a first image and a first user-defined state, the first image having been captured by a first camera device associated with the user;

receiving, from the user device, second label data indicating a second image associated with a second user-defined state, the second image having been captured by the first camera device associated with the user;

for each respective image of a first set of images comprising the first image and the second image, generating, using a first pre-trained encoder model executed by the first camera device, respective embedding data; and

training, by the first camera device, a second machine learning model to classify images as corresponding to the first user-defined state or the second user-defined state based at least in part on the first label data, the second label data, and the embedding data for the images of the first set of images,

wherein the first label data is associated with a second set of images comprising the first image,

wherein the first set of images includes the second set of images,

wherein the second label data is associated with a third set of images comprising the second image,

wherein the first set of images includes the third set of images, and

wherein the computer-implemented method comprises, prior to generating the respective embedding data for each respective image of the first set of images using the first pre-trained encoder model:

selecting a first plurality of images from the second set of images and the third set of images;

training a third machine learning model to classify images as corresponding to the first user-defined state or the second user-defined state based at least in part on the first label data, the second label data, and the first plurality of images;

determining, using the third machine learning model, a first state for a third image from the first set of images that is not part of the first plurality of images; and

comparing, based on the first label data, the first state to a state associated with the third image.
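The flow recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify model types, so everything concrete here is an assumption: the nearest-centroid classifier stands in for both the second and the third machine learning models, `encode` is a hypothetical placeholder for the pre-trained encoder, and the state names and pixel values are invented toy data. The sketch first checks the user's labels by training on a selected subset of the labelled images and comparing predictions for held-out labelled images against their labels, and only then generates embeddings and trains the state classifier.

```python
from statistics import fmean

def centroid(vectors):
    # Component-wise mean of equal-length vectors.
    return [fmean(column) for column in zip(*vectors)]

def train_nearest_centroid(vectors, states):
    # Assumed stand-in classifier: one centroid per user-defined state.
    groups = {}
    for vector, state in zip(vectors, states):
        groups.setdefault(state, []).append(vector)
    return {state: centroid(members) for state, members in groups.items()}

def predict(model, vector):
    # Return the state whose centroid is closest (squared Euclidean distance).
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vector, center))
    return min(model, key=lambda state: sq_dist(model[state]))

def encode(image):
    # Hypothetical placeholder for the frozen pre-trained encoder:
    # maps an image to [mean brightness, brightness range].
    return [fmean(image), max(image) - min(image)]

# Toy labelled images (flattened grayscale pixels) from one camera.
labelled = [
    ([0.90, 0.80, 0.90, 0.85], "door_open"),
    ([0.80, 0.90, 0.95, 0.90], "door_open"),
    ([0.85, 0.90, 0.80, 0.90], "door_open"),
    ([0.10, 0.20, 0.15, 0.10], "door_closed"),
    ([0.20, 0.10, 0.10, 0.15], "door_closed"),
    ([0.15, 0.10, 0.20, 0.20], "door_closed"),
]

# Label check, performed before any embeddings are generated:
# train a check model on a selected subset of the labelled images ...
subset = labelled[0:2] + labelled[3:5]
held_out = [labelled[2], labelled[5]]
check_model = train_nearest_centroid(
    [image for image, _ in subset], [state for _, state in subset]
)
# ... then compare its prediction for each held-out image with the user's label.
agreement = [predict(check_model, image) == state for image, state in held_out]

# Embedding generation and training of the state classifier on the embeddings.
embeddings = [encode(image) for image, _ in labelled]
state_model = train_nearest_centroid(embeddings, [state for _, state in labelled])

# Classify a new, unlabelled image from the same camera.
new_state = predict(state_model, encode([0.88, 0.92, 0.86, 0.90]))
```

In this sketch a disagreement in `agreement` would suggest that the user's labels are inconsistent or that the two states are hard to separate, which is one plausible reason for running the check before spending effort on embedding generation; the claim itself does not state what action follows the comparison.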