US 11,922,728 B1
Associating events with actors using digital imagery and machine learning
Jaechul Kim, Seattle, WA (US); Nishitkumar Ashokkumar Desai, Redmond, WA (US); Jayakrishnan Kumar Eledath, Princeton Junction, NJ (US); Kartik Muktinutalapati, Redmond, WA (US); Shaonan Zhang, Redmond, WA (US); Hoi Cheung Pang, Bellevue, WA (US); Dilip Kumar, Seattle, WA (US); Kushagra Srivastava, Issaquah, WA (US); Gerard Guy Medioni, Los Angeles, CA (US); and Daniel Bibireata, Bellevue, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Oct. 24, 2022, as Appl. No. 18/049,252.
Application 18/049,252 is a continuation of application No. 16/799,502, filed on Feb. 24, 2020, granted, now 11,482,045.
Application 16/799,502 is a continuation in part of application No. 16/712,914, filed on Dec. 12, 2019, granted, now 11,468,698, issued on Oct. 11, 2022.
Application 16/712,914 is a continuation in part of application No. 16/022,221, filed on Jun. 28, 2018, granted, now 11,468,681, issued on Oct. 11, 2022.
Int. Cl. G06V 40/20 (2022.01); G06F 17/16 (2006.01); G06F 18/2321 (2023.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06Q 30/0201 (2023.01); G06V 20/10 (2022.01); G06V 20/52 (2022.01)
CPC G06V 40/20 (2022.01) [G06F 17/16 (2013.01); G06F 18/2321 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 30/0201 (2013.01); G06V 20/10 (2022.01); G06V 20/52 (2022.01)]
20 Claims
 
1. A system comprising:
a first camera comprising a first processor unit and a first optical sensor;
a second camera comprising a second processor unit and a second optical sensor;
a storage unit, wherein the first camera has a first orientation with respect to the storage unit, and wherein the second camera has a second orientation with respect to the storage unit; and
a server in communication with at least the first camera and the second camera, wherein the server is programmed with one or more sets of instructions that, when executed by the server, cause the server to at least:
determine an event involving at least one item has occurred at a shelf of the storage unit during a duration;
determine that the shelf of the storage unit is within a first field of view of the first camera and a second field of view of the second camera;
receive a first plurality of records from the first camera, wherein each of the first plurality of records is associated with one of a first plurality of images captured using the first camera and comprises:
locations of at least one body part depicted within the first plurality of images; and
information associating portions of each of the first plurality of images with the locations of the at least one body part;
receive a second plurality of records from the second camera, wherein each of the second plurality of records is associated with one of a second plurality of images captured using the second camera and comprises:
locations of at least one body part depicted within the second plurality of images captured using the second camera; and
information associating portions of each of the second plurality of images with the locations of the at least one body part;
determine a first regression vector from a location of a first body part of a first actor depicted within a first image of the first plurality of images to a portion of the first image depicting the shelf within the first image based at least in part on a first record of the first plurality of records;
determine a second regression vector from a location of the first body part depicted within a second image of the second plurality of images to a portion of the second image depicting the shelf within the second image based at least in part on a second record of the second plurality of records;
determine a first factor for the first camera based at least in part on confidence scores associated with the information associating the portions of each of the first plurality of images with the locations of the at least one body part;
determine a second factor for the second camera based at least in part on confidence scores associated with the information associating the portions of each of the second plurality of images with the locations of the at least one body part;
determine that at least one of the first body part or the shelf is occluded within the second field of view during at least a portion of the duration based at least in part on the second factor; and
in response to determining that the at least one of the first body part or the shelf is occluded within the second field of view during at least the portion of the duration,
determine that the first actor is associated with the event based at least in part on the first regression vector; and
associate one of the event or the at least one item with the first actor.
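
The server-side logic recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Record` layout, the use of a mean confidence score as the per-camera factor, the 0.5 occlusion threshold, and the shortest-vector selection rule are all assumptions introduced for illustration; the claim requires only that the factors be based at least in part on the confidence scores and that the association be based at least in part on the first regression vector.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical per-image record reported by one camera."""
    actor_id: str
    body_part_xy: Tuple[float, float]  # pixel location of the body part (e.g., a hand)
    shelf_xy: Tuple[float, float]      # pixel location of the image portion depicting the shelf
    confidence: float                  # confidence score of the association, in [0, 1]


def regression_vector(record: Record) -> Tuple[float, float]:
    """Vector from the body part location to the shelf portion of the image."""
    return (record.shelf_xy[0] - record.body_part_xy[0],
            record.shelf_xy[1] - record.body_part_xy[1])


def camera_factor(records: List[Record]) -> float:
    """Assumed factor: mean confidence over a camera's records (0.0 if none)."""
    return mean(r.confidence for r in records) if records else 0.0


def associate_actor(first_records: List[Record],
                    second_records: List[Record],
                    occlusion_threshold: float = 0.5) -> Optional[str]:
    """Pick the actor to associate with the event.

    A camera whose factor falls below the (assumed) threshold is treated as
    occluded and its records are ignored. Among the remaining records, the
    actor with the shortest regression vector is chosen. Comparing pixel
    lengths across cameras is a simplification made for this sketch.
    """
    usable: List[Record] = []
    for records in (first_records, second_records):
        if camera_factor(records) >= occlusion_threshold:
            usable.extend(records)
    if not usable:
        return None
    best = min(usable, key=lambda r: hypot(*regression_vector(r)))
    return best.actor_id


if __name__ == "__main__":
    first = [Record("A", (100.0, 100.0), (110.0, 100.0), 0.9),
             Record("B", (300.0, 100.0), (110.0, 100.0), 0.8)]
    # Low confidence scores stand in for an occluded second field of view.
    second = [Record("B", (105.0, 100.0), (110.0, 100.0), 0.1),
              Record("A", (400.0, 100.0), (110.0, 100.0), 0.2)]
    print(associate_actor(first, second))
```

In this example the second camera's factor falls below the threshold, so only the first camera's regression vectors are considered, mirroring the claim's "in response to determining that ... is occluded within the second field of view" branch.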