US 11,783,613 B1
Recognizing and tracking poses using digital imagery captured from multiple fields of view
Jean Laurent Guigues, Seattle, WA (US); and Leonid Pishchulin, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 16, 2020, as Appl. No. 17/098,878.
Application 17/098,878 is a continuation of application No. 15/391,821, filed on Dec. 27, 2016, granted, now 10,839,203.
Int. Cl. G06V 40/10 (2022.01); G06T 7/77 (2017.01); H04N 7/18 (2006.01); G06V 10/46 (2022.01); G06F 18/2113 (2023.01); G06F 18/2415 (2023.01); G06T 7/292 (2017.01); G06V 20/52 (2022.01); G06F 18/214 (2023.01)

CPC G06V 40/10 (2022.01) [G06F 18/214 (2023.01); G06F 18/2113 (2023.01); G06F 18/2415 (2023.01); G06T 7/292 (2017.01); G06T 7/77 (2017.01); G06V 10/469 (2022.01); G06V 20/52 (2022.01); H04N 7/181 (2013.01); G06T 2207/20044 (2013.01); G06T 2207/30232 (2013.01); G06T 2207/30241 (2013.01)]

20 Claims

1. A system for tracking multi-joint subjects in an area of real space, comprising:

a plurality of cameras, cameras in the plurality of cameras producing respective sequences of images of corresponding fields of view in the real space, the field of view of each camera overlapping with the field of view of at least one other camera in the plurality of cameras;

a processing system coupled to the plurality of cameras, the processing system comprising:

at least a first component configured to receive the sequences of images from the plurality of cameras, wherein the first component is further configured to process images to generate corresponding arrays of joint data structures, the arrays of joint data structures corresponding to particular images classifying elements of the particular images by joint type, time of the particular image, and coordinates of the element in the particular image;

at least a second component configured to receive the arrays of joint data structures corresponding to images in sequences of images from cameras having overlapping fields of view, wherein the second component is further configured to translate the coordinates of the elements in the arrays of joint data structures corresponding to images in different sequences into candidate joints having coordinates in real space; and

at least a third component configured to identify sets of candidate joints having coordinates in real space as multi-joint subjects in the real space.
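
The three components recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. It is one possible reading, built on assumptions the claim does not state: each camera is a hypothetical axis-aligned pinhole camera looking along +Z, image coordinates are measured from the principal point, joints are paired across cameras only when joint type and timestamp match, coordinates are translated to real space by midpoint triangulation of two viewing rays, and subjects are formed by a greedy proximity rule. All names (Joint, Camera, CandidateJoint, candidate_joints, group_subjects, max_gap, max_dist) are illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Joint:
    """One entry of a joint data structure array (output of the first component)."""
    joint_type: str   # e.g. "left_wrist"
    timestamp: float  # time of the image
    x: float          # image coordinates, relative to the principal point (assumed)
    y: float


@dataclass(frozen=True)
class CandidateJoint:
    """A joint translated into real-space coordinates (output of the second component)."""
    joint_type: str
    timestamp: float
    position: tuple


@dataclass(frozen=True)
class Camera:
    """Hypothetical simplified pinhole camera: axis-aligned, looking along +Z."""
    center: tuple
    focal: float

    def ray(self, x, y):
        # Viewing ray through the image point: origin and direction in real space.
        return self.center, (x / self.focal, y / self.focal, 1.0)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate(cam_a, joint_a, cam_b, joint_b):
    """Return (midpoint, gap) of the shortest segment between two viewing rays."""
    p1, d1 = cam_a.ray(joint_a.x, joint_a.y)
    p2, d2 = cam_b.ray(joint_b.x, joint_b.y)
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # rays are (nearly) parallel; no reliable intersection
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = tuple(p + t * v for p, v in zip(p1, d1))
    q2 = tuple(p + s * v for p, v in zip(p2, d2))
    mid = tuple((u + v) / 2 for u, v in zip(q1, q2))
    return mid, math.dist(q1, q2)


def candidate_joints(views, max_gap=0.05):
    """Second component (sketch): views is a list of (Camera, [Joint, ...]) pairs."""
    out = []
    for (cam_a, joints_a), (cam_b, joints_b) in combinations(views, 2):
        for ja in joints_a:
            for jb in joints_b:
                if ja.joint_type != jb.joint_type or ja.timestamp != jb.timestamp:
                    continue
                result = triangulate(cam_a, ja, cam_b, jb)
                # Keep only pairs whose rays pass close to each other (assumed filter).
                if result is not None and result[1] <= max_gap:
                    out.append(CandidateJoint(ja.joint_type, ja.timestamp, result[0]))
    return out


def group_subjects(candidates, max_dist=0.5):
    """Third component (sketch): greedy grouping, at most one joint per type per subject."""
    subjects = []
    for cand in candidates:
        for subj in subjects:
            near = any(math.dist(cand.position, j.position) <= max_dist
                       for j in subj.values())
            if cand.joint_type not in subj and near:
                subj[cand.joint_type] = cand
                break
        else:
            subjects.append({cand.joint_type: cand})
    return subjects
```

In this sketch, the per-camera Joint lists would be passed to candidate_joints, and its result to group_subjects; each returned dictionary maps joint types to real-space candidate joints for one assumed subject. The thresholds max_gap and max_dist are placeholders in real-space units and would need tuning for any actual camera arrangement.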