US 11,961,331 B1
Systems for improving pose determination based on video data
Ido Yerushalmy, Tel-Aviv (IL); Michael Chertok, Raanana (IL); and Sharon Alpert, Rehovot (IL)
Assigned to AMAZON TECHNOLOGIES, INC., Seattle, WA (US)
Filed by AMAZON TECHNOLOGIES, INC., Seattle, WA (US)
Filed on Aug. 30, 2021, as Appl. No. 17/446,390.
Int. Cl. G06V 40/20 (2022.01); G06N 20/00 (2019.01); G06T 7/73 (2017.01); G06V 10/98 (2022.01); G06V 20/40 (2022.01)

CPC G06V 40/23 (2022.01) [G06N 20/00 (2019.01); G06T 7/73 (2017.01); G06V 10/993 (2022.01); G06V 20/46 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01)]

20 Claims

1. A system comprising:

a first computing device comprising:

one or more first memories storing first computer-executable instructions; and

one or more first hardware processors to execute the first computer-executable instructions to:

acquire video data representing a user performing an activity;

use a first pose extraction algorithm to determine first pose data representing a first pose of the user within a frame of the video data, wherein the first pose data includes a first plurality of points, and each point of the first plurality of points represents a location of a respective body part of the user;

use a second pose extraction algorithm to determine second pose data representing a second pose of the user within the frame of the video data, wherein the second pose data includes a second plurality of points, and each point of the second plurality of points represents a location of a respective body part of the user;

determine that a first location of a first point of the first plurality of points differs from a second location of a second point of the second plurality of points by at least a threshold distance;

in response to the first location differing from the second location by at least the threshold distance, present a first output that indicates the frame of the video data and requests authorization to send the frame to a second computing device;

receive input data indicating the authorization to provide the frame to the second computing device;

in response to the input data, send the frame to the second computing device;

receive, from the second computing device, third pose data representing a third pose of the user within the frame of the video data, wherein the third pose data includes a third plurality of points, and each point of the third plurality of points represents a location of a respective body part of the user; and

present a second output based on the third pose data.
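
The comparison and authorization steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each pose is a mapping from body-part name to a 2D (x, y) location and that distance is Euclidean; the claim fixes neither the point representation nor the distance metric. The names `disagreeing_parts`, `resolve_pose`, `ask_user`, and `remote_estimate` are hypothetical. Falling back to the first pose when the two algorithms agree, and returning nothing when authorization is withheld, are also assumptions, since the claim does not address those cases.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]  # assumed layout: body-part name -> (x, y) in the frame


def disagreeing_parts(first: Pose, second: Pose, threshold: float) -> List[str]:
    """Return body parts whose two estimates differ by at least `threshold`."""
    shared = first.keys() & second.keys()
    return sorted(
        part for part in shared
        if math.dist(first[part], second[part]) >= threshold
    )


def resolve_pose(
    frame_id: int,
    first: Pose,
    second: Pose,
    threshold: float,
    ask_user: Callable[[int, List[str]], bool],
    remote_estimate: Callable[[int], Pose],
) -> Optional[Pose]:
    """Sketch of the claimed flow for one frame.

    `ask_user` stands in for the first output requesting authorization, and
    `remote_estimate` stands in for sending the frame to the second computing
    device and receiving third pose data back. Both are hypothetical.
    """
    parts = disagreeing_parts(first, second, threshold)
    if not parts:
        return first  # assumption: agreement means the local pose is used
    if not ask_user(frame_id, parts):
        return None  # assumption: without authorization the frame stays local
    return remote_estimate(frame_id)


if __name__ == "__main__":
    pose_a = {"left_wrist": (10.0, 10.0), "right_knee": (50.0, 80.0)}
    pose_b = {"left_wrist": (10.0, 11.0), "right_knee": (53.0, 84.0)}
    third = resolve_pose(
        frame_id=7,
        first=pose_a,
        second=pose_b,
        threshold=5.0,
        ask_user=lambda frame, parts: True,       # user grants authorization
        remote_estimate=lambda frame: dict(pose_b),  # placeholder remote result
    )
    print(third)
```

In this sketch the frame leaves the first device only after `ask_user` returns true, mirroring the claim's ordering of the authorization request, the input data, and the transmission to the second computing device.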