US 12,299,929 B2
Multi-view multi-target action recognition
Wanxin Xu, San Jose, CA (US); and Ko-Kai Albert Huang, Cupertino, CA (US)
Assigned to SONY GROUP CORPORATION, Tokyo (JP); and SONY CORPORATION OF AMERICA, New York, NY (US)
Filed by Sony Group Corporation, Tokyo (JP); and Sony Corporation of America, New York, NY (US)
Filed on Dec. 22, 2021, as Appl. No. 17/559,751.
Claims priority of provisional application 63/260,108, filed on Aug. 10, 2021.
Prior Publication US 2023/0050992 A1, Feb. 16, 2023
Int. Cl. G06T 7/00 (2017.01); G06T 7/292 (2017.01); G06T 7/73 (2017.01); G06V 40/20 (2022.01)

CPC G06T 7/75 (2017.01) [G06T 7/292 (2017.01); G06V 40/23 (2022.01)]

14 Claims

1. A system comprising:

one or more processors; and

logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors and when executed operable to cause the one or more processors to perform operations comprising:

obtaining a plurality of videos of a plurality of subjects in an environment, wherein at least one target subject of the plurality of subjects performs one or more actions in the environment;

tracking the at least one target subject across at least two cameras;

determining pose information associated with the at least one target subject, wherein the determining of pose information is based on triangulation;

reconstructing a 3-dimensional (3D) model of the at least one target subject based on the plurality of videos, the tracking of the at least one target subject, and the pose information;

determining back-projected pose information from the 3D model;

converting the back-projected pose information to 2D space information; and

recognizing the one or more actions of the at least one target subject based on the reconstructing of the 3D model and the 2D space information.
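
The triangulation and back-projection operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say which triangulation method is used, so linear least-squares triangulation from calibrated 3x4 camera matrices is assumed here. All names (project, triangulate, solve3, CAM_A, CAM_B) and the camera values are hypothetical, chosen only for illustration. The sketch handles a single joint of a single target subject seen by two cameras, and covers only the geometry, not the tracking or action recognition steps.

```python
# Illustrative sketch only: linear least-squares triangulation of one joint
# from two calibrated views, then back-projection of the 3D joint to 2D.
# The camera matrices and function names are assumptions, not from the patent.

def project(P, X):
    """Back-project a 3D point X = (x, y, z) into 2D with a 3x4 camera matrix P."""
    hom = (X[0], X[1], X[2], 1.0)
    h = [sum(P[r][c] * hom[c] for c in range(4)) for r in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def solve3(M, b):
    """Solve the 3x3 system M x = b by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(A[r][i]))
        if abs(A[pivot][i]) < 1e-12:
            raise ValueError("degenerate camera geometry")
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x


def triangulate(cameras, observations):
    """Estimate a 3D point from 2D observations (u, v) in two or more cameras.

    Each observation contributes two linear equations,
    (u * P[2] - P[0]) . (X, 1) = 0 and (v * P[2] - P[1]) . (X, 1) = 0,
    which are solved in the least-squares sense via the normal equations.
    """
    rows, rhs = [], []
    for P, (u, v) in zip(cameras, observations):
        for coord, axis in ((u, 0), (v, 1)):
            full = [coord * P[2][j] - P[axis][j] for j in range(4)]
            rows.append(full[:3])
            rhs.append(-full[3])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return tuple(solve3(AtA, Atb))


# Two hypothetical cameras: one at the origin, one shifted 1 unit along x.
CAM_A = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
CAM_B = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

# 2D detections of the same joint of the tracked subject in each view.
detections = [(0.125, 0.05), (-0.125, 0.05)]

joint_3d = triangulate([CAM_A, CAM_B], detections)
joint_2d_per_view = [project(P, joint_3d) for P in (CAM_A, CAM_B)]
```

In this sketch, joint_2d_per_view stands in for the 2D space information that claim 1 derives from the 3D model. Repeating the computation for every joint and every frame would give a sequence of 2D poses that a recognizer could consume; how the claimed system feeds that sequence to its recognizer is not described in this excerpt.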