US 12,094,046 B1
Digital human driving method and apparatus, and storage medium
Huapeng Sima, Nanjing (CN); Jintai Luan, Nanjing (CN); Hongwei Fan, Nanjing (CN); Jiabin Li, Nanjing (CN); Hao Jiang, Nanjing (CN); and Qixun Qu, Nanjing (CN)
Assigned to NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD., Jiangsu (CN)
Filed by NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD., Jiangsu (CN)
Filed on Jan. 23, 2024, as Appl. No. 18/419,759.
Claims priority of application No. 202310847425.5 (CN), filed on Jul. 12, 2023.
Int. Cl. G06T 19/20 (2011.01); G06T 7/20 (2017.01); G06T 7/80 (2017.01); G06T 13/40 (2011.01); G06T 19/00 (2011.01)
CPC G06T 13/40 (2013.01) [G06T 7/80 (2017.01); G06T 19/00 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/30196 (2013.01)]
7 Claims
1. A digital human driving method, comprising the steps of:
capturing video data from a plurality of angles of view in a real three-dimensional space by a plurality of video capture devices, wherein the video data comprises a target human;
determining a first coordinate of a key point of the target human, wherein the first coordinate is a two-dimensional coordinate of the key point in a video frame of the video data;
determining a mapping relationship based on the first coordinate, wherein the mapping relationship is a correspondence between the key point and a virtual key point in a virtual three-dimensional space;
calculating a second coordinate based on the mapping relationship and the first coordinate, wherein the second coordinate is a three-dimensional coordinate of the virtual key point in the virtual three-dimensional space;
processing the second coordinate according to a key point rotation model to obtain a rotation value of the virtual key point in the virtual three-dimensional space; and
driving a digital human to move based on the rotation value of the virtual key point in the virtual three-dimensional space,
wherein determining the mapping relationship based on the first coordinate comprises:
determining a first parameter matrix based on the first coordinate of the key point, wherein the first parameter matrix indicates a transformation relationship of transforming a position point in the real three-dimensional space from a first coordinate system to a second coordinate system, the first coordinate system is a three-dimensional coordinate system in the real three-dimensional space with a center of the target human as an origin, and the second coordinate system is a three-dimensional coordinate system in the real three-dimensional space with an optical center of the video capture device as an origin;
obtaining a second parameter matrix, wherein the second parameter matrix indicates a transformation relationship of transforming a position point in the real three-dimensional space from the second coordinate system to a third coordinate system, and the third coordinate system is a coordinate system with an image center of video frame of the video data as an origin; and
determining a matrix of a product of the first parameter matrix and the second parameter matrix as the mapping relationship, and
wherein determining the first parameter matrix based on the first coordinate of the key point comprises:
determining an initial first parameter matrix based on the first coordinate of the key point;
obtaining a third coordinate of the key point, wherein the third coordinate is a three-dimensional coordinate of the key point in the real three-dimensional space;
calculating a first predicted coordinate value based on the third coordinate and the initial first parameter matrix;
determining a first loss value based on the first predicted coordinate value and the first coordinate; and
iteratively updating the initial first parameter matrix based on the first loss value until the first loss value is less than a first preset threshold, and determining a newly updated initial first parameter matrix as the first parameter matrix.
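
The iterative determination of the first parameter matrix recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions that the claim does not state: a pinhole camera model, a 3x3 second parameter matrix with the image center as origin, a mapping formed as second matrix times first matrix, a mean squared reprojection error as the first loss value, and a finite-difference gradient descent with backtracking that updates only the translation column of the first parameter matrix (rotation is held fixed at identity). All function names, the focal length, the threshold and the synthetic key points are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def project(mapping, point3d):
    """Map a 3D point to 2D with a 3x4 mapping matrix (perspective divide)."""
    h = list(point3d) + [1.0]
    u, v, w = (sum(row[k] * h[k] for k in range(4)) for row in mapping)
    return u / w, v / w


def reprojection_loss(extrinsic, intrinsic, points3d, points2d):
    """Mean squared distance between predicted and observed 2D key points."""
    # Assumed order of the product: second (intrinsic) times first (extrinsic).
    mapping = mat_mul(intrinsic, extrinsic)
    total = 0.0
    for p3, (u_obs, v_obs) in zip(points3d, points2d):
        u, v = project(mapping, p3)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total / len(points3d)


def fit_extrinsic(initial, intrinsic, points3d, points2d,
                  threshold=1e-8, max_iter=5000, eps=1e-6):
    """Refine the translation column of `initial` until loss < threshold."""
    extrinsic = [row[:] for row in initial]
    loss = reprojection_loss(extrinsic, intrinsic, points3d, points2d)
    for _ in range(max_iter):
        if loss < threshold:
            break
        # Central-difference gradient with respect to the translation entries.
        grad = []
        for r in range(3):
            base = extrinsic[r][3]
            extrinsic[r][3] = base + eps
            up = reprojection_loss(extrinsic, intrinsic, points3d, points2d)
            extrinsic[r][3] = base - eps
            down = reprojection_loss(extrinsic, intrinsic, points3d, points2d)
            extrinsic[r][3] = base
            grad.append((up - down) / (2 * eps))
        grad_sq = sum(g * g for g in grad)
        # Backtracking line search: halve the step until the loss drops enough.
        step, improved = 1.0, False
        while step > 1e-12:
            trial = [row[:] for row in extrinsic]
            for r in range(3):
                trial[r][3] -= step * grad[r]
            trial_loss = reprojection_loss(trial, intrinsic, points3d, points2d)
            if trial_loss <= loss - 1e-4 * step * grad_sq:
                extrinsic, loss, improved = trial, trial_loss, True
                break
            step *= 0.5
        if not improved:
            break  # no descent step found; return the best estimate so far
    return extrinsic, loss


# Synthetic demo (hypothetical values): six human-centered key points, one camera.
FOCAL = 1.2  # assumed focal length in normalized image units
INTRINSIC = [[FOCAL, 0.0, 0.0], [0.0, FOCAL, 0.0], [0.0, 0.0, 1.0]]
TRUE_EXTRINSIC = [[1.0, 0.0, 0.0, 0.2], [0.0, 1.0, 0.0, -0.1], [0.0, 0.0, 1.0, 3.0]]
INITIAL_EXTRINSIC = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.5]]
KEY_POINTS_3D = [(0.0, 0.8, 0.1), (-0.3, 0.5, 0.0), (0.3, 0.5, 0.0),
                 (-0.2, -0.9, 0.1), (0.2, -0.9, -0.1), (0.0, 0.0, 0.15)]
# Observed 2D key points are generated from the true matrices for this demo.
KEY_POINTS_2D = [project(mat_mul(INTRINSIC, TRUE_EXTRINSIC), p) for p in KEY_POINTS_3D]

fitted, final_loss = fit_extrinsic(INITIAL_EXTRINSIC, INTRINSIC,
                                   KEY_POINTS_3D, KEY_POINTS_2D)
print("fitted translation:", [round(row[3], 3) for row in fitted])
```

In this sketch `KEY_POINTS_3D` plays the role of the third coordinate, `KEY_POINTS_2D` the first coordinate, and `project` yields the first predicted coordinate value. A practical system would also need to update the rotation part of the matrix and would likely use a dedicated optimizer; the translation-only update is chosen here only to keep the example short.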