CPC G06T 7/73 (2017.01) [G06F 16/685 (2019.01); G06F 40/242 (2020.01); G06T 7/207 (2017.01)]
17 Claims

1. A method comprising:
receiving a first input including a reference speech video, the reference speech video including a reference video sequence paired with a reference audio sequence;
generating a video motion graph representing the reference speech video, wherein each node of the video motion graph is associated with a frame of the reference video sequence and reference audio features of the reference audio sequence;
receiving a second input including a target audio sequence;
identifying a node path through the video motion graph based on target audio features of the target audio sequence and the reference audio features; and
generating an output media sequence, the output media sequence including an output video sequence generated based on the identified node path through the video motion graph and paired with the target audio sequence, the generating including blending, by a trained neural network, frames of the reference video sequence associated with one or more nodes surrounding pairs of consecutive nodes in the identified node path that are non-consecutive nodes in the video motion graph.
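For illustration only, a minimal sketch of the claimed pipeline is given below, assuming per-frame audio feature vectors and a simple greedy node-path search. The names VideoMotionGraph, find_node_path, and blend_frames are hypothetical, the frame-similarity metric and thresholds are assumptions, and the cross-fade stands in for the trained neural network blending recited in the claim; this is not the patented implementation.

```python
# Sketch, not the patented method: graph nodes pair frames with audio features,
# a path is chosen against target audio features, and non-consecutive jumps are blended.
import numpy as np

class VideoMotionGraph:
    """Each node holds one reference video frame and its reference audio features."""
    def __init__(self, frames, ref_audio_feats, sim_threshold=0.9):
        self.frames = frames                      # list of H x W x 3 arrays
        self.audio = np.asarray(ref_audio_feats)  # (N, D) reference audio features
        n = len(frames)
        # Consecutive edges follow the original reference video order.
        self.edges = {i: {i + 1} for i in range(n - 1)}
        self.edges[n - 1] = set()
        # Non-consecutive edges connect visually similar frames (assumed cosine metric).
        flat = np.stack([f.reshape(-1).astype(np.float32) for f in frames])
        flat /= np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8
        sim = flat @ flat.T
        for i in range(n):
            for j in range(n):
                if abs(i - j) > 1 and sim[i, j] > sim_threshold:
                    self.edges[i].add(j)

def find_node_path(graph, target_audio_feats):
    """Greedy search: at each target step, move to the neighboring node whose
    reference audio features best match the current target audio features."""
    path = [0]
    for t_feat in np.asarray(target_audio_feats):
        cur = path[-1]
        candidates = list(graph.edges[cur]) or [cur]
        costs = [np.linalg.norm(graph.audio[c] - t_feat) for c in candidates]
        path.append(candidates[int(np.argmin(costs))])
    return path

def blend_frames(graph, path, blend_fn=None):
    """Emit output frames; where consecutive path nodes are non-consecutive in the
    graph, blend the surrounding frames (cross-fade stands in for the trained network)."""
    out = []
    for a, b in zip(path, path[1:]):
        if abs(a - b) == 1:
            out.append(graph.frames[b])
        else:
            fa = graph.frames[a].astype(np.float32)
            fb = graph.frames[b].astype(np.float32)
            blended = blend_fn(fa, fb) if blend_fn else 0.5 * (fa + fb)
            out.append(blended.astype(graph.frames[0].dtype))
    return out
```

In this sketch, the output video sequence is the list returned by blend_frames, which would then be paired with the target audio sequence to form the output media sequence.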