CPC G06T 13/205 (2013.01) [G06T 7/20 (2013.01); G06T 7/70 (2017.01); G06T 13/40 (2013.01); G06V 40/176 (2022.01); G10L 15/22 (2013.01); G10L 25/57 (2013.01); G10L 25/60 (2013.01); G06T 2207/30201 (2013.01)]    19 Claims

1. A computer-implemented method, the method comprising:
receiving, from a microphone of a device, first audio data that includes a representation of first speech of a first user;
receiving, from an image sensor of the device, first image data representing a face of the first user;
generating, using the first image data, first motion data representing first facial motion of the first user corresponding to the first speech;
generating, by a machine learning transformer component using the first audio data and the first motion data, first embedding data that represents the first facial motion, wherein the first embedding data corresponds to a first coordinate system;
determining, using a first identifier representing a listener style, second embedding data corresponding to the first coordinate system;
generating, by a first machine learning model using the first embedding data and the second embedding data, first animation data corresponding to second facial motion responsive to the first speech;
generating, using the first animation data, second image data representing a synthetic face engaging in the second facial motion; and
presenting, on a display of the device, the second image data.
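
For readers who want a concrete picture of the pipeline recited in claim 1, the sketch below is one plausible realization in PyTorch: per-frame audio features and facial-motion features from the speaking user are fused by a transformer encoder into a first embedding, a listener-style identifier is looked up in an embedding table sharing the same coordinate system (the same dimensionality), and a small decoder maps the pair to listener facial-animation parameters. Every module name, dimension, and layer choice here is an illustrative assumption; the claim covers the functional steps, not any particular architecture.

    import torch
    import torch.nn as nn

    class ListenerAnimationModel(nn.Module):
        """Hypothetical sketch of the claimed flow. All dimensions and
        layer choices are assumptions, not taken from the specification."""

        def __init__(self, audio_dim=80, motion_dim=52, embed_dim=256,
                     num_styles=16, anim_dim=52):
            super().__init__()
            # Project per-frame audio features (e.g., log-mel) and facial
            # motion features (e.g., blendshape coefficients) into a
            # shared model dimension before the transformer.
            self.audio_proj = nn.Linear(audio_dim, embed_dim)
            self.motion_proj = nn.Linear(motion_dim, embed_dim)
            # "machine learning transformer component": fuses the two
            # modalities into the first embedding data.
            layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            # "first identifier representing a listener style" maps to
            # second embedding data in the same coordinate system
            # (same embed_dim as the speaker embedding).
            self.style_table = nn.Embedding(num_styles, embed_dim)
            # "first machine learning model": decodes the speaker and
            # style embeddings into listener animation data.
            self.decoder = nn.GRU(embed_dim * 2, embed_dim, batch_first=True)
            self.anim_head = nn.Linear(embed_dim, anim_dim)

        def forward(self, audio_feats, motion_feats, style_id):
            # audio_feats:  (batch, frames, audio_dim)
            # motion_feats: (batch, frames, motion_dim)
            # style_id:     (batch,) integer listener-style identifier
            x = self.audio_proj(audio_feats) + self.motion_proj(motion_feats)
            speaker_embed = self.encoder(x)           # first embedding data
            style_embed = self.style_table(style_id)  # second embedding data
            style_seq = style_embed.unsqueeze(1).expand_as(speaker_embed)
            fused = torch.cat([speaker_embed, style_seq], dim=-1)
            out, _ = self.decoder(fused)
            return self.anim_head(out)                # first animation data

    # Toy usage with random tensors standing in for real features.
    model = ListenerAnimationModel()
    audio = torch.randn(1, 120, 80)    # ~120 frames of audio features
    motion = torch.randn(1, 120, 52)   # matching facial-motion coefficients
    style = torch.tensor([3])          # listener-style identifier
    anim = model(audio, motion, style)
    print(anim.shape)                  # torch.Size([1, 120, 52])

Giving the style table the same dimensionality as the transformer output is one simple way to realize the claim's requirement that the second embedding data "corresponds to the first coordinate system"; the final rendering step (second image data on the display) would be handled by a separate synthesis stage not shown here.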