US 12,361,750 B2
Synthetic emotion in continuously generated voice-to-video system
Seth Jacob Rothschild, Littleton, MA (US); and Alex Robbins, Cambridge, MA (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Aug. 27, 2021, as Appl. No. 17/446,193.
Prior Publication US 2023/0061761 A1, Mar. 2, 2023
Int. Cl. G06V 40/16 (2022.01); G06V 20/40 (2022.01); G10L 25/63 (2013.01); G11B 27/034 (2006.01); G11B 27/036 (2006.01); G11B 27/34 (2006.01)
CPC G06V 40/161 (2022.01) [G06V 20/46 (2022.01); G06V 40/176 (2022.01); G10L 25/63 (2013.01); G11B 27/034 (2013.01); G11B 27/036 (2013.01); G11B 27/34 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method, comprising:
collecting an audio segment that comprises audio data generated by a user;
analyzing the audio data to identify an emotion expressed by the user;
computing start and end indices for frames of a video segment, the video segment including a representation of a face of the user, the representation comprising facial features;
selecting, from among a plurality of video data each prerecorded by the user expressing a respective emotion, video data that shows the emotion expressed by the user;
using the selected video data and the start and end indices for the frames of the video segment to modify the representation appearing in the video segment, generating modified faces in the selected video data based on the audio segment, such that the modified faces appear to be speaking the words in the audio segment; and
stitching the modified faces from the selected video data over faces in the video segment, thereby swapping the faces in the video segment with the modified faces, to create a modified video segment with the emotion expressed by the user,
wherein the modified video segment includes the audio data generated by the user.
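
The claim's "analyzing the audio data to identify an emotion" step leaves the analysis technique unspecified. The following is a minimal sketch, assuming an MFCC-feature front end (librosa) and a pretrained scikit-learn-style classifier; the label set EMOTIONS, the function identify_emotion, and the classifier itself are hypothetical and not taken from the patent.

```python
# Hypothetical sketch of the "analyzing the audio data to identify an
# emotion" step of claim 1. The patent does not name a technique; this
# assumes MFCC features fed to a pretrained classifier.
import librosa
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed label set

def identify_emotion(audio_path: str, classifier) -> str:
    """Return the emotion label predicted for one audio segment."""
    y, sr = librosa.load(audio_path, sr=16000)         # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, frames)
    features = mfcc.mean(axis=1).reshape(1, -1)         # segment-level vector
    return EMOTIONS[int(classifier.predict(features)[0])]
```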
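For the "computing start and end indices" and clip-selection steps, a straightforward reading is that the audio segment's start and end times are mapped to video frame numbers via the frame rate, and that the prerecorded clips are keyed by emotion label. A sketch under those assumptions (frame_indices, PRERECORDED, and select_clip are illustrative names):

```python
# Hypothetical mapping of an audio segment's time span to video frame
# indices, plus selection of the user's prerecorded emotion clip.
PRERECORDED = {  # assumed library: one clip per emotion, recorded by the user
    "neutral": "clips/user_neutral.mp4",
    "happy": "clips/user_happy.mp4",
    "sad": "clips/user_sad.mp4",
    "angry": "clips/user_angry.mp4",
}

def frame_indices(start_s: float, end_s: float, fps: float) -> tuple[int, int]:
    """Convert segment boundaries in seconds to start/end frame indices."""
    return int(start_s * fps), int(end_s * fps)

def select_clip(emotion: str) -> str:
    """Pick the prerecorded video data that shows the identified emotion."""
    return PRERECORDED[emotion]
```

For example, frame_indices(1.5, 3.2, 30.0) yields (45, 96) for a 30 fps video segment.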
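The modification and stitching steps amount to a lip-sync-plus-face-swap loop over the computed frame range. The sketch below uses OpenCV for frame I/O and Haar-cascade face detection; lip_sync stands in for an unspecified audio-driven face model (the patent names none) and is assumed to return a face crop from the selected emotion clip, redrawn to speak the words in the audio segment. Muxing the user's original audio back into the output, per the final "wherein" limitation, would be done by an external tool such as ffmpeg.

```python
# Hypothetical sketch of the face-modification and stitching steps.
# `lip_sync(frame, audio, i)` is an assumed callable returning a face
# crop redrawn to mouth the audio for frame i.
import cv2

def swap_faces(video_path, clip_path, start, end, lip_sync, audio,
               out_path="modified.mp4"):
    base = cv2.VideoCapture(video_path)   # original video segment
    emo = cv2.VideoCapture(clip_path)     # selected prerecorded emotion clip
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    fps = base.get(cv2.CAP_PROP_FPS)
    size = (int(base.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(base.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    base.set(cv2.CAP_PROP_POS_FRAMES, start)  # seek to the computed start index
    for i in range(start, end):
        ok_b, frame = base.read()
        ok_e, emo_frame = emo.read()
        if not (ok_b and ok_e):
            break
        face_patch = lip_sync(emo_frame, audio, i)  # modified face for frame i
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            # stitch the modified face over the face in the original frame
            frame[y:y + h, x:x + w] = cv2.resize(face_patch, (w, h))
        out.write(frame)
    base.release(); emo.release(); out.release()
```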