US 12,033,611 B2
Generating expressive speech audio from text data
Siddharth Gururani, Santa Clara, CA (US); Kilol Gupta, Redwood City, CA (US); Dhaval Shah, Redwood City, CA (US); Zahra Shakeri, Newark, CA (US); Jervis Pinto, Toronto (CA); Mohsen Sardari, Burlingame, CA (US); Navid Aghdaie, San Jose, CA (US); and Kazi Zaman, Foster City, CA (US)
Assigned to ELECTRONIC ARTS INC., Redwood City, CA (US)
Filed by Electronic Arts Inc., Redwood City, CA (US)
Filed on Feb. 28, 2022, as Appl. No. 17/682,206.
Application 17/682,206 is a continuation of application No. 16/840,070, filed on Apr. 3, 2020, granted, now 11,295,721.
Claims priority of provisional application 62/936,249, filed on Nov. 15, 2019.
Prior Publication US 2022/0208170 A1, Jun. 30, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 13/00 (2006.01); A63F 13/60 (2014.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01); A63F 13/63 (2014.01)
CPC G10L 13/00 (2013.01) [A63F 13/60 (2014.09); G06N 3/044 (2023.01); G06N 3/08 (2013.01); A63F 13/63 (2014.09); A63F 2300/6018 (2013.01)]
20 Claims
 
1. A system for use in video game development for generating expressive speech audio, the system comprising:
a user interface configured to receive user-input text data and a user selection of a speech style; and
a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder, the machine-learned synthesizer being configured to:
generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer;
generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer;
combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and
decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted spectrogram parameters for the expressive speech audio.
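
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name in it (`ToySynthesizer`, `STYLE_FEATURES`, the dimension constants) is hypothetical, the weights are random stand-ins for learned parameters, and several choices are assumptions the claim does not specify, namely byte-level text encodings, concatenation as the way of combining encodings, and one output frame per input byte. It only shows the order of the four steps: text encoding, speech style encoding, combination, and decoding to predicted spectrogram parameters.

```python
import random

# Hypothetical sizes; the claim does not specify any dimensions.
EMB_DIM = 4     # size of each text encoding
STYLE_DIM = 3   # size of the speech style encoding
N_PARAMS = 5    # spectrogram parameters predicted per frame

# Hypothetical style presets: each selectable style maps to a feature set.
STYLE_FEATURES = {"happy": [1.0, 0.2], "sad": [-1.0, 0.8]}


def make_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToySynthesizer:
    """Untrained stand-in with a text encoder, a style encoder and a decoder."""

    def __init__(self, n_style_features, seed=0):
        rng = random.Random(seed)
        self.char_table = make_matrix(256, EMB_DIM, rng)  # one row per byte value
        self.style_weights = make_matrix(STYLE_DIM, n_style_features, rng)
        self.decoder_weights = make_matrix(N_PARAMS, EMB_DIM + STYLE_DIM, rng)

    def encode_text(self, text):
        # Text encoder: one encoding per UTF-8 byte of the input text.
        return [self.char_table[b] for b in text.encode("utf-8")]

    def encode_style(self, style_features):
        # Speech style encoder: project the style features to one encoding.
        return matvec(self.style_weights, style_features)

    def combine(self, text_encodings, style_encoding):
        # Assumption: combine by concatenating the style encoding onto
        # every text encoding.
        return [enc + style_encoding for enc in text_encodings]

    def decode(self, combined_encodings):
        # Decoder: map each combined encoding to spectrogram parameters.
        return [matvec(self.decoder_weights, c) for c in combined_encodings]

    def synthesize(self, text, style_name):
        text_encodings = self.encode_text(text)
        style_encoding = self.encode_style(STYLE_FEATURES[style_name])
        combined = self.combine(text_encodings, style_encoding)
        return self.decode(combined)


synth = ToySynthesizer(n_style_features=2)
frames_happy = synth.synthesize("Hi", "happy")
frames_sad = synth.synthesize("Hi", "sad")
print(len(frames_happy), len(frames_happy[0]))
```

In this sketch the same text yields different predicted parameters for different selected styles because the style encoding is part of every combined encoding. A practical decoder would typically emit many spectrogram frames per input symbol and would be followed by a separate stage that turns the predicted parameters into a waveform.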