US 12,315,490 B2
Text-to-speech and speech recognition for noisy environments
Daniel Bromand, Boston, MA (US); Björn Erik Roth, Stockholm (SE); and Kåre Sjölander, Stockholm (SE)
Assigned to Spotify AB, Stockholm (SE)
Filed by Spotify AB, Stockholm (SE)
Filed on Dec. 30, 2021, as Appl. No. 17/565,826.
Claims priority of provisional application 63/133,101, filed on Dec. 31, 2020.
Prior Publication US 2022/0208174 A1, Jun. 30, 2022
Int. Cl. G10L 13/033 (2013.01); G10L 13/08 (2013.01); G10L 15/08 (2006.01); G10L 15/22 (2006.01); G10L 25/84 (2013.01)

CPC G10L 13/033 (2013.01) [G10L 13/08 (2013.01); G10L 15/08 (2013.01); G10L 15/22 (2013.01); G10L 25/84 (2013.01); G10L 2015/088 (2013.01)]

20 Claims

1. A method comprising:

providing, by a media delivery system, a first audio representation of a first simulated sound environment;

receiving, by the media delivery system, first speech from a user speaking subject to audio playout of the first simulated sound environment;

providing, by the media delivery system, a second audio representation of a second simulated sound environment, wherein the second simulated sound environment has different acoustic characteristics than the first simulated sound environment;

receiving, by the media delivery system, second speech from the user speaking subject to audio playout of the second simulated sound environment;

determining a change in a speech component between the first speech and the second speech; and

based on the change in the speech component, creating a transform to adjust the speech component.
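
The last two steps of claim 1 (determining a change in a speech component and creating a transform from it) can be illustrated with a minimal sketch. The claim does not say which speech component is measured or what form the transform takes, so the choices here are assumptions made only for illustration: the component is taken to be RMS signal level, the transform is a single scalar gain, and the names rms_level, level_change and create_level_transform are hypothetical. Speech is represented as plain lists of float samples.

```python
import math
from typing import Callable, List, Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude; stands in for the 'speech component' (an assumption)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_change(first_speech: Sequence[float], second_speech: Sequence[float]) -> float:
    """Ratio of the second recording's level to the first recording's level."""
    first = rms_level(first_speech)
    if first == 0.0:
        raise ValueError("first speech is silent; ratio is undefined")
    return rms_level(second_speech) / first


def create_level_transform(
    first_speech: Sequence[float], second_speech: Sequence[float]
) -> Callable[[Sequence[float]], List[float]]:
    """Build a gain transform that maps speech recorded under the second
    simulated environment back to the level observed under the first one."""
    change = level_change(first_speech, second_speech)
    if change == 0.0:
        raise ValueError("second speech is silent; cannot build a gain")
    gain = 1.0 / change  # undo the measured change (hypothetical design choice)

    def transform(samples: Sequence[float]) -> List[float]:
        return [s * gain for s in samples]

    return transform
```

In this sketch, calling create_level_transform(first_speech, second_speech) returns a function that can be applied to later speech captured under conditions like the second environment. A real system could measure a different component, such as pitch or speaking rate, and would need a correspondingly different transform.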