US 12,230,279 B1
User authentication for voice-input devices
Preethi Parasseri Narayanan, Cupertino, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Aug. 6, 2021, as Appl. No. 17/396,535.
Application 17/396,535 is a continuation of application No. 15/861,573, filed on Jan. 3, 2018, granted, now 11,087,769.
Application 15/861,573 is a continuation of application No. 15/068,967, filed on Mar. 14, 2016, granted, now 9,865,268, issued on Jan. 9, 2018.
Application 15/068,967 is a continuation of application No. 13/624,633, filed on Sep. 21, 2012, granted, now 9,286,899, issued on Mar. 15, 2016.
Int. Cl. G10L 17/24 (2013.01); G06F 21/32 (2013.01); G10L 15/22 (2006.01); G10L 17/06 (2013.01); G10L 17/22 (2013.01)

CPC G10L 17/24 (2013.01) [G06F 21/32 (2013.01); G10L 15/22 (2013.01); G10L 17/06 (2013.01); G10L 17/22 (2013.01)]

20 Claims

1. A system comprising:

one or more processors; and

memory storing one or more computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:

receiving, via one or more microphones of a voice-controlled device that includes functionality for processing speech captured by the one or more microphones, a first audio signal representing a first utterance spoken by a user;

determining an identity of the user based at least in part on a comparison of one or more first characteristics associated with the first audio signal and user data previously stored in association with the user, the one or more first characteristics including at least one of a frequency, a pitch, or a decibel level of the first audio signal;

determining, based at least in part on the one or more first characteristics, that the first utterance includes at least one first predefined word that is independent of the identity of the user;

receiving, via the one or more microphones of the voice-controlled device, a second audio signal representing a second utterance;

processing, using at least one of speech recognition or natural language processing, the second audio signal;

determining, based at least in part on processing the second audio signal, that the second utterance includes one or more first words that represent a first user request and that does not include the at least one first predefined word;

causing, via one or more speakers of the voice-controlled device, first output of first audio data that corresponds to the first user request;

receiving, via the one or more microphones of the voice-controlled device, a third audio signal representing a third utterance;

processing, using at least one of the speech recognition or the natural language processing, the third audio signal;

determining, based at least in part on processing the third audio signal, that the third utterance includes one or more second words that represent a second user request and that do not include the at least one first predefined word, at least one of the first user request or the second user request initiating a financial transaction for a product or a service to be provided to the user;

determining that one or more second characteristics associated with the at least one of the first user request or the second user request are within a threshold similarity with respect to the one or more first characteristics or one or more third characteristics previously stored in association with a user profile associated with the user;

causing, via the one or more speakers and based at least in part on the one or more second characteristics being within the threshold similarity with respect to the one or more first characteristics or the one or more third characteristics, second output of second audio data corresponding to a confirmation or a performance of the financial transaction for the product or the service;

receiving, via the one or more microphones of the voice-controlled device, a fourth audio signal representing a fourth utterance that includes at least one second predefined word that is different than the at least one first predefined word; and

automatically terminating, by the voice-controlled device and in response to the fourth utterance including the at least one second predefined word, a session associated with the user.
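
The sequence of operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that speech recognition has already produced a transcript, and that each utterance arrives with a three-value tuple of characteristics (frequency, pitch, decibel level). The names `WAKE_WORD`, `END_WORD`, `within_threshold`, and `Session`, the relative-tolerance similarity test, and the `is_purchase` flag are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

WAKE_WORD = "hello device"    # hypothetical "first predefined word"
END_WORD = "goodbye device"   # hypothetical "second predefined word"

# Illustrative characteristics: (frequency in Hz, pitch in Hz, level in dB).
Traits = Tuple[float, float, float]


def within_threshold(a: Traits, b: Traits, tolerance: float = 0.10) -> bool:
    """Return True if every feature of `a` is within a relative tolerance of `b`.

    The claim only requires "a threshold similarity"; this per-feature
    relative test is an assumption, not the claimed metric.
    """
    return all(abs(x - y) <= tolerance * max(abs(x), abs(y)) for x, y in zip(a, b))


@dataclass
class Session:
    profiles: Dict[str, Traits]            # user data previously stored per user
    user: Optional[str] = None             # identity determined from the first utterance
    first_traits: Optional[Traits] = None  # characteristics of the first utterance

    def handle(self, transcript: str, traits: Traits, is_purchase: bool = False) -> str:
        text = transcript.lower()

        # No active session: require the first predefined word, then identify the speaker.
        if self.user is None:
            if WAKE_WORD not in text:
                return "ignored"
            for name, stored in self.profiles.items():
                if within_threshold(traits, stored):
                    self.user, self.first_traits = name, traits
                    return f"session started for {name}"
            return "unknown speaker"

        # Active session: the second predefined word terminates it.
        if END_WORD in text:
            self.user, self.first_traits = None, None
            return "session ended"

        # A financial request is confirmed only if the voice still matches either
        # the first utterance or the stored profile.
        if is_purchase:
            stored = self.profiles[self.user]
            ok = within_threshold(traits, self.first_traits) or within_threshold(traits, stored)
            return "purchase confirmed" if ok else "purchase declined"

        # Any other request is answered without repeating the first predefined word.
        return "request handled"
```

In this sketch, one `Session` object receives four calls in order: an utterance containing the wake word, two requests that omit it (at least one flagged as a purchase), and an utterance containing the end word. The purchase is confirmed only when the characteristics of that request fall within the tolerance of the first utterance or of the stored profile.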