US 11,984,128 B2
Automatic generation and/or use of text-dependent speaker verification features
Matthew Sharifi, Kilchberg (CH); and Victor Carbune, Zurich (CH)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Mar. 21, 2022, as Appl. No. 17/700,135.
Application 17/700,135 is a continuation of application No. 17/069,565, filed on Oct. 13, 2020, granted, now 11,315,575.
Prior Publication US 2022/0215845 A1, Jul. 7, 2022
Int. Cl. G10L 17/08 (2013.01); G06F 3/16 (2006.01); G06F 21/32 (2013.01); G10L 15/22 (2006.01); G10L 17/02 (2013.01); G10L 17/04 (2013.01); G10L 17/10 (2013.01); G10L 17/14 (2013.01); G10L 17/18 (2013.01); G10L 17/22 (2013.01); G10L 17/24 (2013.01); G10L 15/08 (2006.01)

CPC G10L 17/08 (2013.01) [G06F 3/167 (2013.01); G06F 21/32 (2013.01); G10L 15/22 (2013.01); G10L 17/02 (2013.01); G10L 17/04 (2013.01); G10L 17/10 (2013.01); G10L 17/14 (2013.01); G10L 17/18 (2013.01); G10L 17/22 (2013.01); G10L 17/24 (2013.01); G10L 2015/088 (2013.01); G10L 2015/227 (2013.01)]

20 Claims

1. A method implemented by one or more processors, the method comprising:

receiving audio data that captures a spoken utterance of a given user, the audio data being detected via one or more microphones of an assistant device of the given user;

processing a first portion of the audio data to generate first utterance features;

performing a first comparison of the first utterance features to first speaker features for a first text dependent speaker verification (TD-SV) for the user, the first TD-SV being dependent on a first set of one or more terms;

processing a second portion of the audio data to generate second utterance features, the second portion of the audio data being distinct from the first portion of the audio data;

performing a second comparison of the second utterance features to second speaker features for a second TD-SV for the user, the second TD-SV being dependent on a second set of one or more terms that are distinct from the first set of one or more terms;

determining, based on both the first comparison and the second comparison, to authenticate the user for the spoken utterance; and

in response to determining to authenticate the user for the spoken utterance:

performing one or more actions that are based on the spoken utterance.
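
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not specify how features are extracted, how the comparisons are computed, or how the two comparisons are combined, so everything below is an assumption made for illustration: the names `cosine_similarity` and `verify_utterance`, the `extract` callable, the `boundary` index that splits the audio into two portions, the use of cosine similarity as the comparison, the averaging rule, and the `threshold` value of 0.8.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_utterance(
    audio: Sequence[float],
    boundary: int,
    extract: Callable[[Sequence[float]], List[float]],
    first_speaker_features: Sequence[float],
    second_speaker_features: Sequence[float],
    threshold: float = 0.8,
) -> bool:
    """Authenticate a speaker from two distinct portions of one utterance.

    Assumptions (not taken from the claim): `extract` is a hypothetical
    feature extractor, `boundary` marks where the first set of terms ends,
    and the two comparisons are combined by a simple average.
    """
    # Process the first and second portions to generate utterance features.
    first_utterance_features = extract(audio[:boundary])
    second_utterance_features = extract(audio[boundary:])

    # First comparison: against the speaker features of the first TD-SV.
    first_score = cosine_similarity(first_utterance_features, first_speaker_features)
    # Second comparison: against the speaker features of the second TD-SV.
    second_score = cosine_similarity(second_utterance_features, second_speaker_features)

    # Decide based on both comparisons (averaging is an illustrative choice).
    return (first_score + second_score) / 2 >= threshold


def toy_extract(samples: Sequence[float]) -> List[float]:
    """Toy stand-in for a real extractor: uses the raw samples as features."""
    return list(samples)


if __name__ == "__main__":
    audio = [1.0, 0.0, 0.0, 1.0]
    if verify_utterance(audio, 2, toy_extract, [1.0, 0.0], [0.0, 1.0]):
        print("authenticated: perform the action(s) based on the utterance")
```

Averaging lets a strong match on one portion partly offset a weaker match on the other. Requiring each comparison to pass its own threshold would be a stricter alternative that equally satisfies "based on both the first comparison and the second comparison".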