CPC G10L 15/22 (2013.01) [B60W 40/08 (2013.01); G06F 18/251 (2023.01); G06N 3/08 (2013.01); G06T 7/20 (2013.01); G10L 15/16 (2013.01); G10L 15/25 (2013.01); B60W 2040/089 (2013.01); B60W 2420/403 (2013.01); B60W 2420/54 (2013.01); B60W 2540/21 (2020.02); B60W 2540/221 (2020.02); G06T 2207/30201 (2013.01); G10L 2015/223 (2013.01)] | 20 Claims |
1. A speech recognition method, comprising:
receiving audible data, wherein the audible data includes information about an utterance by a user;
receiving user data, wherein the user data includes information about movements by the user, the user data including hand gestures of the user and an image of a lip of the user;
fusing the audible data and the user data to obtain fused data; and
determining at least one spoken word of the utterance based on the fused data;
wherein determining the at least one spoken word of the utterance includes:
using a trained neural network to determine the at least one spoken word of the utterance;
fine-tuning the trained neural network based on the fused data to adapt to speech patterns of the user, wherein fine-tuning the trained neural network includes:
identifying a particular lip movement when the user utters a particular word, wherein the particular word is “navigation”; and
associating the particular lip movement with the particular word uttered by the user.
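The claimed steps (receive audio features, receive visual features such as lip images and hand gestures, fuse the modalities, decode a word, then adapt to the user) can be illustrated with a minimal sketch. This is not the patented implementation: the concatenation fusion, dot-product scoring, toy word "prototypes", and the `fine_tune` update rule are all hypothetical assumptions standing in for the trained neural network described in the claim.

```python
# Illustrative sketch only: late fusion of audio and visual features,
# word scoring, and a per-user fine-tuning step. The toy "model"
# (prototype vectors + dot-product scoring) is an assumption, not the
# claimed neural network.

def fuse(audio_feats, visual_feats):
    # Fuse modalities by simple concatenation (one common strategy).
    return audio_feats + visual_feats

def score(fused, prototype):
    # Dot-product similarity between fused features and a word prototype.
    return sum(a * b for a, b in zip(fused, prototype))

def recognize(fused, prototypes):
    # Return the word whose prototype best matches the fused features.
    return max(prototypes, key=lambda w: score(fused, prototypes[w]))

def fine_tune(prototypes, word, fused, lr=0.5):
    # Nudge a word's prototype toward this user's observed features,
    # e.g. associating a particular lip movement with "navigation".
    old = prototypes[word]
    prototypes[word] = [p + lr * (f - p) for p, f in zip(old, fused)]

# Toy data: 2-dim audio features + 2-dim visual (lip/gesture) features.
prototypes = {
    "navigation": [1.0, 0.0, 1.0, 0.0],
    "radio":      [0.0, 1.0, 0.0, 1.0],
}
fused = fuse([0.9, 0.1], [0.7, 0.2])  # this user's utterance
print(recognize(fused, prototypes))    # -> navigation
fine_tune(prototypes, "navigation", fused)
```

Concatenation is the simplest fusion choice; the claim itself leaves the fusion mechanism open, and practical systems typically learn the fusion jointly inside the network.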