CPC G10L 15/22 (2013.01) [B60W 40/08 (2013.01); G06F 18/251 (2023.01); G06N 3/08 (2013.01); G06T 7/20 (2013.01); G10L 15/16 (2013.01); G10L 15/25 (2013.01); B60W 2040/089 (2013.01); B60W 2420/403 (2013.01); B60W 2420/54 (2013.01); B60W 2540/21 (2020.02); B60W 2540/221 (2020.02); G06T 2207/30201 (2013.01); G10L 2015/223 (2013.01)] | 20 Claims |
1. A speech recognition method, comprising:
receiving audible data, wherein the audible data includes information about an utterance by a user;
receiving user data, wherein the user data includes information about movements by the user, the user data including hand gestures of the user and an image of a lip of the user;
fusing the audible data and the user data to obtain fused data; and
determining at least one spoken word of the utterance based on the fused data;
wherein determining the at least one spoken word of the utterance includes:
using a trained neural network to determine the at least one spoken word of the utterance;
fine-tuning the trained neural network based on the fused data to adapt to speech patterns of the user, wherein fine-tuning the trained neural network includes:
identifying a particular lip movement when the user utters a particular word, wherein the particular word is “navigation”; and
associating the particular lip movement with the particular word uttered by the user.
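The claimed steps (receive audio features, receive visual features such as lip images and hand gestures, fuse the modalities, decode a word, then adapt to the user) can be illustrated with a minimal sketch. This is not the patented implementation: the concatenation fusion, dot-product scoring, toy word "prototypes", and the `fine_tune` update rule are all hypothetical assumptions standing in for the trained neural network described in the claim.

```python
# Illustrative sketch only: late fusion of audio and visual features,
# word scoring, and a per-user fine-tuning step. The toy "model"
# (prototype vectors + dot-product scoring) is an assumption, not the
# claimed neural network.

def fuse(audio_feats, visual_feats):
    # Fuse modalities by simple concatenation (one common strategy).
    return audio_feats + visual_feats

def score(fused, prototype):
    # Dot-product similarity between fused features and a word prototype.
    return sum(a * b for a, b in zip(fused, prototype))

def recognize(fused, prototypes):
    # Return the word whose prototype best matches the fused features.
    return max(prototypes, key=lambda w: score(fused, prototypes[w]))

def fine_tune(prototypes, word, fused, lr=0.5):
    # Nudge a word's prototype toward this user's observed features,
    # e.g. associating a particular lip movement with "navigation".
    old = prototypes[word]
    prototypes[word] = [p + lr * (f - p) for p, f in zip(old, fused)]

# Toy data: 2-dim audio features + 2-dim visual (lip/gesture) features.
prototypes = {
    "navigation": [1.0, 0.0, 1.0, 0.0],
    "radio":      [0.0, 1.0, 0.0, 1.0],
}
fused = fuse([0.9, 0.1], [0.7, 0.2])  # this user's utterance
print(recognize(fused, prototypes))    # -> navigation
fine_tune(prototypes, "navigation", fused)
```

Concatenation is the simplest fusion choice; the claim itself leaves the fusion mechanism open, and practical systems typically learn the fusion jointly inside the network.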