US 12,148,417 B1
Label confidence scoring
Aidan Thomas Cardella, Wellesley, MA (US); Anand Victor, Bellevue, WA (US); Vipin Gupta, Issaquah, WA (US); Zheng Du, Bellevue, WA (US); John Rajiv Malik, Waltham, MA (US); Li Erran Li, Palo Alto, CA (US); Jarrett Alegre Bato, Seattle, WA (US); Peng Yang, Sammamish, WA (US); and Alejandro Ricardo Mottini D'Oliveira, Seattle, WA (US)
Assigned to AMAZON TECHNOLOGIES, INC., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 22, 2021, as Appl. No. 17/354,215.
Int. Cl. G10L 15/01 (2013.01); G10L 15/16 (2006.01); G10L 15/18 (2013.01); G10L 15/26 (2006.01)
CPC G10L 15/01 (2013.01) [G10L 15/16 (2013.01); G10L 15/18 (2013.01); G10L 15/26 (2013.01)]
22 Claims
 
1. A computer-implemented method, comprising:
receiving first data representing first human speech;
receiving, by a first computing device from a first user, first label data comprising a transcription of at least a portion of the first human speech, wherein the first user is a human user;
generating first feature data representing at least one of a time when or a location where a second computing device received the first human speech using at least one microphone of the second computing device, wherein the second computing device is a speech processing-enabled device;
generating second feature data representing past transcription performance of the first user, wherein the past transcription performance comprises a historical accuracy of transcriptions generated by the first user;
generating third feature data representing actions taken by the first user while generating the transcription, wherein the third feature data comprises data indicating that the first user executed an internet search during transcription;
determining, by a first machine learning model using the first feature data, the second feature data, and the third feature data, a first confidence score representing a confidence in an accuracy of the first label data;
generating an updated automatic speech recognition (ASR) model of a natural language processing system using the first data, the first label data, and the first confidence score; and
generating an updated first machine learning model based on a difference between the first confidence score and at least one other confidence score determined for the first label data.
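The data flow recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify the model type, the feature encoding, how the confidence score enters ASR training, or how the score difference drives the model update, so everything below is an assumption made for illustration: a small logistic model stands in for the "first machine learning model", the confidence score is attached to the training example as a per-sample weight, and the update is one gradient step on the squared difference between the model's score and one other confidence score. All names (`LabelFeatures`, `ConfidenceModel`, `asr_training_example`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LabelFeatures:
    """Hypothetical encoding of the three kinds of feature data in claim 1."""
    hour_of_day: float           # first feature data: when the device received the speech (0-23)
    historical_accuracy: float   # second feature data: annotator's past accuracy, in [0, 1]
    used_internet_search: bool   # third feature data: annotator ran a web search while transcribing

    def as_vector(self):
        # Scale every feature into [0, 1] (an illustrative choice, not from the claim).
        return [
            self.hour_of_day / 23.0,
            self.historical_accuracy,
            1.0 if self.used_internet_search else 0.0,
        ]


class ConfidenceModel:
    """Assumed stand-in for the 'first machine learning model': logistic regression."""

    def __init__(self, n_features=3):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def score(self, x):
        """Return a confidence in (0, 1) that the label (transcription) is accurate."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, other_score, lr=0.5):
        """One gradient step on 0.5 * (score - other_score)**2.

        `other_score` plays the role of 'at least one other confidence score
        determined for the first label data'. Returns the difference that was
        measured before the step.
        """
        p = self.score(x)
        diff = p - other_score
        grad_z = diff * p * (1.0 - p)  # chain rule through the sigmoid
        self.weights = [w - lr * grad_z * xi for w, xi in zip(self.weights, x)]
        self.bias -= lr * grad_z
        return diff


def asr_training_example(audio, transcription, confidence):
    """Bundle the speech data, the label data, and the confidence score.

    Using the confidence as a sample weight is an assumption; the claim only
    says the updated ASR model is generated 'using' all three.
    """
    return {"audio": audio, "label": transcription, "sample_weight": confidence}


# Usage: score one human transcription, package it for ASR training,
# then nudge the confidence model toward a second, independently obtained score.
features = LabelFeatures(hour_of_day=14, historical_accuracy=0.92, used_internet_search=True)
model = ConfidenceModel()
x = features.as_vector()
confidence = model.score(x)
example = asr_training_example(b"<raw audio bytes>", "play some jazz", confidence)
difference = model.update(x, other_score=0.9)
```

In this sketch a low-confidence transcription still contributes to ASR training, only with less weight. Discarding labels below a threshold would be an equally plausible reading of the claim language.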