US 12,374,328 B2
Dynamic domain-adapted automatic speech recognition system
Atul Kumar, San Jose, CA (US); Elizabeth O. Bratt, Mountain View, CA (US); Minsuk Heo, Santa Clara, CA (US); Nidhi Rajshree, Los Gatos, CA (US); and Praful Chandra Mangalath, Sunnyvale, CA (US)
Assigned to Roku, Inc., San Jose, CA (US)
Filed by Roku, Inc., San Jose, CA (US)
Filed on Nov. 16, 2023, as Appl. No. 18/511,077.
Application 18/511,077 is a continuation of application No. 17/214,462, filed on Mar. 26, 2021, now granted as Pat. No. 11,862,152.
Prior Publication US 2024/0096315 A1, Mar. 21, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/187 (2013.01); G06F 40/205 (2020.01); G06F 40/295 (2020.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 25/33 (2013.01)
CPC G10L 15/187 (2013.01) [G06F 40/205 (2020.01); G06F 40/295 (2020.01); G10L 15/1822 (2013.01); G10L 15/22 (2013.01); G10L 25/33 (2013.01)]
20 Claims
 
1. A computer-implemented method for adapting an automatic speech recognition engine implemented within a multimedia environment, comprising:
receiving, at a user device, a voice query that includes identification of requested media content and an action to be performed on the requested media content, wherein the identification of the requested media content comprises a first entity representing a title of the requested media content and a second entity representing at least one metadata associated with the requested media content;
generating a transcription of the voice query, wherein the transcription is generated using the automatic speech recognition engine, wherein the transcription includes an imperfect textual representation of the requested media content;
parsing the transcription to identify the first entity and the second entity;
generating a phonetic representation of the requested media content based on the first entity and the second entity;
generating, based on the phonetic representation, a fuzzy candidate list comprising a plurality of fuzzy candidates representing potential matches to the requested media content, wherein each fuzzy candidate of the plurality of fuzzy candidates is associated with a popularity score;
ranking the fuzzy candidate list to form a ranked fuzzy candidate list including a highest ranked fuzzy candidate corresponding to a best potential match for the requested media content, wherein the highest ranked fuzzy candidate corresponding to the best potential match for the requested media content is determined based on a comparison of the popularity score of each fuzzy candidate of the plurality of fuzzy candidates; and
performing the action associated with the highest ranked fuzzy candidate.
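
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the catalog, the popularity scores, the "ACTION TITLE on METADATA" query grammar, the 0.8 similarity threshold, and all function names (parse, phonetic_key, fuzzy_candidates, handle_query) are hypothetical. The phonetic encoding is a toy consonant-class scheme loosely modeled on Soundex, and the standard-library difflib.SequenceMatcher stands in for whatever fuzzy matching a real system would use.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

ACTIONS = {"play", "open", "find"}  # hypothetical action vocabulary


@dataclass
class Candidate:
    title: str
    metadata: str      # e.g. a channel or provider name
    popularity: float  # higher means more popular


# Hypothetical catalog with made-up popularity scores.
CATALOG = [
    Candidate("The Crown", "netflix", 0.92),
    Candidate("The Clown", "netflix", 0.15),
    Candidate("The Crowd", "hulu", 0.30),
    Candidate("Frozen", "disney", 0.88),
]

# Toy consonant classes (Soundex-like); vowels and h/w/y are dropped.
_CLASSES = {
    ch: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for ch in letters
}


def parse(transcription):
    """Split 'ACTION TITLE on METADATA' into (action, title, metadata)."""
    m = re.match(r"(\w+)\s+(.+?)\s+on\s+(\w+)$", transcription.strip().lower())
    if m is None or m.group(1) not in ACTIONS:
        raise ValueError(f"unrecognized query: {transcription!r}")
    return m.group(1), m.group(2), m.group(3)


def phonetic_key(text):
    """Encode each word as consonant-class digits, collapsing adjacent repeats."""
    keys = []
    for word in re.findall(r"[a-z]+", text.lower()):
        key, prev = "", ""
        for ch in word:
            code = _CLASSES.get(ch, "")
            if code and code != prev:
                key += code
            prev = code
        keys.append(key)
    return " ".join(keys)


def fuzzy_candidates(title, metadata, threshold=0.8):
    """Return catalog entries whose phonetic key is close to the query's key."""
    query_key = phonetic_key(f"{title} {metadata}")
    return [
        c for c in CATALOG
        if SequenceMatcher(
            None, query_key, phonetic_key(f"{c.title} {c.metadata}")
        ).ratio() >= threshold
    ]


def handle_query(transcription):
    """Parse, match phonetically, rank by popularity, and pick the top candidate."""
    action, title, metadata = parse(transcription)
    ranked = sorted(fuzzy_candidates(title, metadata),
                    key=lambda c: c.popularity, reverse=True)
    if not ranked:
        raise LookupError("no fuzzy candidates found")
    return action, ranked[0], ranked


# An imperfect transcription: "croun" and "netflicks" are misrecognitions.
action, best, ranked = handle_query("play the croun on netflicks")
print(action, best.title, [c.title for c in ranked])
```

In this sketch the misrecognized words "croun" and "netflicks" produce the same phonetic keys as "crown" and "netflix", so "The Crown" and the near-homophone "The Clown" both survive as fuzzy candidates, and the popularity comparison places "The Crown" first. A production system would replace the returned action string with an actual playback or search call.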