| CPC G10L 15/187 (2013.01) [G06F 40/205 (2020.01); G06F 40/295 (2020.01); G10L 15/1822 (2013.01); G10L 15/22 (2013.01); G10L 25/33 (2013.01)] | 20 Claims |

|
1. A computer-implemented method for adapting an automatic speech recognition engine implemented within a multimedia environment, comprising:
receiving, at a user device, a voice query that includes identification of requested media content and an action to be performed on the requested media content, wherein the identification of the requested media content comprises a first entity representing a title of the requested media content and a second entity representing at least one metadata associated with the requested media content;
generating a transcription of the voice query, wherein the transcription is generated using the automatic speech recognition engine, wherein the transcription includes an imperfect textual representation of the requested media content;
parsing the transcription to identify the first entity and the second entity;
generating a phonetic representation of the requested media content based on the first entity and the second entity;
generating, based on the phonetic representation, a fuzzy candidate list comprising a plurality of fuzzy candidates representing potential matches to the requested media content, wherein each fuzzy candidate of the plurality of fuzzy candidates is associated with a popularity score;
ranking the fuzzy candidate list to form a ranked fuzzy candidate list including a highest ranked fuzzy candidate corresponding to a best potential match for the requested media content, wherein the highest ranked fuzzy candidate corresponding to the best potential match for the requested media content is determined based on a comparison of the popularity score of each fuzzy candidate of the plurality of fuzzy candidates; and
performing the action associated with the highest ranked fuzzy candidate.
|
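For illustration only, the steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the claim names no phonetic encoding, similarity measure, or query grammar, so the simplified Soundex-style key, the `difflib` similarity threshold, the `QUERY` pattern, and the `CATALOG` entries with their popularity scores are all hypothetical. The speech recognition step itself is not shown; the sketch starts from a transcription string.

```python
import re
from difflib import SequenceMatcher

# Hypothetical catalog: (title, metadata, popularity score).
CATALOG = [
    ("Frozen", "movie", 0.95),
    ("Frozen Planet", "series", 0.60),
    ("Frasier", "series", 0.70),
    ("Fargo", "movie", 0.80),
]

# Hypothetical query grammar: "<action> [the] <metadata> <title>".
QUERY = re.compile(
    r"(?P<action>play|record)\s+(?:the\s+)?(?P<meta>movie|series)\s+(?P<title>.+)"
)

# Consonant groups for a simplified Soundex-style code (an assumption).
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def phonetic_key(text):
    """Return one 4-character key per word: first letter plus up to 3 digits."""
    keys = []
    for word in re.findall(r"[a-z]+", text.lower()):
        digits = []
        prev = _CODES.get(word[0], "")
        for ch in word[1:]:
            code = _CODES.get(ch, "")
            if code and code != prev:  # skip vowels and repeated codes
                digits.append(code)
            prev = code
        keys.append((word[0] + "".join(digits) + "000")[:4])
    return " ".join(keys)


def handle(transcription, threshold=0.8):
    """Parse, match phonetically, rank by popularity, and return the action."""
    m = QUERY.fullmatch(transcription.lower().strip())
    if not m:
        return None
    # Phonetic representation built from both entities (title and metadata).
    target = phonetic_key(f"{m['title']} {m['meta']}")
    # Fuzzy candidate list: catalog entries whose key is close to the target.
    candidates = [
        (title, popularity)
        for title, meta, popularity in CATALOG
        if SequenceMatcher(None, target, phonetic_key(f"{title} {meta}")).ratio()
        >= threshold
    ]
    # Rank by comparing popularity scores; the top entry is the best match.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return (m["action"], ranked[0][0]) if ranked else None


result = handle("play the movie frozin")
```

Here the imperfect transcription "frozin" produces the same phonetic key as the catalog title "Frozen", so that entry enters the candidate list and, having the highest popularity score among the candidates, is the one the action is applied to.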