US 11,749,285 B2
Speech transcription using multiple data sources
Vincent Charles Cheung, San Carlos, CA (US); Chengxuan Bai, San Mateo, CA (US); and Yating Sheng, San Francisco, CA (US)
Assigned to META PLATFORMS TECHNOLOGIES, LLC, Menlo Park, CA (US)
Filed by Meta Platforms Technologies, LLC, Menlo Park, CA (US)
Filed on Jan. 14, 2022, as Appl. No. 17/648,067.
Application 17/648,067 is a continuation of application No. 16/689,662, filed on Nov. 20, 2019, granted, now 11,227,602.
Prior Publication US 2022/0139400 A1, May 5, 2022
Int. Cl. G10L 17/00 (2013.01); G06F 3/01 (2006.01); G06T 19/00 (2011.01); G10L 25/63 (2013.01); H04R 1/40 (2006.01); H04R 3/00 (2006.01); G06V 40/16 (2022.01)
CPC G10L 17/00 (2013.01) [G06F 3/011 (2013.01); G06T 19/006 (2013.01); G06V 40/161 (2022.01); G10L 25/63 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01)] 20 Claims
 
1. A system comprising:
an audio capture system configured to capture audio data associated with a plurality of speakers;
an image capture system configured to capture images of one or more of the plurality of speakers; and
a speech processing engine configured to:
recognize a plurality of speech segments in the audio data,
identify, for each speech segment of the plurality of speech segments and based on the images, a speaker associated with the speech segment,
transcribe each of the plurality of speech segments to produce a transcription of the plurality of speech segments including, for each speech segment in the plurality of speech segments, an indication of which speaker is associated with the speech segment, and
analyze the transcription to identify an instance in which one of the speakers asks another one of the speakers to perform a task, and based on the identified instance, generate a list of tasks.
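The last two steps of claim 1 can be illustrated with a minimal Python sketch. It shows a speaker-labelled transcription, and a task list built from it. The claim does not say how a request is detected or how the speaker being asked is determined. The cue-phrase regular expression, the "next different speaker is the addressee" heuristic, and the names `Segment`, `Task`, `format_transcription`, and `extract_tasks` are all hypothetical. Speech recognition and image-based speaker identification are assumed to have already run, so each segment arrives with a speaker label and its transcribed text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    speaker: str  # speaker identified for this segment (assumed: from the images)
    text: str     # transcribed text of the segment


@dataclass
class Task:
    requested_by: str           # speaker who asked for the task
    assigned_to: Optional[str]  # speaker asked to do it, if one could be inferred
    description: str            # what was asked


# Hypothetical cue phrases for a request; the claim does not specify a detector.
REQUEST_PATTERN = re.compile(
    r"\b(?:can|could|would|will) you\s+(?P<task>[^?.!]+)"
    r"|\bplease\s+(?P<task2>[^?.!]+)",
    re.IGNORECASE,
)


def format_transcription(segments: List[Segment]) -> str:
    """Render the transcription with an indication of each segment's speaker."""
    return "\n".join(f"{seg.speaker}: {seg.text}" for seg in segments)


def extract_tasks(segments: List[Segment]) -> List[Task]:
    """Scan the transcription for requests and build a list of tasks."""
    tasks = []
    for i, seg in enumerate(segments):
        match = REQUEST_PATTERN.search(seg.text)
        if match is None:
            continue
        description = (match.group("task") or match.group("task2")).strip()
        # Hypothetical heuristic: the speaker of the next segment, if different
        # from the requester, is treated as the person being asked.
        assignee = None
        if i + 1 < len(segments) and segments[i + 1].speaker != seg.speaker:
            assignee = segments[i + 1].speaker
        tasks.append(Task(seg.speaker, assignee, description))
    return tasks


if __name__ == "__main__":
    meeting = [
        Segment("Alice", "Bob, could you send the slides to the team?"),
        Segment("Bob", "Sure, I will do that today."),
        Segment("Carol", "The demo went well."),
    ]
    print(format_transcription(meeting))
    for task in extract_tasks(meeting):
        print(task)
        # prints: Task(requested_by='Alice', assigned_to='Bob', description='send the slides to the team')
```

A keyword pattern is used here only because it is short and deterministic. A practical system would more likely use a trained intent or dialogue-act classifier, and could resolve the addressee from names spoken in the segment or from gaze direction in the images.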