US 12,266,362 B2
Systems and methods for a two pass diarization, automatic speech recognition, and transcript generation
Jean-Philippe Robichaud, Mercier (CA); Alexei Skurikhin, Redwood City, CA (US); Migüel Jetté, Squamish (CA); and Petrov Evgeny Stanislavovich, Saint Petersburg (RU)
Assigned to Rev.com, Inc., San Francisco, CA (US)
Filed by Rev.com, Inc., San Francisco, CA (US)
Filed on Nov. 2, 2020, as Appl. No. 17/087,330.
Application 17/087,330 is a continuation of application No. 16/177,061, filed on Oct. 31, 2018, granted, now 10,825,458.
Prior Publication US 2021/0050015 A1, Feb. 18, 2021
Int. Cl. G10L 15/00 (2013.01); G10L 15/26 (2006.01); G10L 17/00 (2013.01); G10L 19/038 (2013.01); G10L 15/02 (2006.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 17/00 (2013.01); G10L 19/038 (2013.01); G10L 15/02 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)] 18 Claims
1. A method of performing diarization on a sound recording, the method comprising:
receiving a sound recording;
breaking the sound recording into a plurality of chunks;
performing a first diarization on the plurality of chunks, wherein the first diarization is performed on the plurality of chunks simultaneously, and wherein performing the first diarization includes breaking each of the plurality of chunks into a plurality of segments, generating, for each of the plurality of segments, statistical speaker information descriptive of the sound characteristics in that segment, and clustering, within each of the plurality of chunks, segments having similar statistical speaker information to generate, within each chunk, groups of segments grouped according to the similar statistical speaker information;
performing a second diarization by clustering, across the plurality of chunks, the groups of segments according to their grouped similar statistical speaker information, the grouped similar statistical speaker information being characteristics of the speech of each of the groups of segments, wherein the second diarization performs a modified I-vector scoring in which the I-vectors of each of the groups of segments are averaged, each averaged I-vector is compared to the other averaged I-vectors, and groups whose averaged I-vectors are close to one another are clustered together based on that similarity;
creating a new I-vector for the groups of segments grouped according to similar statistical speaker information.
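The chunking and first-pass diarization steps of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the recording is a plain list of audio samples, the "statistical speaker information" is replaced by a hypothetical two-number feature (mean absolute amplitude and zero-crossing rate) where a real system would likely use richer acoustic statistics, and within-chunk clustering is a simple greedy threshold grouping. Chunks are processed concurrently with `concurrent.futures.ThreadPoolExecutor`; for CPU-bound feature extraction, a `ProcessPoolExecutor` would give true parallelism in CPython. The names `split`, `segment_stats`, `cluster_within_chunk`, and `first_pass`, and all thresholds and lengths, are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def split(seq, size):
    """Break a sequence into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def segment_stats(segment):
    """Hypothetical stand-in for per-segment statistical speaker information:
    (mean absolute amplitude, zero-crossing rate)."""
    energy = sum(abs(x) for x in segment) / len(segment)
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / max(len(segment) - 1, 1))


def cluster_within_chunk(chunk, segment_len, threshold):
    """First-pass diarization of one chunk: split it into segments, describe
    each segment, and assign each segment to the first group whose centroid
    lies within `threshold` (greedy), or start a new group."""
    groups = []  # each group: {"segments": [local indices], "centroid": stats}
    for idx, segment in enumerate(split(chunk, segment_len)):
        stats = segment_stats(segment)
        for group in groups:
            if math.dist(stats, group["centroid"]) < threshold:
                n = len(group["segments"])
                # Running mean keeps the centroid representative of the group.
                group["centroid"] = tuple(
                    (c * n + s) / (n + 1) for c, s in zip(group["centroid"], stats)
                )
                group["segments"].append(idx)
                break
        else:
            groups.append({"segments": [idx], "centroid": stats})
    return groups


def first_pass(recording, chunk_len, segment_len, threshold):
    """Break the recording into chunks and diarize all chunks concurrently.
    Returns one list of groups per chunk, in chunk order."""
    chunks = split(recording, chunk_len)
    worker = partial(cluster_within_chunk, segment_len=segment_len, threshold=threshold)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(worker, chunks))
```

Segment indices in the output are local to each chunk, which is why a second, cross-chunk pass is still needed to decide which groups in different chunks belong to the same speaker.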
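The second diarization, the modified I-vector scoring, can likewise be sketched. Extracting real I-vectors requires a trained total-variability model, so this sketch assumes the per-segment I-vectors are already available as short lists of floats. Each first-pass group's I-vectors are averaged, the averaged vectors are compared using cosine similarity (a common choice for I-vector scoring, assumed here because the claim only speaks of "closeness"), and sufficiently similar groups are merged. A new I-vector for each merged cluster is then formed by averaging all member segments' I-vectors, which is one plausible reading of the final step. The greedy assignment order, the rule that two groups from the same chunk are never merged, the threshold value, and the names `mean_vector`, `cosine`, and `second_pass` are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def second_pass(groups, threshold=0.9):
    """Cluster first-pass groups across chunks by averaged I-vector similarity.

    groups: dict mapping (chunk_index, group_index) -> list of segment I-vectors.
    Returns a list of clusters, each {"members": [keys], "ivector": [...]}.
    """
    clusters = []
    for key, ivectors in groups.items():
        averaged = mean_vector(ivectors)  # one averaged I-vector per group
        best, best_sim = None, threshold
        for cluster in clusters:
            # Hypothetical constraint: never merge two groups from the same chunk,
            # since the first pass already judged them to be different speakers.
            if any(chunk == key[0] for chunk, _ in cluster["members"]):
                continue
            sim = cosine(averaged, cluster["ivector"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [key], "ivector": averaged})
        else:
            best["members"].append(key)
            # New I-vector for the merged cluster: mean over all member segments.
            pooled = [v for member in best["members"] for v in groups[member]]
            best["ivector"] = mean_vector(pooled)
    return clusters
```

Because each cluster's I-vector is recomputed after every merge, later groups are scored against the pooled speaker model rather than against any single chunk's group; an agglomerative scheme that repeatedly merges the closest pair would be an equally valid alternative.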