US 12,136,423 B2
Transcript correction through programmatic comparison of independently generated transcripts
Kundan Kumar, Montreal (CA); and Vicki Anand, Montreal (CA)
Assigned to Descript, Inc., San Francisco, CA (US)
Filed by Descript, Inc., San Francisco, CA (US)
Filed on Dec. 18, 2020, as Appl. No. 17/127,166.
Claims priority of provisional application 62/953,082, filed on Dec. 23, 2019.
Prior Publication US 2021/0193148 A1, Jun. 24, 2021
Int. Cl. G10L 15/26 (2006.01); G06F 3/16 (2006.01); G10L 15/01 (2013.01); G10L 15/08 (2006.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01); G10L 15/32 (2013.01)

CPC G10L 15/26 (2013.01) [G06F 3/165 (2013.01); G10L 15/01 (2013.01); G10L 15/08 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 15/32 (2013.01); G10L 2015/088 (2013.01)]

20 Claims

1. A method performed by a processor included in a computing device, the method comprising:

receiving input indicative of a selection of an audio file;

retrieving the audio file from a storage medium;

forwarding

a first copy of the audio file to a first transcription service via a first application programming interface, and

a second copy of the audio file to a second transcription service via a second application programming interface;

receiving

a first transcript from the first transcription service via the first application programming interface, and

a second transcript from the second transcription service via the second application programming interface;

producing, based on an analysis of the first and second transcripts, a tuple for each word uttered in the audio file, so as to create a series of tuples that are populated into a data structure,

wherein each tuple includes a field in which it is indicated whether interpretations of a corresponding word across the first and second transcripts are identical;

identifying a discrepancy by examining the data structure to identify a conflicting translation between the first and second transcripts,

wherein the conflicting translation corresponds to a portion of the audio file for which the first transcription service had a first interpretation and the second transcription service had a second interpretation;

applying, to the first and second interpretations, a computer-implemented model that addresses the discrepancy by identifying an appropriate translation for the conflicting translation from among the first and second interpretations,

wherein upon being applied to the first and second interpretations, the computer-implemented model analyzes grammar or sentence structure, of the first interpretation and surrounding words and of the second interpretation and the surrounding words, to identify the appropriate translation;

generating a master transcript, with the appropriate translation identified by the computer-implemented model, based on the data structure; and

causing display of the master transcript in such a manner that (i) the appropriate translation is visually distinguishable from a remainder of the master transcript and (ii) an alternative translation is positioned adjacent to the appropriate translation in line with the master transcript,

wherein the alternative translation is whichever of the first and second interpretations is not identified as the appropriate translation.
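
For illustration only, the comparison, selection, and display steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The names WordTuple, build_tuples, resolve, render, and toy_score are hypothetical. Word alignment uses the standard-library difflib.SequenceMatcher, which the claim does not specify. The toy bigram scorer merely stands in for the computer-implemented model that analyzes grammar or sentence structure. The two transcripts are assumed to have already been received from the two transcription services as lists of words.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import zip_longest
from typing import Callable, List, Optional, Tuple


@dataclass
class WordTuple:
    """One entry per word position; `identical` is the agreement field."""
    first: Optional[str]    # interpretation from the first service (None if absent)
    second: Optional[str]   # interpretation from the second service (None if absent)
    identical: bool


def build_tuples(first_words: List[str], second_words: List[str]) -> List[WordTuple]:
    """Align the two transcripts and emit a series of tuples (assumed alignment method)."""
    matcher = SequenceMatcher(None, first_words, second_words, autojunk=False)
    tuples: List[WordTuple] = []
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Pair words position by position; unmatched positions are padded with None.
        for a, b in zip_longest(first_words[i1:i2], second_words[j1:j2]):
            tuples.append(WordTuple(a, b, a == b))
    return tuples


def resolve(
    tuples: List[WordTuple], score: Callable[[List[str]], float]
) -> List[Tuple[Optional[str], Optional[str], bool]]:
    """Return (chosen, alternative, is_conflict) per tuple, scoring each candidate in context."""
    resolved: List[Tuple[Optional[str], Optional[str], bool]] = []
    for i, t in enumerate(tuples):
        if t.identical:
            resolved.append((t.first, None, False))
            continue
        # Surrounding words: up to two already-chosen words before, two agreed words after.
        left = [w for w, _, _ in resolved[-2:] if w]
        right = [n.first for n in tuples[i + 1:i + 3] if n.identical and n.first]

        def in_context(word: Optional[str]) -> List[str]:
            return left + ([word] if word else []) + right

        if score(in_context(t.first)) >= score(in_context(t.second)):
            resolved.append((t.first, t.second, True))
        else:
            resolved.append((t.second, t.first, True))
    return resolved


def render(resolved: List[Tuple[Optional[str], Optional[str], bool]]) -> str:
    """Mark the chosen word with brackets and place the alternative next to it, in line."""
    parts: List[str] = []
    for word, alt, is_conflict in resolved:
        if is_conflict:
            parts.append(f"[{word or '(omitted)'}] ({alt or '(omitted)'})")
        else:
            parts.append(word or "")
    return " ".join(parts)


# Hypothetical stand-in for the grammar model: counts plausible adjacent word pairs.
PLAUSIBLE_BIGRAMS = {("to", "their"), ("their", "house")}


def toy_score(words: List[str]) -> float:
    return float(sum((a, b) in PLAUSIBLE_BIGRAMS for a, b in zip(words, words[1:])))


first = "I went to their house yesterday".split()
second = "I went to there house yesterday".split()
tuples = build_tuples(first, second)
master = render(resolve(tuples, toy_score))
print(master)
```

In this sketch the single disagreement ("their" versus "there") is the only tuple whose agreement field is false. The scorer prefers "their" in context, so the rendered master transcript brackets it and places "there" beside it as the alternative. A real system would replace toy_score with a trained language or grammar model and would render the distinction in a user interface rather than as plain text.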