US 11,715,475 B2
Method and system for evaluating and improving live translation captioning systems
Arkady Arkhangorodsky, Los Angeles, CA (US); Christopher Chu, Los Angeles, CA (US); Scot Fang, Los Angeles, CA (US); Denglin Jiang, Los Angeles, CA (US); Yiqi Huang, Los Angeles, CA (US); Ajay Nagesh, Los Angeles, CA (US); Boliang Zhang, Los Angeles, CA (US); and Kevin Knight, Los Angeles, CA (US)
Assigned to Beijing DiDi Infinity Technology and Development Co., Ltd., Beijing (CN)
Filed by Beijing DiDi Infinity Technology and Development Co., Ltd., Beijing (CN)
Filed on Sep. 20, 2021, as Appl. No. 17/479,349.
Prior Publication US 2023/0089902 A1, Mar. 23, 2023
Int. Cl. G06F 40/00 (2020.01); G10L 15/32 (2013.01); G10L 15/08 (2006.01); G10L 15/06 (2013.01); G06F 40/49 (2020.01); G06F 40/51 (2020.01); G06F 40/58 (2020.01); G10L 25/27 (2013.01)

CPC G10L 15/32 (2013.01) [G06F 40/49 (2020.01); G06F 40/51 (2020.01); G06F 40/58 (2020.01); G10L 15/063 (2013.01); G10L 15/08 (2013.01); G10L 25/27 (2013.01); G10L 2015/088 (2013.01)]

15 Claims

1. A method for evaluating performance of a live translation captioning system, comprising:

displaying a word in a first language on a first user interface;

receiving a first audio sequence, the first audio sequence comprising a verbal description of the word in the first language;

generating a first translated text in a second language by feeding the first audio sequence into a pipeline comprising an Automatic Speech Recognition (ASR) subsystem and a machine translation (MT) subsystem;

displaying the first translated text on a second user interface;

receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text;

generating a second translated text in the first language by feeding the second audio sequence into the pipeline;

determining a matching score between the word and the second translated text; and

determining a performance score of the live translation captioning system based on the matching score.
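
The round trip recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Pipeline`, `Guesser`, `Round`, `matching_score`, and `performance_score` are hypothetical, the exact-token match is only one possible matching score, and averaging over rounds is only one possible way to base a performance score on it. The ASR and MT subsystems are abstracted into a single callable supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for the ASR + MT pipeline:
# (audio, source language, target language) -> translated text.
Pipeline = Callable[[bytes, str, str], str]

# Hypothetical stand-in for the second player: given the caption shown on
# the second user interface, returns audio of the spoken guess.
Guesser = Callable[[str], bytes]


@dataclass
class Round:
    word: str                 # word displayed on the first user interface
    description_audio: bytes  # verbal description in the first language


def matching_score(word: str, text: str) -> float:
    """Assumed metric: 1.0 if the word appears as a token in the text."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    return 1.0 if word.lower() in tokens else 0.0


def performance_score(
    rounds: List[Round],
    pipeline: Pipeline,
    guesser: Guesser,
    first_lang: str,
    second_lang: str,
) -> float:
    """Assumed aggregation: mean matching score over all rounds."""
    if not rounds:
        return 0.0
    total = 0.0
    for r in rounds:
        # First translated text: description, first -> second language.
        caption = pipeline(r.description_audio, first_lang, second_lang)
        # Second audio sequence: the guess made from the displayed caption.
        guess_audio = guesser(caption)
        # Second translated text: guess, second -> first language.
        back = pipeline(guess_audio, second_lang, first_lang)
        total += matching_score(r.word, back)
    return total / len(rounds)
```

In this sketch a translation or recognition error at either pass through the pipeline lowers the score, since the guess can only match the original word if both the caption and the back-translation are usable.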