US 12,469,484 B2
Foreign language pronunciation assessment apparatus and control method thereof
Eui Sung Kim, Seongnam-si (KR); Hye Ji Seo, Seongnam-si (KR); and Hong Lee, Seongnam-si (KR)
Assigned to KAKAO ENTERPRISE CORP., Seongnam-si (KR)
Filed by Kakao Enterprise Corp., Seongnam-si (KR)
Filed on Apr. 21, 2022, as Appl. No. 17/725,632.
Claims priority of application No. 10-2022-0029489 (KR), filed on Mar. 8, 2022.
Prior Publication US 2023/0290269 A1, Sep. 14, 2023
Int. Cl. G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/187 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2006.01)

CPC G10L 15/02 (2013.01) [G10L 15/063 (2013.01); G10L 15/187 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01); G10L 15/16 (2013.01)]

17 Claims

1. A control method of a foreign language pronunciation assessment apparatus, the control method comprising:

training an end-to-end speech recognizer with native speaker data;

tuning the trained end-to-end speech recognizer with non-native speaker data;

training a scoring module on the basis of the tuned end-to-end speech recognizer;

outputting, by the tuned end-to-end speech recognizer, layer-wise context representations when non-native speaker speech is input to the tuned end-to-end speech recognizer; and

calculating, by the trained scoring module, a prediction score for evaluating a pronunciation of the input non-native speaker speech based on the layer-wise context representations,

wherein the end-to-end speech recognizer is configured to include a plurality of encoder layers,

wherein the layer-wise context representations are a combination of outputs of the plurality of encoder layers, and

wherein, when speech is input to the end-to-end speech recognizer, the speech sequentially passes through the plurality of encoder layers.
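
The inference-time data flow recited in claim 1 can be illustrated with a minimal sketch. The sketch is not the patented implementation. It covers only the last two steps (outputting layer-wise context representations and calculating a prediction score) and omits the training and tuning steps entirely. Every name in it (make_toy_layer, encode_layerwise, combine_layers, predict_score) is hypothetical. The toy affine layers stand in for real encoder layers, and the weighted sum and the mean-pooled linear head are assumed choices, since the claim says only that the representations are "a combination" of the encoder layer outputs.

```python
from typing import Callable, List, Sequence

Frame = List[float]                       # one feature vector per time step
Layer = Callable[[List[Frame]], List[Frame]]


def make_toy_layer(scale: float, shift: float) -> Layer:
    """Hypothetical stand-in for one encoder layer: an element-wise affine map."""
    def layer(frames: List[Frame]) -> List[Frame]:
        return [[scale * x + shift for x in frame] for frame in frames]
    return layer


def encode_layerwise(frames: List[Frame], layers: Sequence[Layer]) -> List[List[Frame]]:
    """Pass the input through the layers in order and keep every layer's output."""
    outputs: List[List[Frame]] = []
    hidden = frames
    for layer in layers:
        hidden = layer(hidden)            # each layer consumes the previous layer's output
        outputs.append(hidden)
    return outputs


def combine_layers(outputs: List[List[Frame]], weights: Sequence[float]) -> List[Frame]:
    """Assumed combination: a weighted sum over layers, per frame and per dimension."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per encoder layer")
    n_frames, dim = len(outputs[0]), len(outputs[0][0])
    return [
        [sum(w * out[t][d] for w, out in zip(weights, outputs)) for d in range(dim)]
        for t in range(n_frames)
    ]


def predict_score(context: List[Frame], head_weights: Sequence[float], bias: float) -> float:
    """Assumed scoring module: mean-pool over time, then apply a linear head."""
    n_frames = len(context)
    pooled = [sum(frame[d] for frame in context) / n_frames
              for d in range(len(head_weights))]
    return sum(w * p for w, p in zip(head_weights, pooled)) + bias


# Toy example: two frames of two features, two encoder layers.
frames = [[1.0, 2.0], [3.0, 4.0]]
layers = [make_toy_layer(2.0, 0.0), make_toy_layer(1.0, 1.0)]
outputs = encode_layerwise(frames, layers)
context = combine_layers(outputs, weights=[0.5, 0.5])
score = predict_score(context, head_weights=[1.0, 1.0], bias=0.0)
print(score)  # 11.0
```

In a trained system the layer weights and the scoring head would be learned parameters rather than the fixed constants used here. The sketch is meant only to show that the score depends on the outputs of all encoder layers, not on the final layer alone.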