US 12,067,987 B2
Method and device of generating acoustic features, speech model training, and speech recognition
Linhao Dong, Beijing (CN); and Zejun Ma, Beijing (CN)
Assigned to BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD., Beijing (CN)
Filed by Beijing Youzhuju Network Technology Co., Ltd., Beijing (CN)
Filed on Jan. 30, 2024, as Appl. No. 18/427,538.
Application 18/427,538 is a continuation of application No. PCT/CN2022/109381, filed on Aug. 1, 2022.
Claims priority of application No. 202110881723.7 (CN), filed on Aug. 2, 2021.
Prior Publication US 2024/0169988 A1, May 23, 2024
Int. Cl. G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01)
CPC G10L 15/22 (2013.01) [G10L 15/063 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method of generating acoustic features, comprising:
acquiring an acoustic information vector of a current speech frame and an information weight of the current speech frame;
obtaining an accumulated information weight corresponding to the current speech frame according to an accumulated information weight corresponding to a previous speech frame, a retention rate corresponding to the current speech frame, and the information weight of the current speech frame; wherein the retention rate is a difference between 1 and a leakage rate;
in a case that the accumulated information weight corresponding to the current speech frame is less than a threshold, obtaining an integrated acoustic information vector corresponding to the current speech frame according to an integrated acoustic information vector corresponding to the previous speech frame, the retention rate corresponding to the current speech frame, the information weight of the current speech frame, and the acoustic information vector of the current speech frame;
in a case that the accumulated information weight corresponding to the current speech frame is greater than or equal to the threshold, using the integrated acoustic information vector corresponding to the previous speech frame and the acoustic information vector of the current speech frame to output an issued integrated acoustic information vector, and calculating the integrated acoustic information vector corresponding to the current speech frame; and
after obtaining the integrated acoustic information vector corresponding to the current speech frame, taking a next speech frame as the current speech frame, and repeating the step of acquiring the acoustic information vector of the current speech frame and the information weight of the current speech frame and the subsequent steps until there is no next speech frame.
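The claimed steps amount to a leaky integrate-and-fire loop over speech frames. The sketch below is one plausible reading, not the patented implementation: the claim says the updates are computed "according to" the listed quantities without giving exact formulas, so the linear update rules, the default threshold of 1.0, and all function names (`integrate_and_fire`, `_scale`, `_add`) are assumptions introduced here for illustration.

```python
from typing import List, Tuple

Vector = List[float]

def _scale(v: Vector, s: float) -> Vector:
    return [x * s for x in v]

def _add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def integrate_and_fire(
    frames: List[Tuple[Vector, float]],
    threshold: float = 1.0,
    leakage_rate: float = 0.0,
) -> List[Vector]:
    """Leaky integration of (acoustic information vector, information weight)
    pairs; an integrated vector is issued whenever the accumulated
    information weight reaches `threshold`."""
    retention = 1.0 - leakage_rate   # retention rate = 1 - leakage rate
    acc_weight = 0.0                 # accumulated information weight
    acc_vector: Vector = []          # integrated acoustic information vector
    issued: List[Vector] = []

    for h, alpha in frames:
        if not acc_vector:
            acc_vector = [0.0] * len(h)
        # accumulated weight for the current frame
        new_weight = retention * acc_weight + alpha
        if new_weight < threshold:
            # below threshold: leak the previous state, fold in the new frame
            acc_vector = _add(_scale(acc_vector, retention), _scale(h, alpha))
            acc_weight = new_weight
        else:
            # at/above threshold: spend just enough of this frame's weight
            # to reach the threshold, and issue the integrated vector
            alpha_fire = threshold - retention * acc_weight
            issued.append(
                _add(_scale(acc_vector, retention), _scale(h, alpha_fire))
            )
            # the remainder of this frame's weight starts the next integration
            alpha_rest = alpha - alpha_fire
            acc_weight = alpha_rest
            acc_vector = _scale(h, alpha_rest)
    return issued
```

With `leakage_rate=0.0` this reduces to plain integrate-and-fire accumulation; a nonzero leakage rate discounts older frames at every step, which is the role the retention rate plays in the claim.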