US 12,437,771 B2
Neural network-based audio packet loss restoration method and apparatus, and system
Quanzhi Xiao, Zhuhai (CN); Yufeng Yan, Zhuhai (CN); Rongjun Huang, Zhuhai (CN); and Guiping Fang, Zhuhai (CN)
Assigned to ZHUHAI JIELI TECHNOLOGY CO., LTD, Zhuhai (CN)
Appl. No. 17/911,733
Filed by ZHUHAI JIELI TECHNOLOGY CO., LTD, Zhuhai (CN)
PCT Filed Sep. 30, 2020, PCT No. PCT/CN2020/119603 § 371(c)(1), (2) Date Sep. 15, 2022, PCT Pub. No. WO2021/184732, PCT Pub. Date Sep. 23, 2021.
Claims priority of application No. 202010200811.1 (CN), filed on Mar. 20, 2020.
Prior Publication US 2023/0245668 A1, Aug. 3, 2023
Int. Cl. G10L 21/02 (2013.01); G10L 25/30 (2013.01); G10L 25/45 (2013.01); G10L 25/60 (2013.01)

CPC G10L 21/02 (2013.01) [G10L 25/30 (2013.01); G10L 25/45 (2013.01); G10L 25/60 (2013.01)]

17 Claims

1. A neural network-based audio data packet loss restoration method, applied to an audio device comprising a processor, the method comprising:

step s101: acquiring, by the processor, an audio data packet, wherein the audio data packet comprises a plurality of audio data frames, the plurality of audio data frames comprises at least a plurality of voice signal frames, and a voice signal frame is an audio data frame comprising a voice signal;

step s102: distinguishing, by the processor, the voice signal frame from the non-voice signal frame in the plurality of audio data frames according to a preset algorithm;

step s103: when a frame loss situation occurs in which a voice signal frame in the plurality of audio data frames is lost, determining, by the processor, a position of the lost voice signal frame in the plurality of audio data frames to obtain position information of the lost frame, wherein the position comprises a first preset position or a second preset position, the position information of the lost frame is a position of the lost voice signal frame in a group of voice signal frames, and the group of voice signal frames comprises N voice signal frames, wherein N is an integer greater than or equal to five;

step s105: selecting, by the processor, a neural network model for a restoration of the frame loss situation according to the position information of the lost frame, wherein the neural network model comprises a first restoration model and a second restoration model, the first restoration model is configured to restore a voice signal frame at the first preset position, and the second restoration model is configured to restore a voice signal frame at the second preset position; and

step s107: inputting, by the processor, the plurality of audio data frames into the selected neural network model to restore the lost voice signal frame;

wherein the plurality of audio data frames further includes a non-voice signal frame.
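
The flow of steps s101 to s107 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything the claim leaves open is an assumption made here for illustration: a lost frame is marked as None and presumed to be a voice frame; the "preset algorithm" is a hypothetical mean-energy threshold (is_voice); the group is taken to be the first N voice frames of the packet, with N = 5 and exactly one lost frame; the last index of the group is treated as the second preset position and any other index as the first; and the two "restoration models" are simple placeholder functions standing in for trained neural networks. The sketch also passes only the group of voice frames to the selected model.

```python
from typing import Callable, Dict, List, Optional

Frame = Optional[List[float]]  # None marks a lost frame (assumption)
N = 5                          # group size; the claim only requires N >= 5
FIRST, SECOND = "first", "second"


def is_voice(frame: List[float], threshold: float = 0.01) -> bool:
    # s102: hypothetical "preset algorithm", a mean-energy threshold.
    return sum(x * x for x in frame) / len(frame) > threshold


def lost_position(group: List[Frame]) -> Optional[str]:
    # s103: map the index of the lost frame to a preset position.
    # Assumed mapping: last index -> SECOND, any other index -> FIRST.
    if None not in group:
        return None
    return SECOND if group.index(None) == len(group) - 1 else FIRST


def first_model(group: List[Frame]) -> List[float]:
    # Placeholder for the first restoration model: average the
    # received frames directly before and after the lost one.
    i = group.index(None)
    neighbours = [group[j] for j in (i - 1, i + 1) if 0 <= j < len(group)]
    return [sum(vals) / len(neighbours) for vals in zip(*neighbours)]


def second_model(group: List[Frame]) -> List[float]:
    # Placeholder for the second restoration model: no later frame
    # exists, so repeat the last received frame.
    return list(group[-2])


MODELS: Dict[str, Callable[[List[Frame]], List[float]]] = {
    FIRST: first_model,
    SECOND: second_model,
}


def restore(packet: List[Frame]) -> List[Frame]:
    # s101: the packet is a list of frames, voice and non-voice.
    # s102: keep voice frames; a lost frame is presumed to be voice.
    voice_idx = [i for i, f in enumerate(packet) if f is None or is_voice(f)]
    group_idx = voice_idx[:N]
    if len(group_idx) < N:
        return list(packet)
    group = [packet[i] for i in group_idx]
    position = lost_position(group)      # s103
    if position is None:
        return list(packet)
    model = MODELS[position]             # s105: select by position
    restored = model(group)              # s107: run the selected model
    out = list(packet)
    out[group_idx[group.index(None)]] = restored
    return out
```

For example, with one silent frame followed by five voice frames of which the third is lost, restore([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], None, [4.0, 4.0], [5.0, 5.0]]) selects the first placeholder model and fills the gap with [3.0, 3.0]. If the lost frame is instead the last of the group, the second placeholder model is selected and the preceding frame is repeated.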