US 12,451,145 B2
Speech transmission method, system and apparatus, computer-readable storage medium, and device
Junbin Liang, Guangdong (CN)
Assigned to Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Guangdong (CN)
Filed on Mar. 2, 2022, as Appl. No. 17/685,242.
Application 17/685,242 is a continuation of application No. PCT/CN2020/124263, filed on Oct. 28, 2020.
Claims priority of application No. 202010104793.7 (CN), filed on Feb. 20, 2020.
Prior Publication US 2022/0189491 A1, Jun. 16, 2022
Int. Cl. G10L 19/005 (2013.01); G06N 3/08 (2023.01); G10L 19/008 (2013.01); G10L 19/16 (2013.01); G10L 25/60 (2013.01); H04L 1/00 (2006.01)

CPC G10L 19/005 (2013.01) [G06N 3/08 (2013.01); G10L 19/008 (2013.01); G10L 19/16 (2013.01); G10L 19/167 (2013.01); G10L 25/60 (2013.01); H04L 1/0002 (2013.01); H04L 1/0009 (2013.01); H04L 1/0019 (2013.01); H04L 1/0026 (2013.01); H04L 1/0041 (2013.01)]

17 Claims

1. An audio transmission method, comprising:

receiving, by processing circuitry, current encoded data of an audio signal;

determining, by the processing circuitry, a packet loss recovery capability corresponding to the current encoded data according to a first audio encoding feature parameter corresponding to the current encoded data and a second audio encoding feature parameter corresponding to previous or subsequent encoded data of the current encoded data using a packet loss recovery capability prediction neural network model, the packet loss recovery capability corresponding to a quality loss of a decoded audio when the current encoded data is lost while being transmitted over a network, wherein the packet loss recovery capability corresponding to the current encoded data is inversely correlated with a score difference between (i) a first audio quality score determined by decoding the current encoded data and (ii) a second audio quality score determined by decoding the current encoded data with packet loss recovery;

determining, by the processing circuitry, whether to perform redundant encoding for the current encoded data according to the packet loss recovery capability; and

in response to the redundant encoding being determined to be performed, performing, by the processing circuitry, the redundant encoding according to the current encoded data to generate a corresponding redundancy packet,

wherein the packet loss recovery capability prediction neural network model is trained, the training of the packet loss recovery capability prediction neural network model comprising:

receiving a sample audio sequence in a training set;

encoding the sample audio sequence to obtain a sample audio encoding bitstream;

extracting, from the sample audio encoding bitstream, a first sample audio encoding feature parameter corresponding to a current sample encoded data and a second sample audio encoding feature parameter corresponding to previous or subsequent sample encoded data of the current sample encoded data;

inputting the first audio encoding feature parameter and the second audio encoding feature parameter into a machine learning model to output a predicted packet loss recovery capability corresponding to the current sample encoded data; and

adjusting a model parameter of the machine learning model according to a difference between a real packet loss recovery capability and the predicted packet loss recovery capability.
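
Illustrative sketch (not part of the claim language): the steps recited in claim 1 can be approximated in a few lines of Python. Everything named here is an assumption made for illustration only: the `1 / (1 + difference)` mapping from score difference to capability, the single-layer sigmoid model standing in for the neural network, the squared-error update, and the 0.5 decision threshold. The claim itself only requires that the capability be inversely correlated with the score difference and that a model parameter be adjusted according to the difference between the real and predicted capability.

```python
import math
import random


def recovery_capability(score_clean, score_with_plc):
    """Hypothetical label: 1.0 when the packet-loss-recovered decode scores as
    well as the normal decode, falling toward 0 as the score gap grows."""
    diff = max(0.0, score_clean - score_with_plc)
    return 1.0 / (1.0 + diff)


class CapabilityPredictor:
    """Single-layer sigmoid model, a stand-in for the claimed neural network."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        # One weight per feature of the current frame and of its neighbor.
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_features)]
        self.b = 0.0

    def predict(self, current_feats, neighbor_feats):
        x = list(current_feats) + list(neighbor_feats)
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def train_step(self, current_feats, neighbor_feats, real_capability, lr=0.5):
        """Adjust parameters from the gap between real and predicted capability
        (one gradient step on 0.5 * error**2). Returns the pre-update loss."""
        x = list(current_feats) + list(neighbor_feats)
        pred = self.predict(current_feats, neighbor_feats)
        err = pred - real_capability
        grad = err * pred * (1.0 - pred)  # derivative of the loss w.r.t. z
        self.w = [wi - lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * grad
        return 0.5 * err * err


def needs_redundancy(capability, threshold=0.5):
    """Frames the decoder could not conceal well get a redundancy packet."""
    return capability < threshold
```

In this sketch a sender would call `predict` on the feature parameters of the current frame and an adjacent frame, then generate a redundancy packet only when `needs_redundancy` returns true, so redundant bits go to frames whose loss would be most audible.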