| CPC G10L 19/005 (2013.01) [G06N 3/08 (2013.01); G10L 19/008 (2013.01); G10L 19/16 (2013.01); G10L 19/167 (2013.01); G10L 25/60 (2013.01); H04L 1/0002 (2013.01); H04L 1/0009 (2013.01); H04L 1/0019 (2013.01); H04L 1/0026 (2013.01); H04L 1/0041 (2013.01)] | 17 Claims |

|
1. An audio transmission method, comprising:
receiving, by processing circuitry, current encoded data of an audio signal;
determining, by the processing circuitry, a packet loss recovery capability corresponding to the current encoded data according to a first audio encoding feature parameter corresponding to the current encoded data and a second audio encoding feature parameter corresponding to previous or subsequent encoded data of the current encoded data using a packet loss recovery capability prediction neural network model, the packet loss recovery capability corresponding to a quality loss of a decoded audio when the current encoded data is lost while being transmitted over a network, wherein the packet loss recovery capability corresponding to the current encoded data is inversely correlated with a score difference between (i) a first audio quality score determined by decoding the current encoded data and (ii) a second audio quality score determined by decoding the current encoded data with packet loss recovery;
determining, by the processing circuitry, whether to perform redundant encoding for the current encoded data according to the packet loss recovery capability; and
in response to the redundant encoding being determined to be performed, performing, by the processing circuitry, the redundant encoding according to the current encoded data to generate a corresponding redundancy packet,
wherein the packet loss recovery capability prediction neural network model is trained, the training of the packet loss recovery capability prediction neural network model comprising:
receiving a sample audio sequence in a training set;
encoding the sample audio sequence to obtain a sample audio encoding bitstream;
extracting, from the sample audio encoding bitstream, a first sample audio encoding feature parameter corresponding to a current sample encoded data and a second sample audio encoding feature parameter corresponding to previous or subsequent sample encoded data of the current sample encoded data;
inputting the first audio encoding feature parameter and the second audio encoding feature parameter into a machine learning model to output a predicted packet loss recovery capability corresponding to the current sample encoded data; and
adjusting a model parameter of the machine learning model according to a difference between a real packet loss recovery capability and the predicted packet loss recovery capability.
|
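The training and decision steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. A single linear layer stands in for the packet loss recovery capability prediction neural network model. The mapping `1 / (1 + gap)` is only one assumed way to make the capability inversely related to the quality score difference. The two-element feature tuples, the sample scores, and the `0.5` decision threshold are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float], float, float]


def real_recovery_capability(score_plain: float, score_recovered: float) -> float:
    """Label for training: larger score gap -> lower capability (assumed mapping)."""
    gap = max(0.0, score_plain - score_recovered)
    return 1.0 / (1.0 + gap)


class LinearCapabilityModel:
    """Hypothetical stand-in for the prediction network: one linear layer."""

    def __init__(self, n_features: int) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0

    def predict(self, current: Sequence[float], neighbor: Sequence[float]) -> float:
        # Concatenate features of the current frame and its previous/subsequent frame.
        x = list(current) + list(neighbor)
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def train_step(self, current: Sequence[float], neighbor: Sequence[float],
                   target: float, lr: float = 0.1) -> float:
        # Adjust parameters according to the difference between the
        # predicted and the real capability (gradient step on squared error).
        x = list(current) + list(neighbor)
        error = self.predict(current, neighbor) - target
        self.weights = [w - lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= lr * error
        return error * error


def train(model: LinearCapabilityModel, samples: Sequence[Sample],
          epochs: int = 1000) -> List[float]:
    """Return the summed squared error per epoch."""
    losses = []
    for _ in range(epochs):
        total = 0.0
        for current, neighbor, s_plain, s_recovered in samples:
            target = real_recovery_capability(s_plain, s_recovered)
            total += model.train_step(current, neighbor, target)
        losses.append(total)
    return losses


def needs_redundancy(capability: float, threshold: float = 0.5) -> bool:
    """Assumed policy: protect frames that the decoder would conceal poorly."""
    return capability < threshold


# Hypothetical samples: (current features, neighbor features,
# quality score of normal decoding, quality score with loss recovery).
SAMPLES: List[Sample] = [
    ((0.1, 0.2), (0.1, 0.2), 4.2, 4.0),  # concealment works well
    ((0.9, 0.8), (0.1, 0.1), 4.2, 2.0),  # concealment works poorly
]

model = LinearCapabilityModel(n_features=4)
losses = train(model, SAMPLES)
```

In this sketch, a frame whose predicted capability falls below the threshold would be passed on to redundant encoding, while a frame the decoder can conceal well would be sent without a redundancy packet.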