US 12,088,823 B2
Rate control machine learning models with feedback control for video encoding
Chenjie Gu, Sunnyvale, CA (US); Hongzi Mao, Newtown Square, PA (US); Ching-Han Chiang, Santa Clara, CA (US); Cheng Chen, Milpitas, CA (US); Jingning Han, Los Altos, CA (US); Ching Yin Derek Pang, San Jose, CA (US); Rene Andre Claus, Santa Clara, CA (US); Marisabel Guevara Hechtman, Miami, FL (US); Daniel James Visentin, London (GB); Christopher Sigurd Fougner, Küsnacht (CH); Charles Booth Schaff, Chicago, IL (US); Nishant Patil, Sunnyvale, CA (US); and Alejandro Ramirez Bellido, Sant Just Desvern (ES)
Assigned to DeepMind Technologies Limited, London (GB)
Appl. No. 18/030,182
Filed by DeepMind Technologies Limited, London (GB)
PCT Filed Nov. 3, 2021, PCT No. PCT/EP2021/080508
§ 371(c)(1), (2) Date Apr. 4, 2023,
PCT Pub. No. WO2022/096503, PCT Pub. Date May 12, 2022.
Claims priority of provisional application 63/109,270, filed on Nov. 3, 2020.
Prior Publication US 2023/0336739 A1, Oct. 19, 2023
Int. Cl. H04N 7/12 (2006.01); H04N 19/126 (2014.01); H04N 19/149 (2014.01); H04N 19/172 (2014.01)
CPC H04N 19/149 (2014.11) [H04N 19/126 (2014.11); H04N 19/172 (2014.11)]
20 Claims
 
1. A method performed by one or more data processing apparatus for encoding a video comprising a sequence of video frames to generate a respective encoded representation of each video frame, the method comprising, for one or more of the video frames:
obtaining a feature embedding for the video frame;
processing an input comprising the feature embedding for the video frame using a rate control machine learning model to generate a respective score for each of a plurality of possible quantization parameter values;
selecting a quantization parameter value from the plurality of possible quantization parameter values using the scores;
determining a cumulative amount of data required to represent: (i) an encoded representation of the video frame that is generated in accordance with a quantization step size associated with the selected quantization parameter value and (ii) encoded representations of each video frame that precedes the video frame;
determining, based on the cumulative amount of data, that a feedback control criterion for the video frame is satisfied;
updating the selected quantization parameter value in response to determining that the feedback control criterion is satisfied; and
processing the video frame using an encoding model, in accordance with a quantization step size associated with the selected quantization parameter value, to generate the encoded representation of the video frame.
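
The per-frame steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name here (`qp_to_step_size`, `estimate_bits`, `encode_video`, `score_fn`, `bit_budget`) is hypothetical. The sketch makes several assumptions the claim does not state: the quantization parameter is selected by taking the highest score, the feedback control criterion is "the cumulative bits exceed a pro-rata share of a total bit budget", the update moves to the next coarser quantization parameter until the criterion no longer holds, and the encoding model is replaced by a toy bit-count estimate.

```python
from typing import Callable, List, Sequence, Tuple


def qp_to_step_size(qp: int) -> float:
    # Hypothetical QP-to-step-size mapping (assumption): step doubles every 6 QP values.
    return 2.0 ** (qp / 6.0)


def estimate_bits(frame_complexity: float, qp: int) -> float:
    # Toy stand-in for the encoding model (assumption): bits shrink as step size grows.
    return frame_complexity / qp_to_step_size(qp)


def encode_video(
    embeddings: Sequence[Sequence[float]],
    complexities: Sequence[float],
    qp_values: Sequence[int],
    score_fn: Callable[[Sequence[float], Sequence[int]], Sequence[float]],
    bit_budget: float,
) -> Tuple[List[int], float]:
    """Return the QP used for each frame and the total bits spent."""
    cumulative_bits = 0.0
    chosen: List[int] = []
    num_frames = len(embeddings)
    for i, (embedding, complexity) in enumerate(zip(embeddings, complexities)):
        # Score every candidate QP with the (stand-in) rate control model.
        scores = score_fn(embedding, qp_values)
        # Select a QP using the scores; argmax is an assumption of this sketch.
        best = max(range(len(qp_values)), key=lambda k: scores[k])
        qp = qp_values[best]
        # Cumulative data: this frame at the selected QP plus all preceding frames.
        projected = cumulative_bits + estimate_bits(complexity, qp)
        # Assumed feedback criterion: projected bits exceed a pro-rata budget share.
        allowed = bit_budget * (i + 1) / num_frames
        while projected > allowed and qp < max(qp_values):
            # Assumed update rule: step to the next coarser QP and re-check.
            qp = min(q for q in qp_values if q > qp)
            projected = cumulative_bits + estimate_bits(complexity, qp)
        # "Encode" the frame at the final QP (toy estimate only).
        cumulative_bits = projected
        chosen.append(qp)
    return chosen, cumulative_bits
```

In this sketch the learned model proposes a quantization parameter and the feedback check overrides it only when the running total drifts past the budget, which mirrors the order of the claimed steps. A real criterion and update rule could differ substantially.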