US 12,262,032 B2
Reinforcement learning based rate control
Jiahao Li, Beijing (CN); Bin Li, Beijing (CN); Yan Lu, Beijing (CN); Tom W. Holcomb, Sammamish, WA (US); Mei-Hsuan Lu, Taipei (TW); Andrey Mezentsev, Redmond, WA (US); and Ming-Chieh Lee, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Appl. No. 18/013,240
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
PCT Filed Jun. 30, 2020; PCT No. PCT/CN2020/099390; § 371(c)(1), (2) Date Dec. 27, 2022; PCT Pub. No. WO2022/000298; PCT Pub. Date Jan. 6, 2022.
Prior Publication US 2023/0319292 A1, Oct. 5, 2023
Int. Cl. H04N 19/196 (2014.01)

CPC H04N 19/196 (2014.11)

20 Claims

1. A computer-implemented method, comprising:

determining an encoding state of a video encoder, the encoding state associated with encoding a first video unit by the video encoder;

determining, by a reinforcement learning model and based on the encoding state of the video encoder, an encoding parameter associated with rate control for the video encoder;

encoding a second video unit different from the first video unit based on the encoding parameter; and

training the reinforcement learning model according to a reward for the encoding parameter based on the encoding of the second video unit, the reward being configured to penalize buffer overshooting and to increase as the encoding parameter results in a higher visual quality, wherein the reward is based on a base reward that:

has a negative value if buffer overshooting occurs;

increases as the encoding parameter decreases if buffer overshooting does not occur; and

is scaled by a scaling factor to obtain the reward.
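
By way of illustration only, the reward recited in claim 1 might be sketched as follows. The claim does not name the encoding parameter or give a formula, so this sketch assumes the parameter is a quantization parameter (QP) in the range 0 to 51, assumes a fixed penalty of -1 on overshoot, and assumes a linear dependence on QP otherwise. The names `EncodeResult`, `base_reward`, `reward`, `QP_MAX`, and `scale` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass

QP_MAX = 51  # assumption: H.264/HEVC-style quantization parameter range 0..51


@dataclass
class EncodeResult:
    """Hypothetical summary of encoding the second video unit."""
    qp: int                 # encoding parameter chosen by the model (assumed to be a QP)
    buffer_fullness: float  # buffer occupancy after encoding, in bits
    buffer_capacity: float  # maximum allowed buffer occupancy, in bits


def base_reward(result: EncodeResult) -> float:
    """Negative if the buffer overshoots; otherwise grows as the QP shrinks."""
    if result.buffer_fullness > result.buffer_capacity:
        # Assumption: a fixed penalty. The claim only requires a negative value.
        return -1.0
    # Assumption: linear in QP. Any function that increases as QP decreases would fit.
    return (QP_MAX - result.qp) / QP_MAX


def reward(result: EncodeResult, scale: float = 1.0) -> float:
    """Scale the base reward to obtain the reward used for training."""
    return scale * base_reward(result)
```

Under these assumptions, a lower QP (typically higher visual quality) yields a larger reward as long as the buffer does not overshoot, while any overshoot yields a negative reward regardless of the QP. How the scaling factor is chosen is not specified in the claim and is left as a plain argument here.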