US 12,250,400 B2
Unified space-time interpolation of video information
Luming Liang, Redmond, WA (US); Zhicheng Geng, Austin, TX (US); Ilya Dmitriyevich Zharkov, Sammamish, WA (US); and Tianyu Ding, Kirkland, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Feb. 14, 2022, as Appl. No. 17/670,978.
Prior Publication US 2023/0262259 A1, Aug. 17, 2023
Int. Cl. H04N 19/587 (2014.01); H04N 19/132 (2014.01); H04N 19/31 (2014.01); H04N 19/33 (2014.01); H04N 19/42 (2014.01); H04N 19/59 (2014.01); H04N 19/61 (2014.01)

CPC H04N 19/587 (2014.11) [H04N 19/132 (2014.11); H04N 19/31 (2014.11); H04N 19/33 (2014.11); H04N 19/59 (2014.11); H04N 19/61 (2014.11); H04N 19/439 (2014.11)]

17 Claims

1. A method for interpolating video information, comprising:

obtaining input video information having a given first number of plural frames, each frame in the input video information having a given first spatial resolution;

generating feature information based on the input video information;

encoding the feature information in a pipeline having plural encoding stages that operate at different respective resolutions, to produce plural instances of encoder attention information and plural instances of encoder output information, each instance of the encoder attention information expressing identified relations across the plural frames of the input video information;

producing a query based on an instance of encoder output information produced by a last encoding stage of the plural encoding stages;

decoding the query in a pipeline having plural decoding stages that operate at different respective resolutions, to produce plural instances of decoder output information, each decoding stage that has a preceding decoding stage receiving an instance of decoding input information produced by the preceding decoding stage, and each particular decoding stage operating on an instance of encoder attention information produced by a particular encoding stage that has a same resolution level as the particular decoding stage; and

producing output video information based on decoder output information produced by a last decoding stage of the plural decoding stages,

said producing including:

performing a reconstruction operation on the decoder output information produced by the last decoding stage, to produce reconstructed information;

interpolating the input video information to produce interpolated video information; and

combining the reconstructed information with the interpolated video information to produce the output video information,

the output video information having a second number of frames that is higher than the first number of frames in the input video information, and having a second spatial resolution that is higher than the first spatial resolution of the input video information.
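
By way of illustration only, the final "producing" step of claim 1 can be sketched as follows. The sketch assumes grayscale video stored as a NumPy array of shape (frames, height, width), linear interpolation along the time axis, nearest-neighbour upsampling in space, and element-wise addition as the combining operation. None of these choices is stated in the claim, and the names `interpolate_time`, `upsample_space`, and `produce_output` are hypothetical. The `reconstructed` argument stands in for the result of the reconstruction operation on the last decoding stage's output and is simply supplied by the caller here.

```python
import numpy as np


def interpolate_time(video, factor):
    """Linearly interpolate along the frame axis.

    (T, H, W) -> ((T - 1) * factor + 1, H, W). Assumed scheme, not from the claim.
    """
    t_in = video.shape[0]
    t_out = (t_in - 1) * factor + 1
    pos = np.linspace(0.0, t_in - 1, t_out)   # fractional source-frame positions
    lo = np.floor(pos).astype(int)            # earlier neighbouring frame
    hi = np.minimum(lo + 1, t_in - 1)         # later neighbouring frame, clamped
    w = (pos - lo)[:, None, None]             # blend weight toward the later frame
    return (1.0 - w) * video[lo] + w * video[hi]


def upsample_space(video, factor):
    """Nearest-neighbour spatial upsampling: (T, H, W) -> (T, H*factor, W*factor)."""
    return video.repeat(factor, axis=1).repeat(factor, axis=2)


def produce_output(video, reconstructed, t_factor=2, s_factor=2):
    """Combine reconstructed information with interpolated input video.

    Combination by addition is an assumption made for this sketch.
    """
    interpolated = upsample_space(interpolate_time(video, t_factor), s_factor)
    if reconstructed.shape != interpolated.shape:
        raise ValueError("reconstructed and interpolated shapes must match")
    return reconstructed + interpolated


# Three 2x2 input frames; a zero array stands in for the reconstructed information.
video = np.arange(3 * 2 * 2, dtype=float).reshape(3, 2, 2)
reconstructed = np.zeros((5, 4, 4))
out = produce_output(video, reconstructed)
print(out.shape)  # (5, 4, 4)
```

In this example the output has five frames of 4x4 pixels against three input frames of 2x2 pixels, so both the frame count and the spatial resolution are higher than those of the input, as the last limitation of the claim requires.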