US 12,380,714 B2
Methods and apparatus to perform dense prediction using transformer blocks
Rene Ranftl, Munich (DE); Alexey Bochkovskiy, Podolsk (RU); and Vladlen Koltun, Santa Clara, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Sep. 25, 2021, as Appl. No. 17/485,349.
Prior Publication US 2022/0012848 A1, Jan. 13, 2022
Int. Cl. G06T 3/4046 (2024.01); G06N 3/04 (2023.01); G06T 5/00 (2024.01)

CPC G06T 3/4046 (2013.01) [G06N 3/04 (2013.01); G06T 5/00 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20021 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/20221 (2013.01)]

19 Claims

1. An apparatus comprising:

a non-transitory computer-readable memory storing instructions; and

at least one processor circuit to be programmed by the instructions to:

divide an input image into non-overlapping patches,

flatten the non-overlapping patches into vectors,

generate input tokens from the vectors by using a linear projection, the input tokens representing features in the input image,

process the input tokens in a plurality of transformer stages,

obtain output tokens from the plurality of transformer stages, each output token obtained from a different one of the plurality of transformer stages,

reassemble the output tokens into image-like representations, each output token reassembled into a different one of the image-like representations,

fuse the image-like representations, and

generate an image representing a dense prediction based on the fusing of the image-like representations.
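
The sequence of operations recited in claim 1 can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the patented implementation. All function names, the patch size, the token width, the random weights, the single-head attention used as a stand-in for a transformer stage, the nearest-neighbour upsampling used for reassembly, and the element-wise sum used for fusion are hypothetical choices that the claim does not specify.

```python
import numpy as np

def patchify(image, patch):
    """Divide an (H, W, C) image into non-overlapping patches and flatten each one."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    tiles = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(gh * gw, patch * patch * c), (gh, gw)

def toy_stage(tokens, weight):
    """Assumed stand-in for one transformer stage: self-attention plus a residual."""
    scores = (tokens @ weight) @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return tokens + attn @ tokens

def reassemble(tokens, grid, scale):
    """Place tokens back on the patch grid and upsample to an image-like map."""
    gh, gw = grid
    fmap = tokens.reshape(gh, gw, -1)
    return fmap.repeat(scale, axis=0).repeat(scale, axis=1)

def dense_prediction(image, patch=4, dim=8, stages=3, seed=0):
    rng = np.random.default_rng(seed)
    vectors, grid = patchify(image, patch)
    # Linear projection of the flattened patches into input tokens.
    tokens = vectors @ rng.normal(scale=0.1, size=(vectors.shape[1], dim))
    maps = []
    for _ in range(stages):
        tokens = toy_stage(tokens, rng.normal(scale=0.1, size=(dim, dim)))
        # Tap the output of each stage and reassemble it separately.
        maps.append(reassemble(tokens, grid, patch))
    fused = np.sum(maps, axis=0)                  # assumed fusion: element-wise sum
    head = rng.normal(scale=0.1, size=dim)        # assumed per-pixel output head
    return fused @ head                           # one predicted value per pixel

image = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3) / 191.0
prediction = dense_prediction(image)
print(prediction.shape)
```

The sketch taps the tokens after every stage, so each image-like representation comes from a different stage, and the final map has one value per input pixel, as a dense prediction such as a depth map would.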