| CPC G06T 3/4046 (2013.01) [G06N 3/04 (2013.01); G06T 5/00 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20021 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/20221 (2013.01)] | 19 Claims |

|
1. An apparatus comprising:
a non-transitory computer-readable memory storing instructions; and
at least one processor circuit to be programmed by the instructions to:
divide an input image into non-overlapping patches,
flatten the non-overlapping patches into vectors,
generate input tokens from the vectors by using a linear projection, the input tokens representing features in the input image,
process the input tokens in a plurality of transformer stages,
obtain output tokens from the plurality of transformer stages, each output token obtained from a different one of the plurality of transformer stages,
reassemble the output tokens into image-like representations, each output token reassembled into a different one of the image-like representations,
fuse the image-like representations, and
generate an image representing a dense prediction based on the fusing of the image-like representations.
|
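The steps recited in claim 1 can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and is not taken from the claims: the function names (`patchify`, `linear_project`, `toy_stage`, `reassemble`, `fuse`, `dense_prediction`), the use of a single-head self-attention layer with identity projections as a stand-in for a transformer stage, fusion by element-wise averaging at a single resolution, and the final nearest-neighbor upsampling. A practical implementation would likely use learned weights, multi-resolution reassembly, and a tensor library.

```python
import math
import random

def patchify(image, p):
    """Divide an H x W image into non-overlapping p x p patches, each flattened to a vector."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be a multiple of the patch size"
    return [[image[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]

def linear_project(vectors, weight):
    """Map each flattened patch to a d-dimensional token; weight has shape d x (p*p)."""
    return [[sum(w * v for w, v in zip(row, vec)) for row in weight] for vec in vectors]

def toy_stage(tokens):
    """Hypothetical stand-in for a transformer stage: self-attention plus a residual."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        mixed = [sum(e / total * v[i] for e, v in zip(exps, tokens)) for i in range(d)]
        out.append([a + b for a, b in zip(q, mixed)])
    return out

def reassemble(tokens, grid_h, grid_w):
    """Place tokens back on the patch grid, giving a grid_h x grid_w x d image-like map."""
    return [[tokens[r * grid_w + c] for c in range(grid_w)] for r in range(grid_h)]

def fuse(maps):
    """Fuse image-like maps by element-wise averaging (an assumed, simplified fusion)."""
    n = len(maps)
    return [[[sum(m[r][c][k] for m in maps) / n for k in range(len(maps[0][r][c]))]
             for c in range(len(maps[0][r]))] for r in range(len(maps[0]))]

def dense_prediction(image, p, weight, num_stages=3):
    """Run the full sketch: patches -> tokens -> stages -> reassemble -> fuse -> image."""
    grid_h, grid_w = len(image) // p, len(image[0]) // p
    tokens = linear_project(patchify(image, p), weight)
    maps = []
    for _ in range(num_stages):
        tokens = toy_stage(tokens)
        # One set of output tokens is taken from each stage and reassembled separately.
        maps.append(reassemble(tokens, grid_h, grid_w))
    fused = fuse(maps)
    # Collapse channels to one value per patch, then upsample back to the input size.
    coarse = [[sum(cell) / len(cell) for cell in row] for row in fused]
    return [[coarse[r // p][c // p] for c in range(len(image[0]))]
            for r in range(len(image))]

# Usage: a 4 x 4 input, 2 x 2 patches, 3-dimensional tokens, random projection weights.
random.seed(0)
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
weight = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)]
prediction = dense_prediction(image, p=2, weight=weight)
```

In this sketch `prediction` has the same height and width as the input, with one predicted value per pixel, which is what makes the output "dense". Averaging is only one possible fusion; it was chosen here because it keeps the example short.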