| CPC G06T 3/04 (2024.01) [G06T 3/40 (2013.01); G06T 7/0002 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/30168 (2013.01)] | 22 Claims |

|
1. A method for processing imagery, the method comprising:
encoding, by one or more processors, for a set of images including a native resolution image and one or more resized variants of the native resolution image, a corresponding spatial embedding for each patch associated with a respective region of either the native resolution image or a given one of the one or more resized variants, to form a set of spatially encoded patches;
applying, by one or more processors, a set of scale embeddings to the set of spatially encoded patches to capture scale information associated with the native resolution image and the given one of the one or more resized variants to form a set of input tokens; and
creating, by one or more processors, a final image representation via a self-attention process on the set of input tokens.
|
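
The three steps recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the claimed implementation. Every name and constant in it (`D`, `P`, `G`, `resize_nearest`, `represent`, the random projection, the grid-based spatial lookup, and the mean pooling used to obtain a single vector) is a hypothetical choice made for illustration. The claim itself does not specify patch size, embedding dimension, how the embeddings are obtained, or how the final representation is read out of the self-attention output.

```python
import math
import random

random.seed(0)
D = 8   # token dimension (hypothetical)
P = 4   # patch side length in pixels (hypothetical)
G = 3   # side of the spatial-embedding lookup grid (hypothetical)

def rand_vec(n):
    # Stand-in for learned parameters: small random values.
    return [random.gauss(0.0, 0.1) for _ in range(n)]

def resize_nearest(img, h, w):
    # Nearest-neighbour resize of a 2-D list of pixel values.
    H, W = len(img), len(img[0])
    return [[img[r * H // h][c * W // w] for c in range(w)] for r in range(h)]

def patches(img):
    # Yield (row, col, n_rows, n_cols, flat_pixels) for each full PxP patch.
    n_r, n_c = len(img) // P, len(img[0]) // P
    for i in range(n_r):
        for j in range(n_c):
            flat = [img[i * P + y][j * P + x] for y in range(P) for x in range(P)]
            yield i, j, n_r, n_c, flat

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens):
    # Single-head scaled dot-product attention with Q = K = V = tokens
    # (no learned query/key/value projections, to keep the sketch short).
    out = []
    for q in tokens:
        scores = [dot(q, k) / math.sqrt(D) for k in tokens]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi / z * v[d] for wi, v in zip(w, tokens)) for d in range(D)])
    return out

def represent(native, variant_sizes):
    # Set of images: the native resolution image plus its resized variants.
    images = [native] + [resize_nearest(native, h, w) for h, w in variant_sizes]
    proj = [rand_vec(P * P) for _ in range(D)]                  # patch -> D-dim vector
    spatial = [[rand_vec(D) for _ in range(G)] for _ in range(G)]
    scale = [rand_vec(D) for _ in images]                       # one embedding per scale
    tokens = []
    for s, img in enumerate(images):
        for i, j, n_r, n_c, flat in patches(img):
            tok = [dot(row, flat) for row in proj]
            # Step 1: spatial embedding chosen by the patch's relative position.
            tok = add(tok, spatial[i * G // n_r][j * G // n_c])
            # Step 2: scale embedding chosen by which image the patch came from.
            tokens.append(add(tok, scale[s]))
    # Step 3: self-attention over all input tokens, then mean pooling (assumption).
    attended = self_attention(tokens)
    final = [sum(t[d] for t in attended) / len(attended) for d in range(D)]
    return tokens, final

# Usage: a 16x16 native image plus one 8x8 variant gives 16 + 4 = 20 tokens.
native = [[(r * 16 + c) / 255.0 for c in range(16)] for r in range(16)]
tokens, final = represent(native, [(8, 8)])
print(len(tokens), len(final))  # prints: 20 8
```

Mapping patch positions onto a fixed `G` x `G` grid lets images with different patch counts share one spatial-embedding table, which is one plausible way to handle the native image and its resized variants together. The scale embedding is what distinguishes a patch of the native image from a patch at the same relative position in a variant.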