CPC G06T 3/0012 (2013.01) [G06T 3/40 (2013.01); G06T 7/0002 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/30168 (2013.01)] | 22 Claims |
1. A method for processing imagery, the method comprising:
constructing, by one or more processors, a multi-scale representation of a native resolution image, the multi-scale representation including the native resolution image and a set of aspect ratio preserving resized variants;
encoding, by the one or more processors, a corresponding spatial embedding for each patch associated with a respective region of either the native resolution image or one of the set of aspect ratio preserving resized variants, thereby forming a set of spatially encoded patches;
applying, by the one or more processors, a set of scale embeddings to the set of spatially encoded patches to capture scale information associated with the native resolution image and the set of aspect ratio preserving resized variants, thereby forming a set of input tokens; and
performing, by the one or more processors according to a transformer encoder module, self-attention on the set of input tokens to create a final image representation.
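The steps of claim 1 can be illustrated with a minimal, stdlib-only Python sketch. Every detail in it is an assumption made for illustration, not the claimed implementation. Images are 2D lists of grayscale floats and resizing is nearest-neighbour. The patch size P, width D, and the G x G hash grid that maps patch positions to spatial embeddings are hypothetical. The random vectors stand in for learned weights. The transformer encoder module is reduced to a single single-head attention step with identity Q/K/V projections and a prepended summary (CLS) token.

```python
import math
import random

# Hypothetical hyperparameters for illustration only.
D = 8            # token embedding width
P = 4            # patch size (P x P pixels)
G = 4            # side of the hash grid used for spatial embeddings
MAX_SCALES = 3   # native image + up to two resized variants

_rng = random.Random(0)

def _rand_vec(n):
    return [_rng.gauss(0.0, 0.1) for _ in range(n)]

# Random stand-ins for parameters that a real model would learn.
W_PATCH = [_rand_vec(D) for _ in range(P * P)]                  # pixels -> D
SPATIAL = [[_rand_vec(D) for _ in range(G)] for _ in range(G)]  # G x G grid
SCALE = [_rand_vec(D) for _ in range(MAX_SCALES)]               # one per scale
CLS = _rand_vec(D)                                              # summary token

def resize_keep_aspect(img, longer_side):
    """Nearest-neighbour resize so the longer side equals `longer_side`."""
    h, w = len(img), len(img[0])
    s = longer_side / max(h, w)
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    return [[img[min(h - 1, int(r / s))][min(w - 1, int(c / s))]
             for c in range(nw)] for r in range(nh)]

def patch_tokens(img, scale_idx):
    """Project each full P x P patch and add spatial + scale embeddings."""
    nh, nw = len(img) // P, len(img[0]) // P   # partial edge patches dropped
    tokens = []
    for i in range(nh):
        for j in range(nw):
            flat = [img[i * P + a][j * P + b] for a in range(P) for b in range(P)]
            tok = [sum(flat[k] * W_PATCH[k][d] for k in range(P * P))
                   for d in range(D)]
            # Hash the patch position onto the fixed G x G grid so images of
            # any aspect ratio share one spatial-embedding table.
            gi, gj = i * G // nh, j * G // nw
            tokens.append([tok[d] + SPATIAL[gi][gj][d] + SCALE[scale_idx][d]
                           for d in range(D)])
    return tokens

def self_attention(x):
    """Single-head self-attention with identity Q/K/V and a residual add."""
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in x]
        m = max(scores)                      # subtract max for stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([q[d] + sum(w[n] * x[n][d] for n in range(len(x))) / z
                    for d in range(D)])
    return out

def encode(img, longer_sides):
    """Return (input tokens, final representation) for a native image."""
    scales = [img] + [resize_keep_aspect(img, L) for L in longer_sides]
    tokens = [CLS[:]]
    for idx, scaled in enumerate(scales):
        tokens.extend(patch_tokens(scaled, idx))
    return tokens, self_attention(tokens)[0]   # CLS output as summary

# Usage: a 16 x 24 native image plus variants with longer sides 12 and 8.
img = [[(r * 24 + c) / 384 for c in range(24)] for r in range(16)]
tokens, rep = encode(img, [12, 8])
```

Because patch positions are hashed onto one fixed grid, the native image (4 x 6 patches) and the 8 x 12 and 5 x 8 variants (2 x 3 and 1 x 2 patches) all index the same spatial-embedding table, and the added scale embedding is what lets attention distinguish tokens from different scales.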