US 12,217,382 B2
Multi-scale transformer for image analysis
Junjie Ke, East Palo Alto, CA (US); Feng Yang, Sunnyvale, CA (US); Qifei Wang, Mountain View, CA (US); Yilin Wang, Sunnyvale, CA (US); and Peyman Milanfar, Menlo Park, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 4, 2023, as Appl. No. 18/527,528.
Application 18/527,528 is a continuation of application No. 17/787,699, granted, now 11,887,270, previously published as PCT/US2021/040111, filed on Jul. 1, 2021.
Prior Publication US 2024/0119555 A1, Apr. 11, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06K 9/00 (2022.01); G06T 3/04 (2024.01); G06T 3/40 (2006.01); G06T 7/00 (2017.01)

CPC G06T 3/04 (2024.01) [G06T 3/40 (2013.01); G06T 7/0002 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/30168 (2013.01)]

22 Claims

1. A method for processing imagery, the method comprising:

encoding, by one or more processors for a set of images including a native resolution image and one or more resized variants of the native resolution image, a corresponding spatial embedding for each patch associated with a respective region of either the native resolution image or a given one of the one or more resized variants, to form a set of spatially encoded patches;

applying, by one or more processors, a set of scale embeddings to the set of spatially encoded patches to capture scale information associated with the native resolution image and the given one of the one or more resized variants to form a set of input tokens; and

creating, by one or more processors, a final image representation via a self-attention process on the set of input tokens.
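
The three steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The claim does not specify any of the following, so all of them are assumptions made only for illustration: the function names, the nearest-neighbour resizing, the non-overlapping zero-padded patches, the coarse `grid` x `grid` spatial embedding table, the randomly initialised (untrained) weights, the single attention head, and the mean pooling used to produce the final vector.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize; a stand-in, since the claim names no resampling method.
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def extract_patches(img, p):
    # Split an (H, W, C) image into non-overlapping p x p patches, zero-padding the edges.
    h, w, c = img.shape
    gh, gw = -(-h // p), -(-w // p)  # ceiling division: patch-grid height and width
    padded = np.zeros((gh * p, gw * p, c), dtype=img.dtype)
    padded[:h, :w] = img
    patches = (padded.reshape(gh, p, gw, p, c)
                     .transpose(0, 2, 1, 3, 4)
                     .reshape(gh * gw, p * p * c))
    # (row, col) position of each patch in the patch grid, in the same row-major order.
    pos = np.stack(np.meshgrid(np.arange(gh), np.arange(gw), indexing="ij"), -1)
    return patches, pos.reshape(-1, 2), gh, gw

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_representation(image, scales=(1.0, 0.5), patch=8, dim=16, grid=4, seed=0):
    # Hypothetical, untrained parameters; a real model would learn these.
    rng = np.random.default_rng(seed)
    c = image.shape[2]
    w_patch = rng.normal(0.0, 0.02, (patch * patch * c, dim))
    spatial_table = rng.normal(0.0, 0.02, (grid, grid, dim))
    scale_table = rng.normal(0.0, 0.02, (len(scales), dim))
    wq, wk, wv = (rng.normal(0.0, 0.02, (dim, dim)) for _ in range(3))

    h, w = image.shape[:2]
    tokens = []
    for s_idx, s in enumerate(scales):
        # Scale 1.0 is the native resolution image; other scales are resized variants.
        img = image if s == 1.0 else resize_nearest(
            image, max(1, round(h * s)), max(1, round(w * s)))
        patches, pos, gh, gw = extract_patches(img, patch)
        # Step 1: add a spatial embedding to each patch (looked up by its region).
        gi = pos[:, 0] * grid // gh
        gj = pos[:, 1] * grid // gw
        x = patches @ w_patch + spatial_table[gi, gj]
        # Step 2: add the scale embedding of the image the patch came from.
        tokens.append(x + scale_table[s_idx])
    tokens = np.concatenate(tokens, axis=0)

    # Step 3: single-head self-attention over all tokens, then mean pooling (assumed).
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attended = softmax(q @ k.T / np.sqrt(dim)) @ v
    return attended.mean(axis=0), tokens.shape[0]
```

For example, calling `multiscale_representation` on a 32 x 24 x 3 array with the default arguments yields 12 tokens from the native resolution image and 4 from the half-size variant. All 16 tokens attend to one another before being pooled into a single 16-dimensional vector.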