US 12,444,168 B2
Systems and methods for object detection using image tiling
Jilin Tu, Santa Clara, CA (US); Jiang Wang, Santa Clara, CA (US); Huizhong Chen, Mountain View, CA (US); Xiangxin Zhu, Sunnyvale, CA (US); and Shengyang Dai, Dublin, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/622,462
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Aug. 5, 2019, PCT No. PCT/US2019/045089 § 371(c)(1), (2) Date Dec. 23, 2021, PCT Pub. No. WO2021/025677, PCT Pub. Date Feb. 11, 2021.
Prior Publication US 2022/0254137 A1, Aug. 11, 2022
Int. Cl. G06V 10/778 (2022.01); G06V 10/764 (2022.01); G06V 10/75 (2022.01)

CPC G06V 10/765 (2022.01) [G06V 10/778 (2022.01); G06V 10/759 (2022.01)]

16 Claims

1. A computing system comprising:

at least one processor;

a preliminary machine-learned object detection model configured to receive an image, and, in response to receipt of the image, output an intermediate feature representation;

a machine-learned object detection model configured to receive a plurality of tiles, and, in response to receipt of the plurality of tiles, output object detection data for the plurality of tiles, the object detection data comprising a plurality of bounding boxes respectively defined with respect to individual ones of the plurality of tiles; and

at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising:

generating an image pyramid based on the image having an image space, the image pyramid comprising a first level corresponding with the image at a first resolution and a second level corresponding with the image at a second resolution that is different than the first resolution, wherein generating the image pyramid based on the image comprises:

inputting the image into the preliminary machine-learned object detection model, the image being input as a plurality of preliminary tiles;

receiving, as an output of the preliminary machine-learned object detection model, the intermediate feature representation, the intermediate feature representation corresponding with the plurality of preliminary tiles; and

generating the first level and the second level of the image pyramid based on the intermediate feature representation;

tiling the first level and the second level by dividing the first level into a first plurality of tiles and the second level into a second plurality of tiles;

inputting the first plurality of tiles and the second plurality of tiles into the machine-learned object detection model;

receiving, as an output of the machine-learned object detection model, the object detection data comprising the plurality of bounding boxes respectively defined with respect to individual ones of the first plurality of tiles and the second plurality of tiles; and

generating an image object detection output by mapping the object detection data onto the image space of the image.
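The pyramid-and-tiling steps of claim 1 can be illustrated with a minimal geometric sketch. The claim does not say how the levels are derived from the intermediate feature representation, what tile size is used, or whether tiles overlap. This sketch therefore assumes three things: each level is simply the original image resized by a factor in a hypothetical `scales` list, tiles are non-overlapping squares of a hypothetical `tile_size`, and edge tiles are clipped to the level boundary. All names (`Tile`, `pyramid_level_sizes`, `tile_level`, `tile_pyramid`) are illustrative only. The sketch tracks tile rectangles rather than pixel data, since the rectangles are what later mapping needs.

```python
from typing import List, NamedTuple, Tuple


class Tile(NamedTuple):
    """One tile of one pyramid level, in that level's pixel coordinates."""
    level: int    # index of the pyramid level the tile was cut from
    scale: float  # level size divided by original image size (assumed convention)
    x0: int       # left edge of the tile within the level
    y0: int       # top edge of the tile within the level
    width: int
    height: int


def pyramid_level_sizes(width: int, height: int,
                        scales: List[float]) -> List[Tuple[float, int, int]]:
    """Return (scale, level_width, level_height) for each pyramid level."""
    return [(s, max(1, round(width * s)), max(1, round(height * s))) for s in scales]


def tile_level(level: int, scale: float, level_w: int, level_h: int,
               tile_size: int) -> List[Tile]:
    """Divide one level into non-overlapping tiles; edge tiles are clipped."""
    tiles = []
    for y0 in range(0, level_h, tile_size):
        for x0 in range(0, level_w, tile_size):
            tiles.append(Tile(level, scale, x0, y0,
                              min(tile_size, level_w - x0),
                              min(tile_size, level_h - y0)))
    return tiles


def tile_pyramid(width: int, height: int, scales: List[float],
                 tile_size: int) -> List[Tile]:
    """Tile every pyramid level and return all tiles in one flat list."""
    tiles: List[Tile] = []
    for level, (s, lw, lh) in enumerate(pyramid_level_sizes(width, height, scales)):
        tiles.extend(tile_level(level, s, lw, lh, tile_size))
    return tiles


# Example: a 640x480 image with a full-resolution and a half-resolution level.
tiles = tile_pyramid(640, 480, scales=[1.0, 0.5], tile_size=256)
print(len(tiles))  # 6 tiles at level 0 and 2 at level 1 -> 8
```

The flat list mixes tiles from both levels, which matches the claim's step of feeding the first and second pluralities of tiles to the same detection model. Each tile also keeps its level, scale, and offset, because those are exactly what is needed to map tile-relative boxes back to the image space afterwards.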
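The final step, mapping object detection data onto the image space, can be sketched for a single bounding box under stated assumptions. The sketch assumes a box is given as (x_min, y_min, x_max, y_max) in tile pixels, the tile's top-left corner sits at (`tile_x0`, `tile_y0`) in its pyramid level, and the level is the original image resized by `level_scale`. Mapping then adds the tile offset and divides by the scale. Clipping to the image bounds is an extra assumption, not something the claim requires. `Box` and `tile_box_to_image` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box (x_min, y_min, x_max, y_max) plus a confidence score."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: float


def tile_box_to_image(box: Box, tile_x0: int, tile_y0: int, level_scale: float,
                      image_w: int, image_h: int) -> Box:
    """Map a box defined relative to a tile into the original image space.

    Assumes the tile's top-left corner is at (tile_x0, tile_y0) in its pyramid
    level and that the level is the original image resized by level_scale.
    """
    def to_image(v: float, offset: int, limit: int) -> float:
        # Tile coords -> level coords (add offset) -> image coords (undo resize),
        # then clip (an assumption) so the box stays inside the image.
        return min(max((v + offset) / level_scale, 0.0), float(limit))

    return Box(to_image(box.x_min, tile_x0, image_w),
               to_image(box.y_min, tile_y0, image_h),
               to_image(box.x_max, tile_x0, image_w),
               to_image(box.y_max, tile_y0, image_h),
               box.score)


# A box found in the tile at (256, 0) of a half-resolution level of a 640x480 image.
mapped = tile_box_to_image(Box(10, 20, 50, 60, 0.9), 256, 0, 0.5, 640, 480)
print(mapped)  # Box(x_min=532.0, y_min=40.0, x_max=612.0, y_max=120.0, score=0.9)
```

The claim does not specify how detections of the same object from different tiles or pyramid levels are reconciled once they share the image space. That step is left out of this sketch.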