US 11,989,956 B2
Dynamic head for object detection
Xiyang Dai, Seattle, WA (US); Yinpeng Chen, Sammamish, WA (US); Bin Xiao, Bellevue, WA (US); Dongdong Chen, Bellevue, WA (US); Mengchen Liu, Redmond, WA (US); Lu Yuan, Redmond, WA (US); and Lei Zhang, Bellevue, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Apr. 5, 2021, as Appl. No. 17/222,879.
Prior Publication US 2022/0318541 A1, Oct. 6, 2022
Int. Cl. G06V 20/64 (2022.01); G02B 27/01 (2006.01); G06T 3/06 (2024.01); G06T 3/40 (2006.01)

CPC G06V 20/64 (2022.01) [G02B 27/0172 (2013.01); G06T 3/06 (2024.01); G06T 3/40 (2013.01)]

20 Claims

1. A computerized method for object detection, the computerized method comprising:

generating a feature pyramid corresponding to image data;
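The claim does not say how the feature pyramid is produced. In detection systems it commonly comes from a backbone network combined with a feature pyramid network. The sketch below is a minimal, hypothetical stand-in that builds each level by 2x2 average pooling of the previous one. Each map is stored as channels x height x width nested lists. The names `build_pyramid` and `avg_pool_2x2` are illustrative only and are not taken from the patent.

```python
from typing import List

# A feature map is stored as channels x height x width nested lists.
FeatureMap = List[List[List[float]]]


def avg_pool_2x2(fmap: FeatureMap) -> FeatureMap:
    """Halve height and width by averaging non-overlapping 2x2 windows."""
    pooled = []
    for channel in fmap:
        h, w = len(channel) // 2, len(channel[0]) // 2
        pooled.append([
            [
                (channel[2 * i][2 * j] + channel[2 * i][2 * j + 1]
                 + channel[2 * i + 1][2 * j] + channel[2 * i + 1][2 * j + 1]) / 4.0
                for j in range(w)
            ]
            for i in range(h)
        ])
    return pooled


def build_pyramid(image: FeatureMap, num_levels: int) -> List[FeatureMap]:
    """Return num_levels maps, each at half the resolution of the previous one.

    Hypothetical stand-in for a learned backbone + feature pyramid network.
    """
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(avg_pool_2x2(levels[-1]))
    return levels
```

An 8 x 8 single-channel input with three levels produces levels of 8 x 8, 4 x 4 and 2 x 2.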

rescaling the feature pyramid to a scale corresponding to a median level of the feature pyramid, wherein the rescaled feature pyramid is a four-dimensional (4D) tensor;
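The rescaling step resizes every pyramid level to the spatial resolution of the median level. The resized levels can then be stacked into a 4D tensor of shape levels x channels x height x width. The sketch below uses nearest-neighbour resampling for both up-sampling and down-sampling. That is a simplifying assumption, since the claim does not name a resampling method. It also assumes every level has the same channel count. `resize_nearest` and `rescale_to_median` are hypothetical names.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resampling; handles both up- and down-sampling."""
    resized = []
    for channel in fmap:
        in_h, in_w = len(channel), len(channel[0])
        resized.append([
            [channel[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)
        ])
    return resized


def rescale_to_median(pyramid: List[FeatureMap]) -> List[FeatureMap]:
    """Resize every level to the median level's height and width.

    The result indexes as [level][channel][row][col], i.e. a 4D tensor of
    shape L x C x H x W (assumes all levels share the same channel count C).
    """
    median = pyramid[len(pyramid) // 2]
    target_h, target_w = len(median[0]), len(median[0][0])
    return [resize_nearest(level, target_h, target_w) for level in pyramid]
```

With an odd number of levels, `len(pyramid) // 2` selects the true median level. With an even number, it selects the upper of the two middle levels, which is a further assumption.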

reshaping the 4D tensor into a three-dimensional (3D) tensor having individual perspectives including scale features, spatial features, and task features corresponding to different dimensions of the 3D tensor, wherein the dimensions of the 3D tensor include a level dimension, a space dimension, and a channel dimension;
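One way to picture the reshaping step is to flatten height and width into a single space dimension S = H x W and move channels to the last axis. A levels x channels x height x width tensor then becomes levels x space x channel. The sketch assumes the L x C x H x W layout produced by the rescaling step. `to_level_space_channel` is a hypothetical name.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]  # level x channel x height x width
Tensor3D = List[List[List[float]]]        # level x space x channel


def to_level_space_channel(tensor4d: Tensor4D) -> Tensor3D:
    """Flatten (H, W) into S = H * W and put channels last: L x S x C."""
    out = []
    for level in tensor4d:
        c, h, w = len(level), len(level[0]), len(level[0][0])
        # Position s maps back to row s // w and column s % w (row-major order).
        out.append([[level[ch][s // w][s % w] for ch in range(c)]
                    for s in range(h * w)])
    return out
```

For one level with two 2 x 2 channels, position 1 (row 0, column 1) collects that pixel's value from each channel into a length-2 channel vector.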

using the 3D tensor and a plurality of attention layers to update a plurality of feature maps associated with the image data, wherein the attention layers include a scale-aware attention corresponding to the level dimension of the 3D tensor, a spatial-aware attention corresponding to the space dimension of the 3D tensor, and a task-aware attention corresponding to the channel dimension of the 3D tensor;
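The claim ties each attention to one dimension of the level x space x channel tensor but does not specify the form of any of them. The sketch below is a deliberately simplified, hypothetical illustration of that correspondence. The scale-aware attention produces one sigmoid weight per level. The spatial-aware attention produces one weight per position, shared across levels. The task-aware attention produces one weight per channel, shared across levels and positions. Each weight is computed from a mean over the other two dimensions. The applied order (scale, then spatial, then task) is also an assumption. A practical detector would use learned layers rather than these parameter-free gates.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # level x space x channel


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def scale_attention(f: Tensor3D) -> Tensor3D:
    """Scale-aware (level dim): one weight per level from its mean activation."""
    out = []
    for level in f:
        n = len(level) * len(level[0])
        w = sigmoid(sum(sum(pos) for pos in level) / n)
        out.append([[w * v for v in pos] for pos in level])
    return out


def spatial_attention(f: Tensor3D) -> Tensor3D:
    """Spatial-aware (space dim): one weight per position, shared across levels."""
    num_levels, num_pos, num_ch = len(f), len(f[0]), len(f[0][0])
    weights = [
        sigmoid(sum(f[l][s][c] for l in range(num_levels) for c in range(num_ch))
                / (num_levels * num_ch))
        for s in range(num_pos)
    ]
    return [[[weights[s] * v for v in f[l][s]] for s in range(num_pos)]
            for l in range(num_levels)]


def task_attention(f: Tensor3D) -> Tensor3D:
    """Task-aware (channel dim): one weight per channel, shared elsewhere."""
    num_levels, num_pos, num_ch = len(f), len(f[0]), len(f[0][0])
    weights = [
        sigmoid(sum(f[l][s][c] for l in range(num_levels) for s in range(num_pos))
                / (num_levels * num_pos))
        for c in range(num_ch)
    ]
    return [[[weights[c] * pos[c] for c in range(num_ch)] for pos in level]
            for level in f]


def attention_block(f: Tensor3D) -> Tensor3D:
    """Apply the three attentions in sequence (order is an assumption)."""
    return task_attention(spatial_attention(scale_attention(f)))
```

Each function returns a tensor with the same shape as its input. The updated feature maps can therefore be fed to a detection head or to another stacked attention block.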

performing object detection on the image data using the updated plurality of feature maps; and

sending an output signal based on the object detection performed on the image data.