CPC G06V 20/64 (2022.01) [G02B 27/0172 (2013.01); G06T 3/06 (2024.01); G06T 3/40 (2013.01)] | 20 Claims |
1. A computerized method for object detection, the computerized method comprising:
generating a feature pyramid corresponding to image data;
rescaling the feature pyramid to a scale corresponding to a median level of the feature pyramid, wherein the rescaled feature pyramid is a four-dimensional (4D) tensor;
reshaping the 4D tensor into a three-dimensional (3D) tensor having individual perspectives including scale features, spatial features, and task features corresponding to different dimensions of the 3D tensor, wherein the dimensions of the 3D tensor include a level dimension, a space dimension, and a channel dimension;
using the 3D tensor and a plurality of attention layers to update a plurality of feature maps associated with the image data, wherein the attention layers include a scale-aware attention corresponding to the level dimension of the 3D tensor, a spatial-aware attention corresponding to the space dimension of the 3D tensor, and a task-aware attention corresponding to the channel dimension of the 3D tensor;
performing object detection on the image data using the updated plurality of feature maps; and
sending an output signal based on the object detection performed on the image data.
|
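The tensor handling recited in claim 1 can be illustrated with a minimal NumPy sketch. It is an assumption-laden illustration, not the claimed implementation: the helper names (`resize_nearest`, `pyramid_to_3d`, `scale_aware`, `spatial_aware`, `task_aware`) are hypothetical, nearest-neighbour resizing is chosen only for brevity, the median level is taken to be the middle entry of the pyramid list, and the three sigmoid gates are simple placeholders for the attention layers, whose actual form the claim does not specify.

```python
import numpy as np


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) map to (out_h, out_w, C)."""
    in_h, in_w, _ = fmap.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return fmap[rows][:, cols]


def pyramid_to_3d(pyramid):
    """Rescale all levels to the median level's size, then flatten space."""
    # Assumption: the median level is the middle entry of the list.
    h, w, _ = pyramid[len(pyramid) // 2].shape
    tensor_4d = np.stack([resize_nearest(f, h, w) for f in pyramid])  # (L, H, W, C)
    n_levels, _, _, channels = tensor_4d.shape
    tensor_3d = tensor_4d.reshape(n_levels, h * w, channels)  # (L, S, C)
    return tensor_4d, tensor_3d


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Placeholder attentions: each one gates the tensor along a single dimension.
def scale_aware(t):
    # One gate per level, from the mean over space and channel.
    return t * sigmoid(t.mean(axis=(1, 2), keepdims=True))


def spatial_aware(t):
    # One gate per spatial position, from the mean over level and channel.
    return t * sigmoid(t.mean(axis=(0, 2), keepdims=True))


def task_aware(t):
    # One gate per channel, from the mean over level and space.
    return t * sigmoid(t.mean(axis=(0, 1), keepdims=True))


# Toy pyramid: three levels of 16x16, 8x8 and 4x4 maps with 8 channels each.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((s, s, 8)) for s in (16, 8, 4)]

tensor_4d, tensor_3d = pyramid_to_3d(pyramid)
updated = task_aware(spatial_aware(scale_aware(tensor_3d)))

print(tensor_4d.shape)  # (3, 8, 8, 8)
print(tensor_3d.shape)  # (3, 64, 8)
print(updated.shape)    # (3, 64, 8)
```

Flattening height and width into one space axis is what lets each placeholder attention act on exactly one axis of the (level, space, channel) tensor. The detection step and the output signal are outside the scope of this sketch.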