US 12,307,746 B1
Nighttime unmanned aerial vehicle object tracking method fusing hybrid attention mechanism
Yanmei Li, Chongqing (CN); Tao Yu, Chongqing (CN); Hanguang Xiao, Chongqing (CN); Ningsheng Liao, Chongqing (CN); Xiaoshuang Li, Chongqing (CN); Qibin Yang, Chongqing (CN); and Jingshi Deng, Chongqing (CN)
Assigned to Chongqing University of Technology, Chongqing (CN)
Filed by Chongqing University of Technology, Chongqing (CN)
Filed on Jan. 23, 2025, as Appl. No. 19/035,653.
Claims priority of application No. 202410099370.9 (CN), filed on Jan. 24, 2024.
Int. Cl. G06V 10/77 (2022.01); G06V 10/20 (2022.01); G06V 10/30 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/17 (2022.01); G06V 20/40 (2022.01)

CPC G06V 10/7715 (2022.01) [G06V 10/255 (2022.01); G06V 10/30 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01); G06V 20/17 (2022.01); G06V 20/40 (2022.01)]

9 Claims

1. A nighttime unmanned aerial vehicle (UAV) object tracking method fusing a hybrid attention mechanism, comprising: acquiring a night vision video sequence from a UAV, inputting a nighttime image frame of the night vision video sequence into a pre-trained night vision image enhancement model, to obtain a corresponding enhanced nighttime image frame, and performing image object tracking and recognition on the enhanced nighttime image frame, to obtain an object tracking and recognition result;

the night vision image enhancement model comprising an encoder module, a spatial hybrid attention module, a channel hybrid attention module, a decoder module, a curve projection module, and a denoising processing module;

the encoder module being configured to extract an initial convolutional feature map of the nighttime image frame;

the spatial hybrid attention module being configured to enhance the attention of a feature space dimension of the initial convolutional feature map, to form a spatial attention feature map of the nighttime image frame;

the channel hybrid attention module being configured to enhance the attention of a feature channel dimension of the spatial attention feature map, to form a hybrid attention feature map of the nighttime image frame;

the decoder module being configured to convert the hybrid attention feature map into a curve estimation parameter map;

the curve projection module being configured to map the curve estimation parameter map onto the nighttime image frame in a curve projection manner, to form an intermediate feature image of the nighttime image frame; and

the denoising processing module being configured to perform denoising processing on the intermediate feature image, to obtain the corresponding enhanced nighttime image frame;

the decoder module comprising four convolutional layers and four upsampling layers connected in series, for performing convolutional processing and upsampling deconvolution processing on the inputted hybrid attention feature map in sequence, followed by hyperbolic tangent conversion processing, to obtain the curve estimation parameter map; a processing procedure of the decoder module being expressed as:

F_de1=Up(Conv(F_CHA)_de1)_de1;

F_de2=Up(Conv(F_de1)_de2)_de2;

F_de3=Up(Conv(F_de2)_de3)_de3;

F_de4=Up(Conv(F_de3)_de4)_de4;

F_de=tanh(Conv(F_de4)_de);

where F_CHA represents the hybrid attention feature map inputted into the decoder module, Conv(⋅)_dei represents an operator of an i-th convolutional layer in the decoder module, Up(⋅)_dei represents an operator of an i-th upsampling deconvolution layer in the decoder module, i=1,2,3,4, F_de1, F_de2, F_de3, and F_de4 represent intermediate operational outputs in the decoder module, Conv(⋅)_de represents a convolutional operator during conversion processing in the decoder module, tanh(⋅) represents a hyperbolic tangent operation, and F_de represents the curve estimation parameter map outputted by the decoder module.
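
The decoder recurrence above (four Conv-then-Up stages followed by a final convolution and tanh) can be illustrated with a minimal, dependency-free Python sketch. The claim does not specify kernel sizes, channel counts, upsampling factor, or weights, so everything concrete here is an assumption made for illustration: 1x1 convolutions, nearest-neighbour 2x upsampling as a stand-in for the "upsampling deconvolution" layers, and the hypothetical helper names `conv1x1`, `upsample2x`, and `decoder`. A practical implementation would presumably use a deep learning framework with learned weights.

```python
import math

def conv1x1(fmap, weights):
    """Assumed 1x1 convolution: out[o][y][x] = sum_c weights[o][c] * fmap[c][y][x]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wt * fmap[c][y][x] for c, wt in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling per channel (assumed stand-in for Up(.)_dei)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # duplicate each column
            rows.append(wide)
            rows.append(list(wide))                    # duplicate each row
        out.append(rows)
    return out

def decoder(f_cha, stage_weights, head_weights):
    """F_dei = Up(Conv(F_de(i-1))) for i = 1..4 with F_de0 = F_CHA,
    then F_de = tanh(Conv(F_de4))."""
    assert len(stage_weights) == 4, "the claim recites four conv/upsampling stages"
    f = f_cha
    for w in stage_weights:
        f = upsample2x(conv1x1(f, w))
    f = conv1x1(f, head_weights)
    return [[[math.tanh(v) for v in row] for row in channel] for channel in f]

# Toy input: 2 channels, 1x1 spatial size; all weights are made up.
f_cha = [[[0.5]], [[-1.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
stage_weights = [identity] * 4
head_weights = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]  # 2 -> 3 channels

f_de = decoder(f_cha, stage_weights, head_weights)
print(len(f_de), len(f_de[0]), len(f_de[0][0]))  # 3 16 16
```

Under these assumptions, each of the four stages doubles the spatial resolution, so a 1x1 input becomes 16x16, and the final tanh bounds every entry of the curve estimation parameter map to the open interval (-1, 1).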