US 12,456,289 B2
Method for feature detection of complex defects based on multimodal data
Jun Wang, Nanjing (CN); Yuxiang Wu, Nanjing (CN); Dawei Li, Nanjing (CN); and Yuan Zhang, Nanjing (CN)
Assigned to Nanjing University of Aeronautics and Astronautics, Nanjing (CN)
Filed by Nanjing University of Aeronautics and Astronautics, Nanjing (CN)
Filed on Oct. 25, 2022, as Appl. No. 17/972,942.
Claims priority of application No. 202210256372.5 (CN), filed on Mar. 16, 2022.
Prior Publication US 2023/0316736 A1, Oct. 5, 2023
Int. Cl. G06V 10/80 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01)

CPC G06V 10/806 (2022.01) [G06V 10/774 (2022.01); G06V 10/82 (2022.01)]

4 Claims

1. A method for feature detection of complex defects based on multimodal data, specifically comprising the following steps:

step S1: constructing a plurality of parallel feature extraction networks;

step S2: inputting multimodal training data into the plurality of parallel feature extraction networks for parallel learning of multimodal features;

step S3: according to the multimodal features, constructing a multimodal feature cross-guidance network, and establishing a local connection between the plurality of parallel feature extraction networks to form a multimodal feature cross-guidance mechanism;

step S4: based on the plurality of parallel feature extraction networks, performing multimodal adaptive fusion by using weights to obtain feature information; and

step S5: based on the feature information, implementing defect detection by using a classification subnetwork and a regression subnetwork;

wherein step S1 specifically comprises: constructing the plurality of parallel feature extraction networks by using a convolutional neural network, the networks corresponding to extraction of data of multiple modalities respectively, wherein each of the parallel feature extraction networks comprises six layers, which comprise different convolutional layers, pooling layers, dense block structures, and dilated bottleneck layer structures; and

step S3 specifically comprises: establishing a local connection between the plurality of parallel feature extraction networks in a first stage, a third stage, and a fifth stage by using a 1×1 convolutional layer, merging features of a same stage first, and finally superimposing the merged features on each parallel feature extraction network as a whole through the 1×1 convolutional layer, to implement cross guidance of multimodal features, and establish a feature flow mechanism of different modal data in feature extraction.
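
The cross-guidance of step S3 and the weighted fusion of step S4 can be illustrated with a minimal pure-Python sketch. It is one possible reading of the claim, not the patented implementation. The following are assumptions made for illustration: same-stage features are merged by channel concatenation followed by a 1×1 convolution, "superimposing" is taken to mean element-wise addition after a per-branch 1×1 convolution, and the fusion weights are softmax-normalized scalars, one per branch. The names `conv1x1`, `add_maps`, `cross_guide`, `adaptive_fuse`, `rgb`, and `depth` are hypothetical. Under the claim, `cross_guide` would be applied at the first, third, and fifth stages of the six-layer branches.

```python
import math

# A feature map is a nested list indexed [channel][row][col].

def conv1x1(fmap, weight):
    """1x1 convolution: mix channels independently at each pixel.
    weight[o][i] maps input channel i to output channel o."""
    channels_in = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(weight[o][i] * fmap[i][y][x] for i in range(channels_in))
             for x in range(width)]
            for y in range(height)
        ]
        for o in range(len(weight))
    ]

def add_maps(a, b):
    """Element-wise sum of two feature maps of identical shape."""
    return [
        [[pa + pb for pa, pb in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]

def cross_guide(branch_feats, merge_w, guide_ws):
    """Cross guidance at one stage (assumed reading of step S3).
    1. Merge: concatenate same-stage channels of all branches, then 1x1 conv.
    2. Superimpose: project the merged map with one 1x1 conv per branch
       and add the result to that branch's own features."""
    stacked = [channel for fmap in branch_feats for channel in fmap]
    merged = conv1x1(stacked, merge_w)
    return [add_maps(f, conv1x1(merged, gw))
            for f, gw in zip(branch_feats, guide_ws)]

def adaptive_fuse(branch_feats, logits):
    """Weighted sum of branch outputs (assumed reading of step S4).
    Weights are the softmax of one scalar logit per branch."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    first = branch_feats[0]
    return [
        [
            [sum(w * f[c][y][x] for w, f in zip(weights, branch_feats))
             for x in range(len(first[0][0]))]
            for y in range(len(first[0]))
        ]
        for c in range(len(first))
    ]

# Two hypothetical modalities, each with 1 channel, 1 row, 2 columns.
rgb = [[[1.0, 2.0]]]
depth = [[[3.0, 4.0]]]
guided = cross_guide([rgb, depth],
                     merge_w=[[0.5, 0.5]],          # 2 channels in, 1 out
                     guide_ws=[[[1.0]], [[1.0]]])   # identity projection per branch
fused = adaptive_fuse(guided, logits=[0.0, 0.0])    # equal weights
```

In a trained network the entries of `merge_w`, `guide_ws`, and `logits` would be learned parameters; the fixed values here only keep the arithmetic easy to follow. The fused map would then feed the classification and regression subnetworks of step S5.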