| CPC G06V 10/811 (2022.01) [G01S 17/89 (2013.01); G06V 10/776 (2022.01)] | 20 Claims |

1. A method for multi-modal test-time adaptation, comprising:
inputting a digital image into a pre-trained Camera Intra-modal Pseudo-label Generator (C-Intra-PG);
inputting a Lidar point cloud set into a pre-trained Lidar Intra-modal Pseudo-label Generator (L-Intra-PG);
applying a fast two-dimensional (2D) model, F2D, and a slow 2D model, S2D, to the inputted digital image to apply pseudo-labels to the digital image;
applying a fast three-dimensional (3D) model, F3D, and a slow 3D model, S3D, to the inputted Lidar point cloud set to apply pseudo-labels to the Lidar point cloud set;
fusing pseudo-label predictions from the fast models (F2D, F3D) and the slow models (S2D, S3D) through an Inter-modal Pseudo-label Refinement (Inter-PR) module to obtain robust pseudo-labels;
measuring a prediction consistency separately for the digital-image pseudo-labels and for the Lidar pseudo-labels;
selecting confident pseudo-labels based on the robust pseudo-labels and the measured prediction consistencies to form a final cross-modal pseudo-label set as a self-training signal; and
updating batch parameters of the Camera Intra-modal Pseudo-label Generator and the Lidar Intra-modal Pseudo-label Generator using the self-training signal.
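The claim fixes the order of the steps but not the formulas behind them. The NumPy sketch below shows one way the fast/slow predictions, per-modality consistency, inter-modal fusion and confident-label selection could fit together. Every specific choice here is an assumption, not the claimed method: 2D predictions are taken to be already projected onto the Lidar points; fast and slow outputs are simply averaged; consistency is `1 / (1 + symmetric KL)`; fusion is a consistency-weighted average; confidence is a fixed threshold; and the slow model is an exponential moving average (EMA) of the fast model's batch statistics. All function names and the `IGNORE` index are hypothetical.

```python
import numpy as np

IGNORE = -1  # hypothetical ignore index for points excluded from self-training


def kl_divergence(p, q, eps=1e-8):
    """Per-point KL(p || q) for rows of class probabilities, shape (N, C)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=1)


def consistency(p_fast, p_slow):
    """Assumed consistency in (0, 1]: 1 / (1 + symmetric KL between fast and slow)."""
    sym_kl = 0.5 * (kl_divergence(p_fast, p_slow) + kl_divergence(p_slow, p_fast))
    return 1.0 / (1.0 + sym_kl)


def cross_modal_pseudo_labels(p2d_fast, p2d_slow, p3d_fast, p3d_slow, threshold=0.7):
    """Fuse fast/slow 2D and 3D per-point predictions into one pseudo-label set."""
    # Intra-modal step (assumed): average fast and slow predictions per modality.
    p2d = 0.5 * (p2d_fast + p2d_slow)
    p3d = 0.5 * (p3d_fast + p3d_slow)
    # Prediction consistency measured separately for each modality.
    c2d = consistency(p2d_fast, p2d_slow)
    c3d = consistency(p3d_fast, p3d_slow)
    # Inter-modal step (assumed): weight each modality by its relative consistency.
    w2d = (c2d / (c2d + c3d))[:, None]
    fused = w2d * p2d + (1.0 - w2d) * p3d
    labels = fused.argmax(axis=1)
    # Keep only confident points; the rest are marked IGNORE.
    labels[fused.max(axis=1) < threshold] = IGNORE
    return labels


def ema_update(slow_stats, fast_stats, momentum=0.99):
    """Assumed slow-model update: EMA of the fast model's batch statistics."""
    return {k: momentum * slow_stats[k] + (1.0 - momentum) * fast_stats[k]
            for k in slow_stats}


# Usage: two points, two classes, fast and slow models agreeing in each modality.
p2d = np.array([[0.9, 0.1], [0.6, 0.4]])
p3d = np.array([[0.8, 0.2], [0.2, 0.8]])
print(cross_modal_pseudo_labels(p2d, p2d, p3d, p3d))  # [ 0 -1]
```

Soft weighting lets a modality whose fast and slow models disagree still contribute, just less. A hard alternative would keep only the more consistent modality's prediction for each point. The claim text does not say which of these, if either, is used.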