| CPC G06V 10/774 (2022.01) [G06F 18/2155 (2023.01); G06F 18/22 (2023.01); G06N 3/08 (2013.01); G06V 10/74 (2022.01); G06V 10/751 (2022.01)] | 13 Claims |

|
1. A computer-implemented video method, comprising:
extracting features of a first modality and a second modality from a labeled first training dataset in a first domain and an unlabeled second training dataset in a second domain;
training a video analysis model using contrastive learning on the extracted features, including optimization of a loss function that includes a cross-domain regularization part that compares features from a first training data from the first training dataset and a second training data from the second training dataset, the second training data having a pseudo label that matches the label of the first training data, and a cross-modality regularization part that compares features from different cue types in a same domain, with the cross-domain regularization part being expressed as
![]() where $\phi^{+}_{st}(F^{k}_{s_i}, F^{l}_{t_{i^{+}}})$ measures similarity between features having a same modality and different domains for positive samples and $\phi^{-}_{st}(F^{k}_{s_i}, F^{l}_{t_{i^{-}}})$ measures similarity between features having a same modality and different domains for negative samples, and further including generating pseudo-labels for the unlabeled dataset.
|
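For illustration only, the cross-domain regularization part recited in claim 1 can be sketched as an InfoNCE-style contrastive term. The expression in the claim is not reproduced in this text, so the functional form below (exponentiated cosine similarity, a temperature `tau`, and a ratio of positive similarity to total similarity) is an assumption and not the claimed formula. The names `cosine`, `cross_domain_loss`, and all parameters are hypothetical. Positive pairs are a source feature and a target feature whose pseudo label matches the source label; all other cross-domain pairs are treated as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_domain_loss(src_feats, src_labels, tgt_feats, tgt_pseudo, tau=0.1):
    """Assumed InfoNCE-style cross-domain term for one modality.

    src_feats / src_labels: features and labels from the labeled first domain.
    tgt_feats / tgt_pseudo: features and pseudo labels from the second domain.
    tau: temperature (hypothetical hyperparameter, not taken from the claim).
    """
    losses = []
    for f_s, label in zip(src_feats, src_labels):
        pos, neg = 0.0, 0.0
        for f_t, pseudo in zip(tgt_feats, tgt_pseudo):
            sim = math.exp(cosine(f_s, f_t) / tau)
            if pseudo == label:
                pos += sim  # same class across domains: positive pair
            else:
                neg += sim  # different class across domains: negative pair
        if pos > 0.0:
            # Source samples with no matching pseudo label are skipped.
            losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two classes, 2-D features, one modality.
src = [[1.0, 0.0], [0.0, 1.0]]
src_y = [0, 1]
tgt = [[1.0, 0.0], [0.0, 1.0]]
aligned = cross_domain_loss(src, src_y, tgt, [0, 1])
misaligned = cross_domain_loss(src, src_y, tgt, [1, 0])
```

In this toy case the loss is small when the pseudo labels agree with feature similarity and large when they do not, which is the behavior a cross-domain contrastive term is meant to reward. The cross-modality regularization part and the pseudo-label generation step are not covered by this sketch.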