| CPC G06V 10/774 (2022.01) [G06F 18/2155 (2023.01); G06F 18/22 (2023.01); G06N 3/08 (2013.01); G06V 10/74 (2022.01); G06V 10/751 (2022.01)] | 8 Claims |

|
1. A computer-implemented video method, comprising:
extracting features of a first modality and a second modality from a labeled first training dataset in a first domain and an unlabeled second training dataset in a second domain, the labeled first training dataset including source videos and action labels, the source videos being received from a camera, the action labels indicating a patient's interactions with therapeutic equipment and use of medications in healthcare;
training a video analysis model using contrastive learning on the extracted features, including optimization of a loss function that includes a cross-domain regularization part that compares features from a first training data from the first training dataset and a second training data from the second training dataset, the second training data having a pseudo label that matches the label of the first training data, and a cross-modality regularization part that compares features from different cue types in a same domain, with the cross-domain regularization part being expressed as
![]() where $\phi^{+}_{st}(F^{k}_{s_i}, F^{l}_{t_i^{+}})$ measures similarity between features having a same modality and different domains for positive samples and $\phi^{-}_{st}(F^{k}_{s_i}, F^{l}_{t_i^{-}})$ measures similarity between features having a same modality and different domains for negative samples, and further including generating pseudo-labels for the unlabeled dataset.
|
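
The cross-domain regularization part recited in claim 1 can be illustrated with a minimal sketch. The claimed expression itself is not reproduced in this listing, so the form used here is an assumption: an InfoNCE-style ratio in which the similarity is the exponential of a temperature-scaled cosine similarity. The confidence-threshold rule for pseudo-labels, the function names (`pseudo_labels`, `cross_domain_loss`), and the parameters `threshold` and `temperature` are likewise hypothetical and chosen only for illustration. The sketch handles one modality at a time.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each feature vector to unit length so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def pseudo_labels(probs, threshold=0.8):
    """Assumed rule: take the argmax class when its probability reaches
    `threshold`; otherwise mark the target sample as unlabeled (-1)."""
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < threshold] = -1
    return labels


def cross_domain_loss(src_feats, src_labels, tgt_feats, tgt_pseudo,
                      temperature=0.1):
    """Assumed InfoNCE-style cross-domain term for a single modality.

    Positives: target samples whose pseudo-label matches the source label.
    Negatives: target samples with a different (valid) pseudo-label.
    """
    s = l2_normalize(src_feats)
    t = l2_normalize(tgt_feats)
    # Assumed similarity: exp(cosine / temperature), shape (n_src, n_tgt).
    sim = np.exp(s @ t.T / temperature)

    pos_mask = src_labels[:, None] == tgt_pseudo[None, :]
    valid = tgt_pseudo[None, :] >= 0          # drop unlabeled target samples
    neg_mask = (~pos_mask) & valid

    pos = (sim * pos_mask).sum(axis=1)
    neg = (sim * neg_mask).sum(axis=1)

    # Only source samples with at least one positive contribute to the loss.
    has_pos = pos_mask.any(axis=1)
    if not has_pos.any():
        return 0.0
    ratio = pos[has_pos] / (pos[has_pos] + neg[has_pos])
    return float(np.mean(-np.log(ratio)))


# Toy usage: two source clips, two target clips, 2-D features.
src = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[1.0, 0.0], [0.0, 1.0]])
src_y = np.array([0, 1])
tgt_probs = np.array([[0.9, 0.1], [0.1, 0.9]])
loss = cross_domain_loss(src, src_y, tgt, pseudo_labels(tgt_probs))
print(loss)
```

Under these assumptions the loss is small when target features sit close to source features of the same (pseudo-)label and grows when matching labels are attached to dissimilar features. A cross-modality term could be sketched the same way by comparing features from the two cue types within one domain.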