US 12,080,100 B2
Face-aware person re-identification system
Yumin Suh, Santa Clara, CA (US); Xiang Yu, Mountain View, CA (US); Yi-Hsuan Tsai, Santa Clara, CA (US); Masoud Faraki, San Jose, CA (US); and Manmohan Chandraker, Santa Clara, CA (US)
Assigned to NEC Corporation, Tokyo (JP)
Filed by NEC Laboratories America, Inc., Princeton, NJ (US)
Filed on Nov. 5, 2021, as Appl. No. 17/519,986.
Claims priority of provisional application 63/114,030, filed on Nov. 16, 2020.
Claims priority of provisional application 63/111,809, filed on Nov. 10, 2020.
Prior Publication US 2022/0147735 A1, May 12, 2022
Int. Cl. G06V 40/16 (2022.01); G06F 18/214 (2023.01); G06V 20/52 (2022.01); G06V 40/10 (2022.01)
CPC G06V 40/172 (2022.01) [G06F 18/214 (2023.01); G06V 20/52 (2022.01); G06V 40/103 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A method for employing facial information in unsupervised person re-identification, the method comprising:
extracting, by a body feature extractor, body features from a first data stream;
extracting, by a head feature extractor, head features from a second data stream;
outputting a body descriptor vector from the body feature extractor;
outputting a head descriptor vector from the head feature extractor;
concatenating the body descriptor vector and the head descriptor vector to enable a model to generate a descriptor vector;
enhancing the first data stream using the head features and the second data stream using the body features by utilizing cross-task consistency as self-supervision during training;
training the head feature extractor using a face knowledge distillation loss applied randomly at each iteration with a particular probability;
employing a two-stream network in which one stream processes body images and the other stream processes head images separately, while utilizing both body and head appearances together;
preserving a relation between images across the two modalities using a cross-modal consistency loss;
employing additional face recognition loss for training the face stream of the model using pseudo labels for images from a target domain when frontal faces are visible; and
distilling knowledge obtained from a face recognition engine on the target domain to a head sub-network of the model.
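The claimed training procedure can be illustrated with a minimal sketch. The function and variable names below (`fuse_descriptors`, `cross_modal_consistency_loss`, `training_step`, `p_kd`, `face_kd_loss_fn`) are hypothetical illustrations, not identifiers from the patent; the sketch assumes cosine similarity as the pairwise relation preserved across modalities and a uniform random draw for the stochastic application of the face knowledge distillation loss, neither of which is specified in the claim text.

```python
import math
import random

def l2_normalize(v):
    # Normalize a descriptor vector to unit length (guard against zero norm).
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cosine(a, b):
    # Cosine similarity between two descriptor vectors.
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def fuse_descriptors(body_vec, head_vec):
    # Concatenate the body and head descriptor vectors into a single
    # re-identification descriptor, as in the claimed concatenation step.
    return l2_normalize(body_vec + head_vec)

def cross_modal_consistency_loss(body_batch, head_batch):
    # Preserve the relation between images across the two modalities:
    # penalize disagreement between the pairwise similarity structure
    # of the body stream and that of the head stream.
    loss, n = 0.0, len(body_batch)
    for i in range(n):
        for j in range(i + 1, n):
            s_body = cosine(body_batch[i], body_batch[j])
            s_head = cosine(head_batch[i], head_batch[j])
            loss += (s_body - s_head) ** 2
    return loss / max(1, n * (n - 1) // 2)

def training_step(body_batch, head_batch, face_kd_loss_fn, p_kd=0.5):
    # One iteration: the face knowledge distillation loss is applied
    # randomly with probability p_kd, per the claim; the cross-modal
    # consistency term is applied every iteration.
    loss = cross_modal_consistency_loss(body_batch, head_batch)
    if random.random() < p_kd:
        loss += face_kd_loss_fn(head_batch)
    return loss
```

In this sketch the consistency loss is zero when the two streams induce identical pairwise similarities and grows as their similarity structures diverge, which gives the self-supervisory signal the claim describes without requiring identity labels on the target domain.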