US 12,235,945 B2
Acoustic-based face anti-spoofing system and method
Chenqi Kong, Hong Kong (CN); Kexin Zheng, Hong Kong (CN); Haoliang Li, Hong Kong (CN); and Shiqi Wang, Hong Kong (CN)
Assigned to City University of Hong Kong, Hong Kong (CN)
Filed by City University of Hong Kong, Hong Kong (CN)
Filed on Nov. 21, 2022, as Appl. No. 18/057,259.
Prior Publication US 2024/0169042 A1, May 23, 2024
Int. Cl. G06F 21/32 (2013.01); G06F 18/10 (2023.01); G06F 18/2131 (2023.01)
CPC G06F 21/32 (2013.01) [G06F 18/10 (2023.01); G06F 18/2131 (2023.01); G06F 2221/2127 (2013.01)] 20 Claims
 
1. A method for detecting liveness of a presented face, the method comprising:
generating a first acoustic signal and projecting the generated first acoustic signal onto the presented face for probing the presented face, wherein the first acoustic signal comprises a plurality of time-limited chirps, causing a face-echo signal to be reflected from the presented face when the presented face receives an individual time-limited chirp, whereby a plurality of face-echo signals is created for the plurality of time-limited chirps;
receiving a second acoustic signal for capturing an acoustic response of the presented face due to the first acoustic signal, wherein the plurality of face-echo signals is embedded in the second acoustic signal;
preprocessing the received second acoustic signal to yield a plurality of extracted signal segments, wherein the preprocessing of the second acoustic signal includes extracting the plurality of face-echo signals from the received second acoustic signal such that an individual extracted signal segment contains a corresponding face-echo signal;
applying a Fourier transform (FT) to the individual extracted signal segment to yield a frequency segment, whereby a plurality of frequency segments for the plurality of extracted signal segments is obtained;
processing the plurality of frequency segments with a machine-learning transformer model to yield a global frequency feature of the presented face;
applying a short-time Fourier transform (STFT) to the plurality of extracted signal segments to yield a spectrogram;
processing the spectrogram with a convolutional neural network (CNN) to yield a local frequency feature of the presented face; and
combining the global and local frequency features to yield an enriched feature of the presented face for determining whether the presented face is a genuine face or a spoofer.
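The probing and segmentation steps of claim 1 (a train of time-limited chirps, a received signal carrying the face echoes, and one extracted segment per echo) can be sketched in Python as below. This is a minimal illustration under assumptions the claim does not state:

- linear chirps sweeping 16 to 22 kHz;
- a 48 kHz sample rate;
- a fixed chirp interval;
- an echo delay that is already known, computed here from an assumed 0.3 m face distance and a 343 m/s speed of sound.

The helper names `linear_chirp`, `chirp_train`, `simulate_echo` and `extract_echo_segments` are hypothetical. A practical system would also have to separate the echo from the direct speaker-to-microphone path, for example by matched filtering. The toy received signal here omits the direct path entirely.

```python
import math

def linear_chirp(f0, f1, duration, fs):
    """Samples of a linear chirp sweeping from f0 to f1 Hz over `duration` seconds."""
    n = int(round(duration * fs))
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]

def chirp_train(chirp, num_chirps, period):
    """Transmit signal: `num_chirps` copies of `chirp`, one every `period` samples."""
    if len(chirp) > period:
        raise ValueError("chirp is longer than the chirp interval")
    out = [0.0] * (num_chirps * period)
    for c in range(num_chirps):
        out[c * period:c * period + len(chirp)] = chirp
    return out

def simulate_echo(tx, delay, gain):
    """Toy received signal: one attenuated, delayed copy of tx (direct path omitted)."""
    rx = [0.0] * len(tx)
    for i in range(len(tx) - delay):
        rx[i + delay] = gain * tx[i]
    return rx

def extract_echo_segments(rx, num_chirps, period, echo_offset, seg_len):
    """One window per chirp, starting `echo_offset` samples after that chirp was sent."""
    return [rx[c * period + echo_offset:c * period + echo_offset + seg_len]
            for c in range(num_chirps)]

fs = 48_000
chirp = linear_chirp(16_000, 22_000, 0.001, fs)  # 1 ms chirp: 48 samples
period = 1_200                                     # one chirp every 25 ms
tx = chirp_train(chirp, 4, period)
delay = round(2 * 0.3 / 343 * fs)                  # round trip to a face 0.3 m away: 84 samples
rx = simulate_echo(tx, delay, gain=0.2)
segments = extract_echo_segments(rx, 4, period, delay, len(chirp))
```

Because the chirp interval (1,200 samples) is much longer than the echo delay, each window holds exactly one chirp's echo. This is the property the claim's "individual extracted signal segment contains a corresponding face-echo signal" relies on.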
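The two transform steps of claim 1 can be illustrated with the standard library alone:

- a Fourier transform applied to each extracted segment, giving one frequency segment per echo;
- an STFT applied to the segments, giving a spectrogram.

This sketch makes two assumptions the claim does not specify. First, it keeps only magnitude spectra. Second, it forms the STFT input by concatenating the segments end to end. A direct O(N²) DFT is used for readability; an FFT routine would be used in practice. The function names are hypothetical, and the segments are a synthetic 6 kHz tone rather than recorded echoes.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the one-sided discrete Fourier transform of x (direct O(N^2) sum)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft_magnitude(x, win_len, hop):
    """Spectrogram: one magnitude spectrum per Hann-windowed frame of x."""
    w = hann(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    return [dft_magnitude([x[s + i] * w[i] for i in range(win_len)]) for s in starts]

fs = 48_000
seg_len = 32
tone = [math.sin(2 * math.pi * 6_000 * t / fs) for t in range(seg_len)]  # exactly 4 cycles
segments = [tone, [0.5 * v for v in tone], [0.25 * v for v in tone]]    # fading "echoes"

freq_segments = [dft_magnitude(s) for s in segments]  # FT per segment: 17 bins each
signal = [v for s in segments for v in s]             # assumption: segments concatenated
spectrogram = stft_magnitude(signal, win_len=32, hop=16)  # 5 frames x 17 bins
```

With 32-sample segments at 48 kHz, each bin spans 1.5 kHz, so the 6 kHz tone falls in bin 4 of every frequency segment. The per-segment spectra keep one clean spectrum per echo, while the STFT frames also straddle segment boundaries and so capture how the response changes from one echo to the next.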
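The sketch below shows the shape of the two-branch feature pipeline and the final combination, using untrained stand-ins for the learned models:

- **Global branch:** a single self-attention layer with identity query/key/value projections (the core operation of a transformer), with the frequency segments as tokens, followed by mean pooling.
- **Local branch:** one convolution layer with two hand-picked 2×2 kernels, ReLU, and global average pooling.

The claim only says the two features are "combined"; concatenation is assumed here as one simple choice. All function names and inputs are hypothetical toy values. Nothing here reflects the trained architecture or weights of the patented system.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention_pool(tokens):
    """Scaled dot-product self-attention with identity Q/K/V, then mean over tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return [sum(row[j] for row in out) / len(out) for j in range(d)]

def conv2d_valid(img, kernel):
    """2-D 'valid' cross-correlation, the operation a CNN convolution layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def cnn_features(spec, kernels):
    """One conv layer + ReLU + global average pooling: one value per kernel."""
    feats = []
    for k in kernels:
        vals = [max(0.0, v) for row in conv2d_valid(spec, k) for v in row]
        feats.append(sum(vals) / len(vals))
    return feats

def fuse(global_feat, local_feat):
    """Combine the two branches by concatenation (an assumed fusion choice)."""
    return global_feat + local_feat

freq_segments = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]  # toy spectra, 3 bins each
spectrogram = [[0.0, 1.0, 0.0, 1.0],
               [1.0, 0.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, 1.0]]                # toy 3x4 time-frequency map
kernels = [[[1.0, -1.0], [-1.0, 1.0]],              # checkerboard detector
           [[0.25, 0.25], [0.25, 0.25]]]            # local average

global_feat = self_attention_pool(freq_segments)    # ≈ [0.5, 0.5, 2.0]
local_feat = cnn_features(spectrogram, kernels)     # [1.0, 0.5]
enriched = fuse(global_feat, local_feat)            # 5-dimensional enriched feature
```

The two branches differ in receptive field. Attention lets every frequency segment weigh every other segment, giving a global summary. The convolution sees only 2×2 time-frequency neighbourhoods, giving a local summary. The enriched vector would then feed a classifier that outputs genuine or spoofer.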