US 12,094,468 B2
Speech detection method, prediction model training method, apparatus, device, and medium
Yi Gao, Shanghai (CN); Weiran Nie, Shanghai (CN); and Youjia Huang, Shenzhen (CN)
Assigned to HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Filed by Huawei Technologies Co., Ltd., Shenzhen (CN)
Filed on Jun. 13, 2022, as Appl. No. 17/838,500.
Application 17/838,500 is a continuation of application No. PCT/CN2019/125121, filed on Dec. 13, 2019.
Prior Publication US 2022/0310095 A1, Sep. 29, 2022
Int. Cl. G10L 15/25 (2013.01); G06V 10/82 (2022.01); G06V 40/16 (2022.01); G10L 15/05 (2013.01); G10L 15/26 (2006.01)
CPC G10L 15/25 (2013.01) [G06V 10/82 (2022.01); G06V 40/172 (2022.01); G10L 15/05 (2013.01); G10L 15/26 (2013.01)]
20 Claims
 
1. A method comprising:
obtaining an audio signal and a face image, wherein a first photographing time point of the face image is the same as a first collection time point of the audio signal;
inputting the face image into a prediction model to predict whether a user intends to continue speaking;
processing the face image using the prediction model to obtain a prediction result;
outputting the prediction result; and
determining that the audio signal is a speech end point when the prediction result indicates that the user does not intend to continue speaking.
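
The steps of claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Sample`, `DetectionResult`, and `detect_speech_end_point` are hypothetical, the prediction model is represented by any callable that maps a face image to a boolean, and "the same time point" is assumed to mean equal millisecond timestamps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    """A captured audio signal or face image with its capture time (hypothetical type)."""
    timestamp_ms: int
    data: bytes


@dataclass(frozen=True)
class DetectionResult:
    """Prediction result plus the end-point decision derived from it (hypothetical type)."""
    intends_to_continue: bool
    is_speech_end_point: bool


def detect_speech_end_point(
    audio: Sample,
    face: Sample,
    prediction_model: Callable[[bytes], bool],
) -> DetectionResult:
    """Follow the steps of claim 1 for one audio signal and one face image.

    `prediction_model` is an assumed stand-in: it takes the face image bytes
    and returns True if the user is predicted to intend to continue speaking.
    """
    # Step 1: the face image must be photographed at the same time point
    # at which the audio signal was collected (assumed: equal timestamps).
    if audio.timestamp_ms != face.timestamp_ms:
        raise ValueError("face image and audio signal have different time points")

    # Steps 2-3: input the face image into the prediction model and
    # process it to obtain the prediction result.
    intends_to_continue = prediction_model(face.data)

    # Steps 4-5: output the prediction result; the audio signal is treated
    # as a speech end point only when the user does not intend to continue.
    return DetectionResult(
        intends_to_continue=intends_to_continue,
        is_speech_end_point=not intends_to_continue,
    )


if __name__ == "__main__":
    # Stub model for demonstration only: always predicts "will not continue".
    result = detect_speech_end_point(
        Sample(1000, b"audio-bytes"),
        Sample(1000, b"face-bytes"),
        prediction_model=lambda image: False,
    )
    print(result)
```

In practice the callable would wrap a trained model (the classification G06V 10/82 suggests a neural network), but the claim as quoted here does not fix its architecture, so the sketch leaves it abstract.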