US 11,699,442 B2
Methods and systems for speech detection
Patricia Scanlon, Dublin (IE)
Assigned to SoapBox Labs Ltd., Dublin (IE)
Filed by SoapBox Labs Ltd., Dublin (IE)
Filed on Oct. 25, 2021, as Appl. No. 17/510,310.
Application 17/510,310 is a continuation of application No. 16/852,376, filed on Apr. 17, 2020, granted, now 11,158,320.
Application 16/852,376 is a continuation of application No. PCT/EP2018/078469, filed on Oct. 18, 2018.
Claims priority of application No. 17197186 (EP), filed on Oct. 18, 2017.
Prior Publication US 2022/0189483 A1, Jun. 16, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/24 (2013.01); G06F 3/16 (2006.01); G06F 21/32 (2013.01); G06F 3/01 (2006.01); G10L 15/22 (2006.01); G06V 40/16 (2022.01)
CPC G10L 15/24 (2013.01) [G06F 3/012 (2013.01); G06F 3/013 (2013.01); G06F 3/167 (2013.01); G06F 21/32 (2013.01); G06V 40/161 (2022.01); G10L 15/22 (2013.01)] 20 Claims
1. A method of processing user input to a computing system having an audio input and a visual input, the method comprising:
receiving, at the computing system, an audio input signal from the audio input;
performing a determination of whether a user has demonstrated an intent to interact with the computing system via the audio input, wherein performing the determination of whether the user has demonstrated the intent to interact with the computing system via the audio input comprises:
determining whether a face has been detected using the visual input; and
responsive to the determination that the user has demonstrated the intent to interact with the computing system via the audio input, confirming whether the determination that the user has demonstrated the intent to interact with the computing system via the audio input is reliable by (i) performing additional verification operations comprising two or more of matching the face against a user profile of the user, determining whether the face is detected at an expected distance from a camera, or determining whether the face is detected at an expected angle with respect to the camera, and (ii) determining whether a weighted combination of results of the additional verification operations satisfies a threshold; and
responsive to confirming that the determination that the user has demonstrated the intent to interact with the computing system via the audio input is reliable:
recording the audio input signal from the audio input.
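The gating logic in claim 1 can be sketched in Python. In this sketch, audio is recorded only when a face is detected and a weighted combination of two or more verification results meets a threshold. It is a minimal illustration under stated assumptions, not the patented implementation. All names (`FaceObservation`, `WEIGHTS`, `THRESHOLD`, `intent_is_reliable`, `process_audio`) are hypothetical, as are the weights, the distance and angle tolerances, and the threshold value; the claim specifies none of them. The sketch also treats "face detected" as the whole intent determination, and returning the buffered frames stands in for "recording". Dividing by the total weight of the checks actually performed is one possible reading of "weighted combination", chosen here so that any two checks can in principle pass the threshold.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

@dataclass
class FaceObservation:
    """Hypothetical summary of what the visual input reports for one frame."""
    detected: bool
    profile_match: Optional[bool] = None  # face matched the user's profile, if checked
    distance_m: Optional[float] = None    # estimated distance from the camera, if measured
    angle_deg: Optional[float] = None     # estimated angle to the camera axis, if measured

# Hypothetical weights, tolerances and threshold; the patent gives no values.
WEIGHTS = {"profile": 0.5, "distance": 0.25, "angle": 0.25}
EXPECTED_DISTANCE_M = (0.3, 1.5)
MAX_ANGLE_DEG = 30.0
THRESHOLD = 0.6

def verification_results(face: FaceObservation) -> Dict[str, float]:
    """Run whichever additional verification operations have data (1.0 = pass, 0.0 = fail)."""
    results: Dict[str, float] = {}
    if face.profile_match is not None:
        results["profile"] = 1.0 if face.profile_match else 0.0
    if face.distance_m is not None:
        lo, hi = EXPECTED_DISTANCE_M
        results["distance"] = 1.0 if lo <= face.distance_m <= hi else 0.0
    if face.angle_deg is not None:
        results["angle"] = 1.0 if abs(face.angle_deg) <= MAX_ANGLE_DEG else 0.0
    return results

def intent_is_reliable(face: FaceObservation) -> bool:
    """Confirm the intent determination with a weighted combination of checks."""
    if not face.detected:
        return False
    results = verification_results(face)
    if len(results) < 2:  # the claim recites two or more verification operations
        return False
    total_weight = sum(WEIGHTS[name] for name in results)
    score = sum(WEIGHTS[name] * value for name, value in results.items()) / total_weight
    return score >= THRESHOLD

def process_audio(audio_frames: Iterable[bytes], face: FaceObservation) -> List[bytes]:
    """Return the 'recorded' frames if intent is confirmed, otherwise nothing."""
    if intent_is_reliable(face):
        return list(audio_frames)
    return []
```

For example, a detected face that matches the profile at 0.8 m and 10 degrees scores 1.0 and its audio is kept. The same face failing the profile match scores 0.5, falls below the assumed threshold of 0.6, and is discarded.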