US 11,990,152 B2
Classification of audio as originating from a human source or a non-human to avoid false wake-word detection
Yuvaraj Nagarathnam, Bangalore (IN)
Assigned to ARRIS ENTERPRISES LLC, Horsham, PA (US)
Filed by ARRIS Enterprises LLC, Suwanee, GA (US)
Filed on Oct. 28, 2021, as Appl. No. 17/512,924.
Claims priority of provisional application 63/113,535, filed on Nov. 13, 2020.
Prior Publication US 2022/0157333 A1, May 19, 2022
Int. Cl. G10L 25/78 (2013.01); G06V 40/10 (2022.01); G10L 15/00 (2013.01); H04W 4/02 (2018.01)

CPC G10L 25/78 (2013.01) [G06V 40/10 (2022.01); G10L 15/00 (2013.01); H04W 4/02 (2013.01); G10L 2025/786 (2013.01)]

20 Claims

1. An audio processing device in a wireless network, comprising:

a sound sensor;

a voice source processing system;

a position processing system;

a position estimator;

a memory storing computer-readable instructions; and

a processor configured to execute the computer-readable instructions to:

receive, by the sound sensor, one or more audio signals associated with an audio command from a source;

generate, by the voice source processing system, voice source data from the one or more audio signals;

receive data from a positioning system connected to the audio processing device;

generate, by the position processing system using the position estimator, position data based on the data from the positioning system, wherein the data is associated with a location of the source;

determine the location based on the voice source data;

divide an area around the location of the source of the one or more audio signals into three-dimensional (3D) block regions;

process the one or more audio signals to calculate a probability of the source at a first of the 3D block regions;

ignore 3D block regions with a probability of presence of the source at the first of the 3D block regions that is less than a probability threshold of the source being at the first of the 3D block regions;

determine that two of the 3D block regions meet three conditions, wherein:

a first condition is determined to occur when an angle between the two of the 3D block regions is greater than an angle threshold;

a second condition is determined to occur when a probability of the source originating from the two of the 3D block regions is greater than a probability threshold of the source being in the two of the 3D block regions; and

a third condition is determined to occur when command signatures from the two of the 3D block regions correlate;

determine, based on the three conditions being met, that the one or more audio signals originate from a non-human source; and

discard the audio command.
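Claim 1 recites the block-division, thresholding and three-condition steps without saying how the block probabilities or command signatures are computed. The Python sketch below shows one way those steps could fit together, under loudly stated assumptions: the search area is a cube centred on the estimated location; the "angle between" two blocks is taken as the angle they subtend at the device; per-block probability and command signature come from hypothetical callables; and "correlate" is interpreted as a Pearson coefficient at or above an assumed threshold. The names `divide_into_blocks`, `angle_between`, `pearson` and `is_non_human`, the threshold values, and the toy two-loudspeaker scene are all illustrative and are not taken from the patent.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def divide_into_blocks(location: Vec3, half_extent: float, block_size: float) -> List[Vec3]:
    """Split a cube of side 2 * half_extent centred on `location` into an
    n x n x n grid of cubic blocks and return each block's centre.
    (Assumption: the claim does not fix the shape or size of the area.)"""
    n = max(1, round(2 * half_extent / block_size))  # blocks per axis
    step = 2 * half_extent / n
    centres: List[Vec3] = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                centres.append(tuple(
                    location[axis] - half_extent + (idx + 0.5) * step
                    for axis, idx in enumerate((i, j, k))
                ))
    return centres


def angle_between(device: Vec3, a: Vec3, b: Vec3) -> float:
    """Angle in degrees subtended at the device by block centres a and b
    (assumed interpretation of 'angle between the two block regions')."""
    va = [a[i] - device[i] for i in range(3)]
    vb = [b[i] - device[i] for i in range(3)]
    norm = math.hypot(*va) * math.hypot(*vb)
    if norm == 0.0:
        return 0.0  # a block centre coincides with the device
    cos = sum(x * y for x, y in zip(va, vb)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def is_non_human(
    device: Vec3,
    blocks: List[Vec3],
    probability: Callable[[Vec3], float],          # hypothetical per-block probability
    signature: Callable[[Vec3], Sequence[float]],  # hypothetical per-block command signature
    p_threshold: float = 0.3,
    angle_threshold_deg: float = 30.0,
    corr_threshold: float = 0.9,
) -> bool:
    """True if any two blocks meet all three conditions of claim 1."""
    # Ignore blocks whose source probability is below the threshold.
    candidates = [b for b in blocks if probability(b) >= p_threshold]
    for a, b in combinations(candidates, 2):
        wide_apart = angle_between(device, a, b) > angle_threshold_deg  # condition 1
        both_likely = min(probability(a), probability(b)) > p_threshold  # condition 2
        same_command = pearson(signature(a), signature(b)) >= corr_threshold  # condition 3
        if wide_apart and both_likely and same_command:
            return True
    return False


if __name__ == "__main__":
    device = (0.0, 0.0, 0.0)
    # Hypothetical scene: two loudspeakers 1 m either side of a point 3 m away.
    speakers = [(-1.0, 3.0, 0.0), (1.0, 3.0, 0.0)]

    def probability(b: Vec3) -> float:
        # Toy probability map peaking at each loudspeaker.
        return max(math.exp(-4.0 * math.dist(b, s) ** 2) for s in speakers)

    def signature(b: Vec3) -> List[float]:
        # Both loudspeakers replay the same command, so signatures match.
        return [1.0, 4.0, 2.0, 8.0]

    blocks = divide_into_blocks((0.0, 3.0, 0.0), half_extent=1.5, block_size=1.0)
    if is_non_human(device, blocks, probability, signature):
        print("non-human source: discard the audio command")
    else:
        print("human source: accept the audio command")
```

Passing `probability` and `signature` in as callables keeps the sketch agnostic about the localization front end (for example, a beamformer), which the claim leaves open. The toy scene models one plausible non-human case, a pair of loudspeakers replaying the same command from two directions; a single talker would typically yield only one high-probability block, so no pair satisfies all three conditions and the command is kept.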