CPC G10L 25/78 (2013.01) [G06V 40/10 (2022.01); G10L 15/00 (2013.01); H04W 4/02 (2013.01); G10L 2025/786 (2013.01)] | 20 Claims |
1. An audio processing device in a wireless network, comprising:
a sound sensor;
a voice source processing system;
a position processing system;
a position estimator;
a memory storing computer-readable instructions; and
a processor configured to execute the computer-readable instructions to:
receive, by the sound sensor, one or more audio signals associated with an audio command from a source;
generate, by the voice source processing system, a voice source data from the one or more audio signals;
receive data from a positioning system connected to the audio processing device;
generate, by the position processing system using the position estimator, position data based on the data from the positioning system, wherein the data is associated with a location of the source;
determine the location based on the voice source data;
divide an area around the location of the source of the one or more audio signals into three-dimensional (3D) block regions;
process the one or more audio signals to calculate a probability of the source at a first of the 3D block regions;
ignore 3D block regions with a probability of presence of the source at the first of the 3D block regions that is less than a probability threshold of the source being at the first of the 3D block regions;
determine that two of the 3D block regions meet three conditions, a first condition is determined to occur when an angle between the two of the 3D block regions is greater than an angle threshold, a second condition is determined to occur when a probability of the source originating from the two of the 3D block regions is greater than a probability threshold of the source being in the two of the 3D block regions, and a third condition is determined to occur when a command signature from the two of the 3D block regions correlate;
determine, based on the three conditions being met, that the one or more audio signals originate from a non-human source; and
discard the audio command.
|
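The three-condition test recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The `Block` fields, the Pearson correlation used as the "correlate" measure, the reading of the second condition as applying to each of the two blocks, and every threshold value are assumptions made for illustration only; the claim does not specify them.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass
class Block:
    """One 3D block region (all field names are hypothetical)."""
    center: Sequence[float]     # block center relative to the device
    probability: float          # estimated probability the source is in this block
    signature: Sequence[float]  # command signature as a feature vector (assumed form)


def angle_between_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp rounding error
    return math.degrees(math.acos(cos))


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def is_non_human_source(blocks, prob_threshold, angle_threshold_deg, corr_threshold):
    """Return True if any pair of blocks meets all three conditions."""
    # Ignore blocks whose probability is below the threshold.
    candidates = [b for b in blocks if b.probability >= prob_threshold]
    for a, b in combinations(candidates, 2):
        wide_angle = angle_between_deg(a.center, b.center) > angle_threshold_deg
        both_likely = a.probability > prob_threshold and b.probability > prob_threshold
        same_command = correlation(a.signature, b.signature) >= corr_threshold
        if wide_angle and both_likely and same_command:
            return True  # e.g. the same command played from two separated loudspeakers
    return False
```

A caller would discard the audio command whenever `is_non_human_source` returns `True`. The intuition is that a single human speaker is unlikely to appear with high probability in two widely separated directions while producing the same command in both.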