| CPC G10L 25/93 (2013.01) [G06N 3/048 (2023.01); G06N 3/08 (2013.01); G10L 25/45 (2013.01)] | 20 Claims |

|
1. A computing system, comprising:
one or more processors; and
memory storing instructions that, when executed, cause the one or more processors to:
tag respective portions of an audio signal with ground truth labels for a plurality of audio event classes;
generate a consolidated audio signal by augmenting the audio signal with semi-synthetic audio signals;
divide the consolidated audio signal into a plurality of segments, wherein:
each segment of the plurality of segments overlaps an adjacent segment by an overlap amount; and
each segment of the plurality of segments that is associated with a respective portion of the audio signal that is tagged with a ground truth label retains the ground truth label;
form a training data set by generating a normalized time domain representation of each segment of the plurality of segments; and
train, based on the training data set and for the normalized time domain representation of each segment of the plurality of segments, an artificial intelligence model to predict a classification score for each audio event class of the plurality of audio event classes.
|
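
The segmentation, label-retention, and normalization steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The claim does not specify the normalization method, the segment length, the overlap amount, or the label format, so the sketch assumes peak normalization to the range [-1, 1], labels given as sample-index spans, and hypothetical names throughout (`segment_with_overlap`, `peak_normalize`, `LabelSpan`, and the `glass_break` label). Augmentation with semi-synthetic signals and model training are omitted.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical label format: (start_sample, end_sample_exclusive, label).
LabelSpan = Tuple[int, int, str]


def segment_with_overlap(
    samples: Sequence[float],
    spans: List[LabelSpan],
    seg_len: int,
    overlap: int,
) -> List[Tuple[int, List[float], Set[str]]]:
    """Split samples into fixed-length segments that overlap their neighbors.

    Each segment retains every label whose tagged span intersects it.
    Trailing samples that do not fill a whole segment are dropped.
    """
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    hop = seg_len - overlap  # distance between consecutive segment starts
    segments = []
    for start in range(0, max(len(samples) - seg_len, 0) + 1, hop):
        end = start + seg_len
        # A label is retained when its span intersects [start, end).
        kept = {label for (s, e, label) in spans if s < end and e > start}
        segments.append((start, list(samples[start:end]), kept))
    return segments


def peak_normalize(segment: Sequence[float]) -> List[float]:
    """Scale a time domain segment so its largest magnitude is 1 (assumed method)."""
    peak = max((abs(x) for x in segment), default=0.0)
    if peak == 0.0:
        return list(segment)  # silent segment: nothing to scale
    return [x / peak for x in segment]


# Usage: eight samples, one tagged portion covering samples 2..3.
audio = [0.0, 0.5, -2.0, 1.0, 0.25, 0.0, 4.0, -1.0]
spans = [(2, 4, "glass_break")]
training_set = [
    (peak_normalize(seg), labels)
    for _, seg, labels in segment_with_overlap(audio, spans, seg_len=4, overlap=2)
]
```

With a segment length of 4 and an overlap of 2, segments start at samples 0, 2, and 4. The first two intersect the tagged portion and keep its label, while the third carries no label. Each segment is normalized independently, so a loud event in one segment does not suppress the amplitude of the others.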