CPC G10L 17/04 (2013.01) [G06N 20/00 (2019.01); G10L 15/01 (2013.01); G10L 15/1822 (2013.01); G10L 15/19 (2013.01); G10L 15/26 (2013.01)] | 21 Claims |
1. A method comprising:
establishing a conference session with a plurality of participant user devices;
receiving, via the conference session, a digitized audio signal from a participant user device of the plurality of participant user devices;
establishing a user account identity associated with the participant user device;
determining reference speech mannerism features using a plurality of speech classifiers configured to evaluate multiple distinct features of speech mannerisms extracted from one or more digitized audio signals generated by a particular individual associated with the user account identity;
converting the digitized audio signal to text;
generating, based on the text, observed speech mannerism features that are exhibited by the digitized audio signal using the plurality of speech classifiers;
determining a similarity measure between the reference speech mannerism features and the observed speech mannerism features based on a number of instances and a frequency of occurrence of respective features of the multiple distinct features in the observed speech mannerism features compared to the reference speech mannerism features;
validating an integrity of the digitized audio signal based on the similarity measure; and
selectively maintaining the participant user device in the conference session based on the validating,
wherein determining the similarity measure includes at least:
identifying a first feature of the multiple distinct features evaluated by the plurality of speech classifiers; and
applying a mismatch frequency weight to the similarity measure as a scaling factor when the frequency of the first feature in the observed speech mannerism features does not correspond with a frequency of the first feature in the reference speech mannerism features associated with the user account identity.
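The similarity computation recited in the claim can be illustrated with a minimal sketch. This is not the patented implementation; all names, the mismatch weight, the frequency tolerance, and the validation threshold are hypothetical assumptions, since the claim only requires that a mismatch frequency weight scale the measure when observed and reference feature frequencies do not correspond.

```python
# Illustrative sketch of the claimed similarity measure. All constants and
# names are hypothetical; the claim does not specify an implementation.

from collections import Counter

MISMATCH_FREQUENCY_WEIGHT = 0.5   # assumed scaling factor for mismatches
FREQUENCY_TOLERANCE = 0.25        # assumed tolerance for "corresponds with"

def similarity_measure(reference: Counter, observed: Counter) -> float:
    """Compare per-feature occurrence frequencies of speech mannerisms.

    `reference` and `observed` map feature names (e.g. filler words or
    characteristic phrases output by the speech classifiers) to the number
    of instances observed in the corresponding digitized audio signals.
    """
    ref_total = sum(reference.values()) or 1
    obs_total = sum(observed.values()) or 1
    score = 0.0
    for feature in reference:
        ref_freq = reference[feature] / ref_total
        obs_freq = observed.get(feature, 0) / obs_total
        # Contribution based on number of instances and frequency of
        # occurrence of the feature relative to the reference.
        contribution = min(ref_freq, obs_freq) / max(ref_freq, obs_freq) if obs_freq else 0.0
        # Apply the mismatch frequency weight as a scaling factor when the
        # observed frequency does not correspond with the reference frequency.
        if abs(obs_freq - ref_freq) > FREQUENCY_TOLERANCE * ref_freq:
            contribution *= MISMATCH_FREQUENCY_WEIGHT
        score += contribution
    return score / len(reference)

def validate_integrity(score: float, threshold: float = 0.6) -> bool:
    # Hypothetical threshold; the claim only requires validation
    # "based on the similarity measure".
    return score >= threshold
```

Under these assumptions, a device whose observed mannerisms match the reference profile scores near 1.0 and is maintained in the session, while a strongly mismatched profile is scaled down by the mismatch frequency weight and fails validation.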