US 11,736,610 B2
Audio-based machine learning frameworks utilizing similarity determination machine learning models
Gregory Buckley, Dublin (IE); Damian Kelly, Kildare (IE); Mariah Sonja Pereira Penha, Dublin (IE); Jack Sullivan, Dublin (IE); and Bruno Ohana, Dublin (IE)
Assigned to Optum, Inc., Minnetonka, MN (US)
Filed by OPTUM, INC., Minnetonka, MN (US)
Filed on Sep. 28, 2021, as Appl. No. 17/449,223.
Prior Publication US 2023/0094583 A1, Mar. 30, 2023
Int. Cl. H04M 3/51 (2006.01); G06N 20/20 (2019.01); G10L 15/06 (2013.01); G10L 15/26 (2006.01); G06F 18/22 (2023.01)

CPC H04M 3/5183 (2013.01) [G06F 18/22 (2023.01); G06N 20/20 (2019.01); G10L 15/063 (2013.01); G10L 15/26 (2013.01)]

20 Claims

1. A computer-implemented method for generating a predictive output with respect to a primary audio data embedding data object associated with a primary audio data object, the computer-implemented method comprising:

generating, by one or more processors, by utilizing a similarity determination machine learning model, and based at least in part on the primary audio data embedding data object, the predictive output for the primary audio data embedding data object, wherein:

(i) the primary audio data object is associated with an event sequence,

(ii) an audio processing machine learning model is configured to process the primary audio data object to generate a primary audio-based feature set and a primary transcription output data object for the primary audio data object,

(iii) the similarity determination machine learning model comprises an audio embedding sub-model configured to process the primary audio-based feature set and the primary transcription output data object to generate a primary audio data embedding data object for the primary audio data object,

(iv) the similarity determination machine learning model is configured to process the primary audio data embedding data object and a plurality of secondary audio data embedding data objects to identify a similar subset from the plurality of secondary audio data embedding data objects that each satisfy an above-threshold predictive similarity measure in relation to the primary audio data embedding data object, and

(v) the predictive output is determined based at least in part on the similar subset of the plurality of secondary audio data embedding data objects;

generating, by the one or more processors, a forwarding recommendation prediction based at least in part on the predictive output; and

performing, by the one or more processors, one or more prediction-based actions based at least in part on the forwarding recommendation prediction.
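
The similarity steps recited in elements (iv) and (v), followed by the forwarding recommendation, can be illustrated with a minimal sketch. The claim does not specify the similarity measure, the threshold value, or how the similar subset is reduced to a predictive output, so everything concrete here is an assumption made for illustration: cosine similarity as the predictive similarity measure, a label attached to each secondary embedding, a majority vote over the similar subset, and the hypothetical names `similar_subset`, `predictive_output`, `forwarding_recommendation`, and `general_queue`. The embeddings are taken as already generated; the audio processing model and the audio embedding sub-model are not modeled.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def similar_subset(primary, secondaries, threshold):
    """Return (index, score) for each secondary embedding scoring above the threshold.

    `secondaries` is a list of (embedding, label) pairs; the label is a
    hypothetical annotation such as the department that handled a past call.
    """
    scored = [
        (i, cosine_similarity(primary, emb))
        for i, (emb, _label) in enumerate(secondaries)
    ]
    return [(i, score) for i, score in scored if score > threshold]


def predictive_output(primary, secondaries, threshold):
    """Majority label among the similar subset, or None if the subset is empty."""
    subset = similar_subset(primary, secondaries, threshold)
    if not subset:
        return None
    votes = Counter(secondaries[i][1] for i, _score in subset)
    return votes.most_common(1)[0][0]


def forwarding_recommendation(output, default="general_queue"):
    """Map the predictive output to a forwarding target (hypothetical default queue)."""
    return default if output is None else output


# Illustrative data only: two-dimensional embeddings with made-up labels.
secondary_embeddings = [
    ([0.9, 0.1], "billing"),
    ([0.8, 0.2], "billing"),
    ([0.0, 1.0], "claims"),
]
primary_embedding = [1.0, 0.0]

output = predictive_output(primary_embedding, secondary_embeddings, threshold=0.8)
recommendation = forwarding_recommendation(output)
```

In this sketch the prediction-based action would be whatever the caller does with `recommendation`, for example routing the call to the named queue. A production system would more plausibly use an approximate nearest-neighbor index than a linear scan, but the claim does not say either way.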