US 11,990,134 B2
Method for configuring and using a numeric-to-alphabetic expression machine learning model
Xiaolong Li, Cary, NC (US); Xiaozhuo Cheng, Cary, NC (US); and Xu Yang, Cary, NC (US)
Assigned to SAS INSTITUTE INC., Cary, NC (US)
Filed by SAS INSTITUTE INC., Cary, NC (US)
Filed on Jul. 11, 2023, as Appl. No. 18/220,632.
Application 18/220,632 is a continuation-in-part of application No. 18/214,336, filed on Jun. 26, 2023, granted, now 11,922,947.
Application 18/214,336 is a continuation-in-part of application No. 17/993,385, filed on Nov. 23, 2022, granted, now 11,862,171.
Application 17/993,385 is a continuation-in-part of application No. 17/851,264, filed on Jun. 28, 2022, granted, now 11,538,481, issued on Dec. 27, 2022.
Application 17/851,264 is a continuation-in-part of application No. 17/498,811, filed on Oct. 12, 2021, granted, now 11,373,655, issued on Jun. 28, 2022.
Application 17/498,811 is a continuation-in-part of application No. 17/370,441, filed on Jul. 8, 2021, granted, now 11,404,053, issued on Aug. 2, 2022.
Application 17/370,441 is a continuation of application No. PCT/CN2021/082572, filed on Mar. 24, 2021.
Application 17/498,811 is a continuation-in-part of application No. 17/205,871, filed on Mar. 18, 2021, granted, now 11,145,309, issued on Oct. 12, 2021.
Application 17/205,871 is a continuation-in-part of application No. 17/138,521, filed on Dec. 30, 2020, granted, now 11,049,502, issued on Jun. 29, 2021.
Application 17/138,521 is a continuation of application No. 17/138,445, filed on Dec. 30, 2020, granted, now 11,138,979, issued on Oct. 5, 2021.
Claims priority of provisional application 63/451,855, filed on Mar. 13, 2023.
Claims priority of provisional application 63/297,002, filed on Jan. 6, 2022.
Claims priority of provisional application 63/288,385, filed on Dec. 10, 2021.
Claims priority of provisional application 62/991,275, filed on Mar. 18, 2020.
Prior Publication US 2023/0386473 A1, Nov. 30, 2023
Int. Cl. G10L 15/22 (2006.01); G10L 15/02 (2006.01); G10L 15/04 (2013.01); G10L 15/26 (2006.01); G10L 25/30 (2013.01); G10L 25/78 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 15/02 (2013.01); G10L 15/04 (2013.01); G10L 25/30 (2013.01); G10L 25/78 (2013.01); G10L 2025/783 (2013.01)] 26 Claims
OG exemplary drawing
 
1. A computer-program product embodied in a non-transitory machine-readable storage medium storing computer instructions that, when executed by one or more processors, perform operations comprising:
constructing a transcript adaptation training data corpus comprising a plurality of transcript normalization training data samples, wherein each of the plurality of transcript normalization training data samples includes:
a training sample pairing between (i) a predicted audio transcript that includes at least one numerical expression and (ii) an adapted audio transcript that includes an alphabetic representation of the at least one numerical expression; and
a transcript normalization identifier that, when applied to a model input comprising a target audio transcript, defines a text-to-text transformation objective causing a numeric-to-alphabetic expression machine learning model to predict an alphabetic-equivalent audio transcript that represents each numerical expression included in the target audio transcript in one or more alphabetic tokens;
configuring the numeric-to-alphabetic expression machine learning model based on a training of a machine learning text-to-text transformer model using the transcript adaptation training data corpus; and
executing the numeric-to-alphabetic expression machine learning model within a speech-to-text post-processing sequence of a speech-to-text service based on the numeric-to-alphabetic expression machine learning model satisfying a minimum audio transcript adaptation efficacy value;
obtaining audio data comprising one or more utterances;
generating, via a speech-to-text machine learning model, a probable audio transcript based on an input of the audio data, wherein the probable audio transcript includes a plurality of numerical expressions;
generating, via the numeric-to-alphabetic expression machine learning model, an adjusted audio transcript of the probable audio transcript based on an input of a task-specific instruction to the numeric-to-alphabetic expression machine learning model, wherein the task-specific instruction includes:
an instructional prefix component comprising the transcript normalization identifier, wherein the numeric-to-alphabetic expression machine learning model identifies a task type of the instructional prefix component, wherein the task type of the instructional prefix component corresponds to the transcript normalization identifier; and
an input text string comprising the probable audio transcript; and
obtaining, from a memory, a set of weights and biases generated from the training of the machine learning text-to-text transformer model that corresponds to the transcript normalization identifier, wherein the executing the numeric-to-alphabetic expression machine learning model includes using the set of weights and biases to generate the adjusted audio transcript.
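The claim describes a task-specific instruction built from an instructional prefix (the transcript normalization identifier) followed by an input text string, which steers a text-to-text model to rewrite numerical expressions as alphabetic tokens. The minimal sketch below illustrates only that instruction shape and the numeric-to-alphabetic transformation; the prefix string `"normalize numbers"` and the rule-based converter are hypothetical stand-ins for the claimed trained transformer, not the patented implementation.

```python
# Illustrative sketch only: a prefix-dispatched text-to-text transformation
# in the shape of the claimed task-specific instruction. In the patent, a
# trained text-to-text transformer performs the rewrite; here a simple
# rule-based converter stands in so the instruction flow is visible.
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out an integer 0-999 in alphabetic tokens."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rem = divmod(n, 10)
        return TENS[tens] + ("" if rem == 0 else " " + ONES[rem])
    hundreds, rem = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rem == 0 else head + " " + number_to_words(rem)

def normalize_transcript(transcript: str) -> str:
    """Replace each numerical expression with its alphabetic equivalent."""
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), transcript)

# Hypothetical transcript normalization identifier used as an instructional
# prefix; the mapping selects which text-to-text transformation to apply.
TASKS = {"normalize numbers": normalize_transcript}

def run_instruction(instruction: str) -> str:
    """Split the instruction into prefix and input text, then dispatch."""
    prefix, _, text = instruction.partition(":")
    return TASKS[prefix.strip()](text.strip())

print(run_instruction("normalize numbers: the meeting starts at 9 with 12 guests"))
# -> the meeting starts at nine with twelve guests
```

In a transformer-based setting the dispatch would instead be implicit: the model, conditioned on the prefix during training, learns the transformation objective associated with that identifier.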