US 12,423,530 B2
Training data generation to facilitate fine-tuning embedding models
Umanga Bista, Southbank (AU); Vladislav Blinov, Melbourne (AU); Mark Edward Johnson, Sydney (AU); Ahmed Ataallah Ataallah Abobakr, Geelong (AU); Thanh Long Duong, Seabrook (AU); Srinivasa Phani Kumar Gadde, Fremont, CA (US); Vishal Vishnoi, Redwood City, CA (US); Elias Luqman Jalaluddin, Seattle, WA (US); Xin Xu, San Jose, CA (US); and Shivashankar Subramanian, Melbourne (AU)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on May 9, 2023, as Appl. No. 18/314,509.
Claims priority of provisional application 63/342,959, filed on May 17, 2022.
Prior Publication US 2023/0376700 A1, Nov. 23, 2023
Int. Cl. G06F 40/58 (2020.01); G06F 40/35 (2020.01); G10L 15/22 (2006.01); H04L 51/02 (2022.01); G06F 40/205 (2020.01); G06F 40/263 (2020.01)
CPC G06F 40/58 (2020.01) [G06F 40/35 (2020.01); G10L 15/22 (2013.01); H04L 51/02 (2013.01); G06F 40/205 (2020.01); G06F 40/263 (2020.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method, the method comprising:
accessing training data, the training data comprising a plurality of anchor utterances;
generating positive utterances from the anchor utterances, wherein generating the positive utterances comprises generating a corresponding group of positive utterances for each respective anchor utterance of the plurality of anchor utterances, wherein each positive utterance of the positive utterances is generated using: (i) a translation operation, (ii) one or more perturbation operations, (iii) one or more augmentation operations, or (iv) any combination thereof, and wherein the positive utterances are semantically similar to the anchor utterances;
generating negative utterances from the anchor utterances using: (i) one or more augmentation operations, (ii) one or more sampling operations, or (iii) any combination thereof, wherein the negative utterances are semantically dissimilar to the anchor utterances;
forming a set of tuples, each tuple of the set of tuples comprising an anchor utterance, a positive utterance selected from the group of positive utterances generated for the anchor utterance, and one or more negative utterances selected from the negative utterances, wherein the positive utterance in a respective tuple is different from the positive utterances in other tuples of the set of tuples;
generating a plurality of embeddings for each tuple of the set of tuples using a pre-trained embedding model, wherein, for a respective tuple of the set of tuples, an embedding of the plurality of embeddings is generated for each of the anchor utterance, the positive utterance, and the one or more negative utterances of the respective tuple;
generating a fine-tuned model by fine-tuning the pre-trained embedding model, wherein fine-tuning the pre-trained embedding model comprises applying a loss function to the plurality of embeddings for each tuple of the set of tuples and minimizing the loss function for each tuple of the set of tuples;
using the fine-tuned model to generate an embedding for an input utterance;
recognizing an intent of the input utterance based on the embedding; and
outputting, based on the intent, results of a skill performed by a system in which the fine-tuned model is deployed.
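
For illustration, the claimed data-generation and tuple-formation steps could be sketched as follows. All helper names (back_translate, perturb, make_tuples, TrainingTuple) are hypothetical; the claim does not prescribe any particular implementation of the translation, perturbation, augmentation, or sampling operations.

```python
# Hypothetical sketch of the positive/negative generation and tuple
# formation steps of claim 1. Helper names are illustrative only.
import random
from dataclasses import dataclass


@dataclass
class TrainingTuple:
    anchor: str
    positive: str
    negatives: list[str]  # one or more semantically dissimilar utterances


def back_translate(utterance: str) -> str:
    # Placeholder for a translation operation (claim element (i)):
    # translate to a pivot language and back. Identity stand-in here.
    return utterance


def perturb(utterance: str) -> str:
    # Placeholder perturbation operation (claim element (ii)):
    # drop one random word while keeping the meaning close.
    words = utterance.split()
    if len(words) > 1:
        words.pop(random.randrange(len(words)))
    return " ".join(words)


def make_tuples(anchors: list[str], num_negatives: int = 2) -> list[TrainingTuple]:
    tuples = []
    for anchor in anchors:
        # A group of positives per anchor, each produced differently so the
        # positive in each tuple differs from positives in other tuples.
        for positive in (back_translate(anchor), perturb(anchor)):
            # Negatives via a simple sampling operation over the other
            # anchors, which are assumed to be semantically dissimilar.
            pool = [a for a in anchors if a != anchor]
            negatives = random.sample(pool, min(num_negatives, len(pool)))
            tuples.append(TrainingTuple(anchor, positive, negatives))
    return tuples
```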
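The fine-tuning limitation requires only that "a loss function" applied to the per-tuple embeddings be minimized, without naming one; a triplet margin loss over the (anchor, positive, negative) embeddings is an assumed, common choice in this sketch, and with multiple negatives per tuple the loss could simply be summed over them. ToyEncoder is a toy stand-in for the pre-trained embedding model.

```python
# Minimal fine-tuning sketch using PyTorch's TripletMarginLoss as an
# assumed instance of the claimed loss function.
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    def __init__(self, vocab_size: int = 10000, dim: int = 128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer tensor -> (batch, dim) embedding
        return self.emb(token_ids)


model = ToyEncoder()
loss_fn = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)


def fine_tune_step(anchor_ids, positive_ids, negative_ids) -> float:
    # One gradient step on one tuple: embed all three utterances with the
    # same model, then minimize the loss so the anchor moves toward its
    # positive and away from its negative.
    optimizer.zero_grad()
    loss = loss_fn(model(anchor_ids), model(positive_ids), model(negative_ids))
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage with random token ids standing in for tokenized utterances:
rand_ids = lambda: torch.randint(0, 10000, (1, 6))
fine_tune_step(rand_ids(), rand_ids(), rand_ids())
```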
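Finally, a hedged sketch of the inference-side limitations: the fine-tuned model embeds the input utterance, and the intent is recognized from that embedding. Nearest-centroid cosine similarity against per-intent example embeddings is an assumed classifier, and tokenize is a hypothetical tokenizer; the claim fixes neither.

```python
# Assumed inference sketch for intent recognition with the fine-tuned model.
import torch
import torch.nn.functional as F


def recognize_intent(model, tokenize, utterance, intent_examples):
    # intent_examples: dict mapping intent name -> list of example utterances
    with torch.no_grad():
        query = model(tokenize(utterance))  # (1, dim) embedding of the input
        best_intent, best_score = None, float("-inf")
        for intent, examples in intent_examples.items():
            centroid = torch.stack(
                [model(tokenize(e)).squeeze(0) for e in examples]
            ).mean(dim=0, keepdim=True)
            score = F.cosine_similarity(query, centroid).item()
            if score > best_score:
                best_intent, best_score = intent, score
    # The matched intent determines which skill the deployed system runs.
    return best_intent
```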