US 12,340,178 B2
Generating synthetic user data for training and evaluation of machine learning-based language models
Yashraj Panwar, Mountain View, CA (US)
Filed by Yashraj Panwar, Mountain View, CA (US)
Filed on Nov. 19, 2024, as Appl. No. 18/951,670.
Claims priority of provisional application 63/720,759, filed on Nov. 15, 2024.
Claims priority of provisional application 63/712,512, filed on Oct. 27, 2024.
Prior Publication US 2025/0077782 A1, Mar. 6, 2025
Int. Cl. G06F 17/00 (2019.01); G06F 16/31 (2019.01); G06F 16/332 (2019.01); G06F 16/334 (2025.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 16/322 (2019.01); G06F 16/3326 (2019.01); G06F 16/3344 (2019.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving a request for generating training dataset for a machine learning-based model, the training dataset comprising synthetic user profiles, each synthetic user profile for a hypothetical user, the synthetic user profile comprising a sequence of epochs, each epoch representing a set of events associated with the hypothetical user that occurred during a time period of the epoch, each epoch associated with a relevance score determined based on the set of events;
determining sets of values of user profile parameters, each set of values comprising parameters describing a sequence of epochs including a trajectory representing variation of relevance scores of epochs of the sequence of epochs over time;
generating a training dataset comprising, for each set of values of user profile parameters:
sending a prompt to a machine learning-based language model, the prompt requesting the machine learning-based language model to generate a synthetic user profile based on the set of values of user profile parameters, and
receiving from the machine learning-based language model, a representation of a synthetic user profile comprising a sequence of epochs having relevance scores varying over time according to the trajectory specified in the set of values; and
training the machine learning-based model using the training dataset.