US 12,265,899 B2
Method for serving parameter-efficient NLP models through adaptive architectures
Terrence J. Torres, Mountain View, CA (US); Tharathorn Rimchala, San Francisco, CA (US); and Andrew Mattarella-Micke, Mountain View, CA (US)
Assigned to INTUIT INC., Mountain View, CA (US)
Filed by INTUIT INC., Mountain View, CA (US)
Filed on Jun. 2, 2023, as Appl. No. 18/328,041.
Application 18/328,041 is a continuation of application No. 16/732,869, filed on Jan. 2, 2020, granted, now Pat. No. 11,704,602.
Prior Publication US 2023/0316157 A1, Oct. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 20/20 (2019.01); G06F 40/126 (2020.01); G06F 40/284 (2020.01)
CPC G06N 20/20 (2019.01) [G06F 40/126 (2020.01); G06F 40/284 (2020.01)] 20 Claims
 
1. A computer-implemented method comprising:
receiving first input data of a first task type during a runtime, the first task type indicating first processing to be performed by one or more application instances during the same runtime;
dynamically generating a first model tuned to generate predictions for the first task type, the generating comprising integrating, into a previously trained base model during the same runtime, a first model artifact comprising one or more adapter layers specific to the first task type;
generating, during the same runtime and based on processing the first input data with the first model generated during the same runtime, a first prediction for the first input data;
distributing the first prediction to the one or more application instances during the same runtime, thereby enabling the first processing;
receiving second input data of a second task type during the same runtime, the second task type being different from the first task type and indicating second processing to be performed by the one or more application instances during the same runtime;
generating, during the same runtime, a second model tuned to generate predictions for the second task type, the generating comprising dynamically exchanging the first model artifact with a second model artifact comprising one or more adapter layers specific to the second task type;
generating, during the same runtime, a second prediction by processing the second input data using the second model; and
distributing the second prediction to the one or more application instances during the same runtime, thereby enabling the second processing.
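For illustration only, and not part of the claim: the claimed steps correspond to what is commonly called adapter-based parameter-efficient serving with runtime adapter exchange. Below is a minimal PyTorch sketch of one plausible realization, in which a frozen, previously trained base model stays resident while small task-specific adapter layers (the "model artifacts") are swapped per request. The class names Adapter and AdaptiveModel, the method load_task_artifact, and the stand-in base encoder are hypothetical, introduced only to make the claimed steps concrete; they are not drawn from the patent.

```python
# Illustrative sketch only -- not the patented implementation.
# A frozen base model stays loaded; small task-specific adapter
# layers are dynamically exchanged at runtime.
import torch
import torch.nn as nn

HIDDEN = 32  # hidden width of the shared base encoder (arbitrary here)


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    with a residual connection around the block."""

    def __init__(self, hidden: int, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))


class AdaptiveModel(nn.Module):
    """Previously trained base encoder whose adapter layers can be
    exchanged at runtime without reloading the base weights."""

    def __init__(self, base: nn.Module, hidden: int):
        super().__init__()
        self.base, self.hidden = base, hidden
        for p in self.base.parameters():
            p.requires_grad = False  # base stays fixed across task types
        self.adapter: nn.Module = nn.Identity()  # no task loaded yet

    def load_task_artifact(self, state_dict: dict) -> None:
        """Integrate a task-specific adapter from a saved state dict."""
        adapter = Adapter(self.hidden)
        adapter.load_state_dict(state_dict)
        self.adapter = adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.base(x))


# Runtime flow mirroring the claim: serve a first task type, then
# exchange the adapter to serve a second task type in the same process.
base = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU())  # stand-in base
model = AdaptiveModel(base, HIDDEN)

first_artifact = Adapter(HIDDEN).state_dict()   # e.g. loaded via torch.load
second_artifact = Adapter(HIDDEN).state_dict()

model.load_task_artifact(first_artifact)
first_prediction = model(torch.randn(1, HIDDEN))    # first task type

model.load_task_artifact(second_artifact)           # dynamic exchange
second_prediction = model(torch.randn(1, HIDDEN))   # second task type
```

In this sketch, only the small adapter state dict moves on a task switch; the large base weights remain resident in memory throughout the runtime, which is what makes serving multiple task types from one process parameter efficient.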