US 12,487,986 B2
Generating information integrity instructions using a generative model
Anant Agarwal, Sunnyvale, CA (US); and Sebastián Soto, Emeryville, CA (US)
Assigned to Maplebear Inc., San Francisco, CA (US)
Filed by Maplebear Inc., San Francisco, CA (US)
Filed on Oct. 19, 2023, as Appl. No. 18/490,216.
Prior Publication US 2025/0130986 A1, Apr. 24, 2025
Int. Cl. G06F 16/23 (2019.01)
CPC G06F 16/2365 (2019.01)
17 Claims
 
1. A method, performed at a computer system comprising a processor and a non-transitory computer readable medium, comprising:
accessing a data repository that stores:
historical data integrity instructions previously used by a data management system, wherein the historical data integrity instructions comprise code that defines historical data integrity checks,
metadata about the historical data integrity instructions, wherein the metadata includes one or more parameters specified by a user for generation of the historical data integrity instructions, and
a log file including a plurality of pull requests, wherein each pull request identifies modifications between a version of the historical data integrity instructions and a prior version of the historical data integrity instructions;
receiving, via a user interface, a request to generate:
new data integrity instructions for a set of data maintained by the data management system, wherein the request includes parameters for the new data integrity instructions;
comparing a supplemental example to the parameters for the new data integrity instructions, wherein the supplemental example identifies historical data integrity instructions;
selecting the supplemental example based on the comparing of the supplemental example to the parameters for the new data integrity instructions;
accessing pull requests and corresponding metadata of the identified historical data integrity instructions;
tuning a large language model on the identified historical data integrity instructions, accessed pull requests, and accessed corresponding metadata, wherein the tuning configures the large language model to output data integrity instructions tailored for data identified in an input to the large language model based on the identified historical data integrity instructions, accessed pull requests, and accessed corresponding metadata, wherein the outputted data integrity instructions, when executed by a processor, cause the processor to perform a data integrity check of the data identified in the input;
generating a prompt to generate the new data integrity instructions for the set of data;
applying the large language model to the generated prompt;
receiving, from the large language model in response to providing the prompt thereto, the requested new data integrity instructions;
performing a data integrity check by executing the requested new data integrity instructions; and
in response to performing the data integrity check, presenting, at the user interface, an indication of data validity or invalidity.
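
The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (HistoricalExample, similarity, select_example, build_prompt, stub_model, run_check, generate_and_check) is hypothetical, the Jaccard overlap is an assumed stand-in for the claimed "comparing" step, and the tuning step is omitted: a fixed stub function takes the place of the tuned large language model.

```python
from dataclasses import dataclass, field


@dataclass
class HistoricalExample:
    """Hypothetical record for one set of historical data integrity instructions."""
    name: str
    code: str                # the historical instructions themselves
    parameters: dict         # user-specified parameters stored as metadata
    pull_requests: list = field(default_factory=list)


def similarity(example_params: dict, request_params: dict) -> float:
    """Jaccard overlap of (key, value) pairs; an assumed comparison metric."""
    a, b = set(example_params.items()), set(request_params.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_example(examples, request_params):
    """Pick the supplemental example whose parameters best match the request."""
    return max(examples, key=lambda ex: similarity(ex.parameters, request_params))


def build_prompt(example, request_params, dataset):
    """Assemble a prompt from the request, the example, and its pull requests."""
    lines = [
        f"Write a data integrity check for dataset '{dataset}'.",
        f"Requested parameters: {sorted(request_params.items())}",
        f"Reference instructions ({example.name}):",
        example.code,
    ]
    lines += [f"Related pull request: {pr}" for pr in example.pull_requests]
    return "\n".join(lines)


def stub_model(prompt: str) -> str:
    """Stand-in for the tuned large language model; returns fixed source code."""
    return "def check(rows):\n    return all(r.get('price') is not None for r in rows)\n"


def run_check(source: str, rows) -> str:
    """Execute the returned instructions and report validity."""
    namespace = {}
    exec(source, namespace)  # illustration only; generated code would need sandboxing
    return "valid" if namespace["check"](rows) else "invalid"


def generate_and_check(examples, request_params, dataset, rows, model=stub_model):
    """Select an example, prompt the model, run the check, return the indication."""
    example = select_example(examples, request_params)
    prompt = build_prompt(example, request_params, dataset)
    return run_check(model(prompt), rows)
```

In this sketch the string returned by generate_and_check plays the role of the indication of data validity or invalidity presented at the user interface.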