US 12,277,133 B2
Automatic data linting rules for ETL pipelines
Ignacio Agustin Manzano, Buenos Aires (AR); Subhash Periasamy, Newark, CA (US); Berkay Polat, Redwood City, CA (US); Anand Nair, Mountain House, CA (US); Udayakumar Dhansingh, Dublin, CA (US); Vijay Gopalakrishnan, Seattle, WA (US); and Saebom Kwon, San Francisco, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on Dec. 13, 2022, as Appl. No. 18/080,268.
Prior Publication US 2024/0193174 A1, Jun. 13, 2024
Int. Cl. G06F 16/20 (2019.01); G06F 16/25 (2019.01)
CPC G06F 16/254 (2019.01)
21 Claims
 
1. A data linting method, comprising:
generating a first ruleset from a schema corresponding to a data format associated with a data type included in a database of a first database language, wherein the first ruleset is a data type ruleset that includes a rule for each data type definition in the database;
generating a second ruleset from a representative data sample which is representative of the schema, wherein the second ruleset is a data shape ruleset that defines rules regarding a general shape of the data included in the database;
creating a combined ruleset from a graph that specifies a relationship between the schema, the first ruleset, and the second ruleset, wherein the combined ruleset is specified in the first database language;
transforming the combined ruleset into a transformed ruleset, that preserves the relationship between the schema, the first ruleset, and the second ruleset, using conventions consistent with a second database language; and
applying the transformed ruleset to the database to create a transformed database.
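
The five steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Rule` record, the rule names (`is_type`, `not_null`, `max_len`), the `TYPE_MAP` between two dialects, the dictionary used as the graph, and the choice to drop rows that fail a rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class Rule:
    column: str
    kind: str        # "type" (first ruleset) or "shape" (second ruleset)
    name: str        # hypothetical rule names: "is_type", "not_null", "max_len"
    arg: Any = None


# Hypothetical type-name conventions of a second database language.
TYPE_MAP = {"INTEGER": "INT64", "TEXT": "STRING"}
PY_TYPES = {"INT64": int, "STRING": str}


def type_ruleset(schema):
    """First ruleset: one rule per data type definition in the schema."""
    return [Rule(col, "type", "is_type", t) for col, t in schema.items()]


def shape_ruleset(sample):
    """Second ruleset: shape rules inferred from a representative sample."""
    rules = []
    for col in sample[0]:
        values = [row[col] for row in sample]
        if all(v is not None for v in values):
            rules.append(Rule(col, "shape", "not_null"))
        strings = [v for v in values if isinstance(v, str)]
        if strings:
            rules.append(Rule(col, "shape", "max_len", max(len(s) for s in strings)))
    return rules


def build_graph(schema, type_rules, shape_rules):
    """Graph relating each schema column to its type and shape rules."""
    graph = {col: [] for col in schema}
    for rule in type_rules + shape_rules:
        graph[rule.column].append(rule)
    return graph


def combine(graph):
    """Combined ruleset, still in the first language's conventions."""
    return [rule for col in graph for rule in graph[col]]


def transform(rules):
    """Rewrite type names for the second language; columns and kinds are unchanged."""
    return [replace(r, arg=TYPE_MAP[r.arg]) if r.name == "is_type" else r for r in rules]


def passes(rule, value):
    if rule.name == "is_type":
        return value is None or isinstance(value, PY_TYPES[rule.arg])
    if rule.name == "not_null":
        return value is not None
    if rule.name == "max_len":
        return not isinstance(value, str) or len(value) <= rule.arg
    raise ValueError(f"unknown rule: {rule.name}")


def apply_rules(rules, rows):
    """Assumed semantics: keep only rows that satisfy every rule."""
    return [row for row in rows if all(passes(r, row[r.column]) for r in rules)]


schema = {"id": "INTEGER", "name": "TEXT"}
sample = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
database = sample + [
    {"id": 3, "name": None},        # violates not_null
    {"id": "4", "name": "Alan"},    # violates the type rule for id
    {"id": 5, "name": "Margaret"},  # longer than any name in the sample
]

combined = combine(build_graph(schema, type_ruleset(schema), shape_ruleset(sample)))
transformed = transform(combined)
clean = apply_rules(transformed, database)
```

In this sketch the relationship between schema, type rules, and shape rules survives the transformation because only the type-name argument of each `is_type` rule is rewritten; every rule keeps its column and kind.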