US 12,282,479 B2
Intelligent parity service with database query optimization
Sandeep Khurana, Bangalore (IN); and Ketan Gunvantrai Popat, Bangalore (IN)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Jan. 31, 2022, as Appl. No. 17/589,869.
Prior Publication US 2023/0244661 A1, Aug. 3, 2023
Int. Cl. G06F 16/2453 (2019.01); G06F 16/14 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/24537 (2019.01) [G06F 16/152 (2019.01); G06F 16/24554 (2019.01)]
15 Claims
 
1. A method for performing a parity check of a table, comprising:
ingesting the table from at least one partition of a database, wherein records of the table are stored in a plurality of partitions of the database, to a data lake;
obtaining, from the data lake, initial data lake records stored in the table during an initial time interval;
obtaining an initial partitioning information including at least one partitioning field, the at least one partitioning field having at least one partition identifier,
wherein the at least one partition identifier is a distinct partition value of the at least one partitioning field,
for partitioning the table in the database during the initial time interval;
extracting, from the initial data lake records of the table of the data lake, an initial plurality of partition identifiers stored in the table during the initial time interval, wherein each partition identifier of the initial plurality of partition identifiers identifies a partition in the database storing the records of the table;
generating a first partition-specific database query comprising a first partition identifier of the initial plurality of partition identifiers;
executing the first partition-specific database query to obtain first database records stored in the table in a first partition of the database during the initial time interval;
extracting a first partition-specific subset of the initial data lake records that include the first partition identifier;
performing a first parity comparison on corresponding checksums of records of the first partition-specific subset of the initial data lake records and records of the first database records to generate a first parity result identifying the records of the first database records having a checksum mismatch with the records of the first partition-specific subset of the initial data lake records;
generating a second partition-specific database query comprising a second partition identifier of the initial plurality of partition identifiers;
executing the second partition-specific database query to obtain second database records stored in the table in a second partition of the database during the initial time interval;
extracting a second partition-specific subset of the initial data lake records that include the second partition identifier;
performing a second parity comparison on corresponding checksums of records of the second partition-specific subset of the initial data lake records and records of the second database records to generate a second parity result identifying the records of the second database records having a checksum mismatch with the records of the second partition-specific subset of the initial data lake records; and
combining the first parity result and the second parity result to generate a combined parity result,
wherein the combined parity result identifies (i) a subset of the first database records stored in the first partition that fail to match a parity of the first partition-specific subset of the initial data lake records, and (ii) a subset of the second database records stored in the second partition that fail to match a parity of the second partition-specific subset of the initial data lake records.
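
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it is only one possible reading of the claim under stated assumptions. The database and the data lake are modeled as in-memory lists of dictionaries that have already been restricted to the initial time interval, the partitioning field is a hypothetical `region` column, the record key is a hypothetical `id` column, and SHA-256 over the sorted fields stands in for whatever checksum the actual system uses. All function names are illustrative.

```python
import hashlib


def checksum(record):
    """Return a SHA-256 digest over the record's fields in sorted-key order."""
    payload = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def extract_partition_ids(lake_records, field):
    """Collect the distinct partition values present in the data lake records."""
    return sorted({record[field] for record in lake_records})


def query_partition(database, field, partition_id):
    """Stand-in for a partition-specific database query.

    A real system would issue something like
    SELECT ... WHERE <field> = <partition_id> so that only one
    partition is scanned; here a list filter plays that role.
    """
    return [record for record in database if record[field] == partition_id]


def parity_for_partition(db_records, lake_subset, key):
    """Return keys of database records whose checksum differs from the lake copy.

    Assumption: a database record with no lake counterpart is also
    reported as a mismatch.
    """
    lake_checksums = {record[key]: checksum(record) for record in lake_subset}
    return [
        record[key]
        for record in db_records
        if lake_checksums.get(record[key]) != checksum(record)
    ]


def parity_check(database, lake_records, field, key="id"):
    """Run one parity comparison per partition and combine the results."""
    combined = {}
    for partition_id in extract_partition_ids(lake_records, field):
        db_records = query_partition(database, field, partition_id)
        lake_subset = [r for r in lake_records if r[field] == partition_id]
        combined[partition_id] = parity_for_partition(db_records, lake_subset, key)
    return combined


# Hypothetical sample data: record 2 differs between database and data lake.
database = [
    {"id": 1, "region": "us", "amount": 10},
    {"id": 2, "region": "us", "amount": 20},
    {"id": 3, "region": "eu", "amount": 30},
]
data_lake = [
    {"id": 1, "region": "us", "amount": 10},
    {"id": 2, "region": "us", "amount": 25},
    {"id": 3, "region": "eu", "amount": 30},
]

combined_result = parity_check(database, data_lake, field="region")
print(combined_result)  # {'eu': [], 'us': [2]}
```

In this sketch the partition identifiers are taken from the data lake records rather than from the database, so each database query can name a single partition. That appears to be the source of the query optimization named in the title.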