US 12,079,352 B2
Enforcing data security constraints in a data pipeline
Anton Apostolatos, Boston, MA (US); Adam Lieskovský, Baar (CH); Florian Diegruber, Munich (DE); Francisco Ferreira, Zurich (CH); Joseph Kane, Munich (DE); Joanna Peller, London (GB); Kelvin Lau, Fairfield (AU); Maciej Laska, Warsaw (PL); Mikael Ibrahim Mofarrej, Munich (DE); Max-Philipp Schrader, Germering (DE); Philipp Hoefer, Munich (DE); Spencer McCollester, London (GB); and Viktor Nordling, Watson (AU)
Assigned to Palantir Technologies Inc., Denver, CO (US)
Filed by Palantir Technologies Inc., Palo Alto, CA (US)
Filed on Apr. 8, 2021, as Appl. No. 17/226,014.
Claims priority of application No. 2020155 (GB), filed on Dec. 18, 2020.
Prior Publication US 2022/0198032 A1, Jun. 23, 2022
Int. Cl. G06F 21/60 (2013.01); G06F 16/25 (2019.01)
CPC G06F 21/604 (2013.01) [G06F 16/258 (2019.01)] 18 Claims
 
1. A computer-implemented method for enforcing data security constraints in a data pipeline, wherein the data pipeline takes one or more source datasets as input and performs one or more data transformations on them, the method comprising:
within a first stage of the data pipeline, generating a first transformed dataset by performing a first data transformation on a first subset of entries of the one or more source datasets, wherein the first subset is defined according to one or more first data security constraints, wherein the one or more first data security constraints are associated with one or more columns or rows, and wherein an entry is accepted into or rejected from the first transformed dataset based on a comparison between the entry and the one or more first data security constraints;
within a second stage of the data pipeline, generating a second transformed dataset by performing a second data transformation on a second subset of entries of the one or more source datasets;
validating the second transformed dataset according to a pattern or constraint specified by the first transformed dataset, wherein the validating comprises comparing entries of the second transformed dataset against the first transformed dataset and filtering out any fields of the second transformed dataset that fail to conform to the pattern or constraint specified by the first transformed dataset, the first transformed dataset specifying a previously unknown or undefined criteria; and
providing an alert if any fields of the second transformed dataset fail to conform to the pattern.
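
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent and is not the patented implementation. The column names (`customer_id`, `region`, `clearance`), the function names, the choice of transformations, and the interpretation of "pattern" as the set of values present in each column of the first transformed dataset are all assumptions made for illustration.

```python
from typing import Callable

Row = dict[str, str]


def first_stage(source: list[Row], is_permitted: Callable[[Row], bool]) -> list[Row]:
    """Accept or reject each entry against the security constraint, then transform."""
    accepted = [row for row in source if is_permitted(row)]
    # Hypothetical first transformation: project onto the identifier column.
    return [{"customer_id": row["customer_id"]} for row in accepted]


def second_stage(source: list[Row]) -> list[Row]:
    """Hypothetical second transformation: keep the identifier, upper-case the region."""
    return [
        {"customer_id": row["customer_id"], "region": row["region"].upper()}
        for row in source
    ]


def validate(second: list[Row], first: list[Row]) -> tuple[list[Row], list[str]]:
    """Filter out fields of `second` that do not conform to the pattern given by `first`.

    Assumption: the pattern is, for each column of the first transformed
    dataset, the set of values that column contains. Columns that do not
    appear in the first transformed dataset are left unchecked.
    """
    allowed: dict[str, set[str]] = {}
    for row in first:
        for column, value in row.items():
            allowed.setdefault(column, set()).add(value)

    validated: list[Row] = []
    alerts: list[str] = []
    for index, row in enumerate(second):
        clean: Row = {}
        for column, value in row.items():
            if column in allowed and value not in allowed[column]:
                # Non-conforming field: drop it and record an alert.
                alerts.append(f"row {index}: field {column!r} fails validation")
            else:
                clean[column] = value
        validated.append(clean)
    return validated, alerts


# Illustrative source dataset (fabricated values, for demonstration only).
source = [
    {"customer_id": "c1", "region": "emea", "clearance": "public"},
    {"customer_id": "c2", "region": "apac", "clearance": "restricted"},
    {"customer_id": "c3", "region": "amer", "clearance": "public"},
]

first = first_stage(source, lambda row: row["clearance"] == "public")
second = second_stage(source)
validated, alerts = validate(second, first)

for message in alerts:
    print("ALERT:", message)
```

In this sketch the allowed identifiers are not known ahead of time. They only become available once the first stage has run, which is one way to read the claim's "previously unknown or undefined criteria". The restricted entry `c2` keeps its `region` field but loses its `customer_id` field, and a single alert is raised.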