US 12,437,113 B1
Data processing orchestrator utilizing semantic type inference and privacy preservation
Harrison Dahme, Stateline, NV (US); Chris Martin, Seattle, WA (US); Nicholas Roberts-Huntley, New York, NY (US); and Jason Johnson, Ashland, OR (US)
Assigned to K2 NETWORK LABS, INC., Dover, DE (US)
Filed by K2 NETWORK LABS, INC., Dover, DE (US)
Filed on May 10, 2025, as Appl. No. 19/204,523.
Int. Cl. G06F 21/62 (2013.01); G06F 21/60 (2013.01); G06N 7/01 (2023.01)
CPC G06F 21/6254 (2013.01) [G06F 21/602 (2013.01); G06N 7/01 (2023.01)]
20 Claims
 
1. A method for orchestrating automated data processing and transformation, comprising:
receiving, by a centralized orchestrator, a request to process a dataset associated with a client;
initiating, by the orchestrator, a data ingestion process to obtain sample data from the dataset, wherein the data ingestion process is executed on client premises or in a cloud environment;
analyzing, by a semantic analysis module controlled by the orchestrator, the sample data to determine semantic types of data fields within the dataset;
generating, by a transformation module controlled by the orchestrator, data transformation instructions based on the determined semantic types;
deploying, by the orchestrator, a data processing pipeline to a client-controlled environment, wherein the pipeline is executed within the client premises;
configuring, by the orchestrator, privacy preservation parameters for the data processing pipeline to identify and obfuscate potential personally identifiable information (PII) in the dataset;
instructing the data processing pipeline to apply the data transformation instructions and privacy preservation parameters to the dataset;
determining, by a configuration module controlled by the orchestrator, data storage configurations for the transformed dataset;
directing the storage of the transformed dataset according to the determined data storage configurations, wherein the storage occurs in a client-controlled environment or a cloud environment;
generating, by a machine learning module that is executed in the cloud or on client premises, a machine learning model based on the transformed dataset; and
storing the machine learning model in a model repository accessible to the client.
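
The analysis, transformation, and privacy steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The semantic type labels, the regular expressions, the salted-hash obfuscation, and the function names (infer_semantic_type, plan_transformations, run_pipeline) are all assumptions made for illustration; the claim does not specify any of them. Data ingestion, deployment to the client environment, storage configuration, and model training are left out.

```python
import hashlib
import re

# Hypothetical patterns and labels; the claim does not define how semantic
# types are detected or which types count as PII.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")
PHONE_RE = re.compile(r"^\+?[\d\-\s()]{7,}$")
PII_TYPES = {"email", "phone"}


def infer_semantic_type(values):
    """Label a field by the first pattern that every non-empty sample matches."""
    values = [v for v in values if v]
    if not values:
        return "text"
    for label, pattern in (("email", EMAIL_RE), ("number", NUMBER_RE), ("phone", PHONE_RE)):
        if all(pattern.match(v) for v in values):
            return label
    return "text"


def plan_transformations(sample_rows):
    """Map each field to a semantic type and a transformation instruction."""
    types = {
        name: infer_semantic_type([row[name] for row in sample_rows])
        for name in sample_rows[0]
    }
    instructions = {}
    for name, semantic_type in types.items():
        if semantic_type in PII_TYPES:
            instructions[name] = "hash"      # privacy preservation step
        elif semantic_type == "number":
            instructions[name] = "to_float"  # ordinary type conversion
        else:
            instructions[name] = "keep"
    return types, instructions


def obfuscate(value, salt):
    """Replace a value with a truncated salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def run_pipeline(rows, instructions, salt):
    """Apply the instructions to every row; stands in for the deployed pipeline."""
    transformed = []
    for row in rows:
        out = {}
        for name, value in row.items():
            action = instructions[name]
            if action == "hash":
                out[name] = obfuscate(value, salt)
            elif action == "to_float":
                out[name] = float(value)
            else:
                out[name] = value
        transformed.append(out)
    return transformed


# Made-up sample data for the sketch.
sample = [
    {"email": "a@example.com", "phone": "+1 206-555-0100", "amount": "19.99", "note": "hello"},
    {"email": "b@example.org", "phone": "(206) 555-0199", "amount": "5", "note": "second order"},
]
types, instructions = plan_transformations(sample)
result = run_pipeline(sample, instructions, salt="demo-salt")
```

In this sketch the planning step sees only the sample rows, while run_pipeline is the part that would execute inside the client-controlled environment. That mirrors the split in the claim between the orchestrator and the deployed pipeline.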