US 12,468,983 B2
Machine Learning (ML) model pipeline with obfuscation to protect sensitive data therein
Feng Xu, Sunnyvale, CA (US); Haochong Shen, San Jose, CA (US); Yen-Fen Hsu, Sunnyvale, CA (US); and Sudhir Kumar, Pune (IN)
Assigned to THALES DIS CPL USA, INC., Austin, TX (US)
Filed by THALES DIS CPL USA, INC., Austin, TX (US)
Filed on Oct. 6, 2022, as Appl. No. 17/960,891.
Prior Publication US 2024/0119170 A1, Apr. 11, 2024
Int. Cl. G06N 20/00 (2019.01); G06F 21/62 (2013.01); G06F 40/295 (2020.01); G06N 3/08 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 21/6245 (2013.01); G06F 40/295 (2020.01); G06N 3/08 (2013.01)]
15 Claims
 
1. A system for Machine Learning (ML) based Data Discovery and Classification (DDC), the system comprising components of:
a user console (100), running on an endpoint machine at a branch location managing and accessing data in a security zone under a security policy, said data in part is private or sensitive, for
processing user requests for data discovery and classification (DDC) on the endpoint machine;
ingesting user requests into a ML pipeline for embedding, training and deploying ML models on said data produced via DDC;
displaying classified data category and identified sensitive entities of said data on the endpoint machine by way of the ML pipeline;
a ML agent (200), communicatively coupled to the user console, also running on the endpoint machine and residing at the branch location, for
polling said user requests by way of the ML pipeline;
scanning the endpoint machine for said data responsive to user requests;
embedding said data to produce an embedding vector that is ingested into the ML pipeline instead of clear data;
applying ML models to the data scanned;
a ML data engine (300), communicatively coupled to the user console and the ML agent, and not residing at the branch location with said security zone, for
handling user requests from user console and ML agent on the ML pipeline;
receiving said embedding vector on the ML pipeline;
labeling said embedding vector with labels responsive to user annotations;
persisting, training, updating, and applying ML models for use by the ML agent,
wherein said components execute on a computational device comprising one or more processors and memory coupled to the one or more processors, wherein the memory includes computer instructions which when executed by the one or more processors causes the one or more processors to perform said operations.
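
The division of labor recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy feature-hashing embedding and a nearest-centroid classifier, neither of which is named in the claim, and every identifier in it (`embed`, `Agent`, `DataEngine`, `DIM`) is hypothetical. The only point it shows is that the agent inside the security zone sends embedding vectors, not clear data, to the data engine outside it, and then applies the returned model locally. A hashed vector of this kind is used purely for illustration and is not claimed here to be a sufficient obfuscation.

```python
import hashlib
import math
from collections import defaultdict

DIM = 32  # hypothetical embedding width, chosen only for this sketch


def embed(text, dim=DIM):
    """Feature-hash whitespace tokens into a fixed-size, L2-normalised vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class DataEngine:
    """Stand-in for the ML data engine (300), outside the security zone."""

    def __init__(self):
        self.samples = []  # (vector, label) pairs only; clear text never arrives

    def receive(self, vector, label):
        # The label comes from a user annotation made at the console.
        self.samples.append((vector, label))

    def train(self):
        """Build a nearest-centroid model: one mean vector per label."""
        sums = {}
        counts = defaultdict(int)
        for vec, label in self.samples:
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[label] += 1
        return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


class Agent:
    """Stand-in for the ML agent (200), on the endpoint inside the security zone."""

    def __init__(self, engine):
        self.engine = engine
        self.model = {}

    def submit(self, text, label):
        # Embedding happens locally; only the vector leaves the endpoint.
        self.engine.receive(embed(text), label)

    def classify(self, text):
        # Apply the engine-trained model to locally scanned data.
        vec = embed(text)
        return max(
            self.model,
            key=lambda lab: sum(a * b for a, b in zip(vec, self.model[lab])),
        )


engine = DataEngine()
agent = Agent(engine)
agent.submit("patient diagnosis record", "medical")
agent.submit("invoice payment amount", "financial")
agent.model = engine.train()
print(agent.classify("patient diagnosis record"))
```

In this sketch the engine stores only lists of floats paired with labels, so training and model updates happen without the engine ever holding the scanned text.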