US 12,260,003 B1
Clean room generation for data collaboration and executing clean room task in data processing pipeline
William Chau, Redwood City, CA (US); Abhijit Chakankar, San Jose, CA (US); Stephen Michael Mahoney, Portland, OR (US); Daniel Seth Morris, Cranford, NJ (US); and Itai Shlomo Weiss, River Vale, NJ (US)
Assigned to Databricks, Inc., San Francisco, CA (US)
Filed by Databricks, Inc., San Francisco, CA (US)
Filed on Sep. 26, 2023, as Appl. No. 18/474,708.
Application 18/474,708 is a continuation of application No. 18/473,992, filed on Sep. 25, 2023.
Int. Cl. G06F 21/00 (2013.01); G06F 21/62 (2013.01)
CPC G06F 21/6281 (2013.01)
20 Claims
 
1. A method comprising, at a computer system comprising a processor and a computer-readable medium:
receiving, by a data processing service, an indication to generate a data processing job from a client device of a first user, the data processing job defined with respect to a set of tasks that includes at least one cleanroom task executed in a cleanroom station and at least one non-cleanroom task executed in an execution environment of the first user, each task configured to read one or more input datasets and transform the one or more input datasets into one or more output datasets;
processing, by the data processing service, a first non-cleanroom task in a first execution environment of the first user within a first virtual private cloud (VPC);
obtaining, by the data processing service, a first output from the first non-cleanroom task in the first execution environment of the first user;
providing, by the data processing service, the first output of the first non-cleanroom task into the cleanroom station;
processing, by the data processing service, a cleanroom task using the first output and at least one of a notebook or data table shared into the cleanroom station by at least a second user to generate a second output of the data processing job, wherein the cleanroom task is processed in the cleanroom station that is managed by the data processing service and is separate and isolated from the first execution environment of the first user and a second execution environment of the second user, wherein the cleanroom station is within a second VPC different from the first VPC;
obtaining, by the data processing service, the second output from the cleanroom task executed in the cleanroom station;
providing, by the data processing service, the second output into the first execution environment of the first user; and
processing, by the data processing service, a second non-cleanroom task in the first execution environment of the first user using the second output to generate a third output of the data processing job.
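
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not the patented implementation: the names `Environment`, `DataProcessingService`, `run_job`, and the three task functions, together with the sample rows, are hypothetical, and each VPC is modeled as a plain string label rather than real network isolation. The sketch shows only the ordering of the steps: a non-cleanroom task runs in the first user's environment, its output is provided into a separately managed cleanroom station where it is combined with a table shared by a second user, and the cleanroom output is returned to the first user's environment for a second non-cleanroom task.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Dataset = List[dict]


@dataclass
class Environment:
    """Hypothetical execution environment; `vpc` is only a label in this sketch."""
    name: str
    vpc: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def run(self, task: Callable[..., Dataset], *input_names: str) -> Dataset:
        # A task reads one or more input datasets and returns an output dataset.
        return task(*(self.datasets[n] for n in input_names))


class DataProcessingService:
    """Hypothetical orchestrator that moves outputs between environments."""

    def __init__(self, user_env: Environment, cleanroom: Environment):
        # Assumption: the cleanroom station must sit in a different VPC.
        if user_env.vpc == cleanroom.vpc:
            raise ValueError("cleanroom station must be in a different VPC")
        self.user_env = user_env
        self.cleanroom = cleanroom

    def run_job(self, pre_task, cleanroom_task, post_task,
                source_name: str, shared_name: str) -> Dataset:
        # First non-cleanroom task in the first user's environment.
        first_output = self.user_env.run(pre_task, source_name)
        # Provide the first output into the cleanroom station.
        self.cleanroom.datasets["first_output"] = first_output
        # Cleanroom task uses the first output and the second user's shared table.
        second_output = self.cleanroom.run(
            cleanroom_task, "first_output", shared_name)
        # Provide only the cleanroom output back to the first user.
        self.user_env.datasets["second_output"] = second_output
        # Second non-cleanroom task produces the third output.
        return self.user_env.run(post_task, "second_output")


def keep_active(rows: Dataset) -> Dataset:
    return [r for r in rows if r["active"]]


def overlap_by_segment(first: Dataset, shared: Dataset) -> Dataset:
    ids = {r["id"] for r in first}
    counts: Dict[str, int] = {}
    for r in shared:
        if r["id"] in ids:
            counts[r["segment"]] = counts.get(r["segment"], 0) + 1
    return [{"segment": s, "count": c} for s, c in sorted(counts.items())]


def total_matches(rows: Dataset) -> Dataset:
    return [{"total_matches": sum(r["count"] for r in rows)}]


user_env = Environment("first-user", "vpc-1", {"customers": [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
    {"id": 3, "active": True},
]})
cleanroom = Environment("cleanroom-station", "vpc-2", {"partner_table": [
    {"id": 1, "segment": "a"},
    {"id": 3, "segment": "b"},
    {"id": 4, "segment": "a"},
]})

service = DataProcessingService(user_env, cleanroom)
third_output = service.run_job(keep_active, overlap_by_segment, total_matches,
                               "customers", "partner_table")
print(third_output)
```

In this sketch the second user's `partner_table` exists only inside the cleanroom environment, and the first user's environment receives just the aggregated cleanroom output. That mirrors the separation the claim recites, although a real deployment would enforce it through infrastructure rather than Python object boundaries.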