US 12,461,940 B2
Cross-cloud replication of recurrently executing data pipelines
Istvan Cseri, Seattle, WA (US); Dinesh Chandrakant Kulkarni, Sammamish, WA (US); Mihir Dhananjay Kulkarni, Foster City, CA (US); Lanhao Wu, Seattle, WA (US); and Di Fei Zhang, Redmond, WA (US)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Aug. 31, 2022, as Appl. No. 17/823,752.
Claims priority of provisional application 63/366,084, filed on Jun. 9, 2022.
Prior Publication US 2023/0401232 A1, Dec. 14, 2023
Int. Cl. G06F 16/27 (2019.01); G06F 21/62 (2013.01)
CPC G06F 16/275 (2019.01) [G06F 21/6218 (2013.01)]
18 Claims
1. A computer-implemented method comprising:
detecting, using one or more processors of a server, a resume command entered by a user on a client device to resume execution of a first data pipeline on a first deployment that is hosted on a first cloud service, the first data pipeline comprising a first plurality of computing tasks that are configured for execution on the first cloud service, the resume command indicating an explicit user action that changes to the first data pipeline have been made and are complete and that the first data pipeline is execution-ready;
in response to detecting the resume command for the first data pipeline, detecting a latest committed version of recurrently executed tasks of the first data pipeline and an uncommitted version of the recurrently executed tasks of the first data pipeline that is more recent than the latest committed version;
in response to detecting the resume command for the first data pipeline, accessing metadata of a second data pipeline on a second deployment that is hosted on a second cloud service and providing the metadata to the client device, the first data pipeline and the second data pipeline comprising same computing tasks;
receiving a confirmation from the client device indicating that a version of the second data pipeline is acceptable for resuming execution of the first data pipeline on the second cloud service;
in response to detecting the resume command for the first data pipeline, selecting the latest committed version of recurrently executed tasks of the first data pipeline over the uncommitted version of recurrently executed tasks of the first data pipeline for the resume command;
in response to receiving the confirmation from the client device, replicating the latest committed version of the recurrently executed tasks to the second data pipeline on the second deployment that replicates a database of the first deployment in another geographic location, the second data pipeline comprising a second plurality of computing tasks that are configured for execution on the second cloud service, the first data pipeline corresponding to the first deployment, the second data pipeline corresponding to the second deployment;
detecting that a task from the latest committed version of the recurrently executed tasks cannot be replicated on the second data pipeline based on a first role that owns the task not being replicated on the second data pipeline, the first role providing an operate privilege to a second role on the first data pipeline that allows a second user to resume or suspend the task; and
in response to detecting that the task cannot be replicated on the second data pipeline based on the first role that owns the task not being replicated on the second data pipeline, suspending a root task on the second data pipeline for the first role and the second user.
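
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation or any Snowflake API. Every name in it (`TaskVersion`, `Pipeline`, `latest_committed`, `replicate_on_resume`) is hypothetical, versions are assumed to carry a simple increasing number, and the operate privilege granted to a second role is not modeled. The sketch only shows the control flow: on a confirmed resume, prefer the latest committed version over any newer uncommitted one, replicate its tasks to the second deployment, and suspend the root task there if a task's owning role has not been replicated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TaskVersion:
    """One version of a pipeline's recurrently executed tasks (hypothetical model)."""
    number: int                  # assumed: higher number means more recent
    committed: bool
    task_owners: Dict[str, str]  # task name -> name of the role that owns it


@dataclass
class Pipeline:
    """A pipeline on one deployment (hypothetical model)."""
    root_task: str
    versions: List[TaskVersion] = field(default_factory=list)
    replicated_roles: Set[str] = field(default_factory=set)
    tasks: Dict[str, str] = field(default_factory=dict)
    suspended: Set[str] = field(default_factory=set)


def latest_committed(versions: List[TaskVersion]) -> Optional[TaskVersion]:
    """Return the most recent committed version, ignoring uncommitted ones."""
    committed = [v for v in versions if v.committed]
    return max(committed, key=lambda v: v.number) if committed else None


def replicate_on_resume(primary: Pipeline, secondary: Pipeline,
                        confirmed: bool) -> Optional[TaskVersion]:
    """Handle a resume command once the client has (or has not) confirmed."""
    if not confirmed:
        return None  # nothing is replicated without the client's confirmation

    version = latest_committed(primary.versions)
    if version is None:
        return None

    owner_missing = False
    for task, owner in version.task_owners.items():
        if owner in secondary.replicated_roles:
            secondary.tasks[task] = owner
        else:
            # The owning role does not exist on the second deployment,
            # so this task cannot be replicated there.
            owner_missing = True

    if owner_missing:
        # Suspend the root task on the second pipeline.
        secondary.suspended.add(secondary.root_task)
    return version
```

In this sketch, a primary pipeline holding committed versions 1 and 2 plus an uncommitted version 3 would replicate version 2, and a task owned by a role absent from `replicated_roles` would leave the secondary pipeline's root task suspended.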