| CPC G06F 16/2219 (2019.01) [G06F 16/27 (2019.01); G06F 16/909 (2019.01)] | 20 Claims |

|
1. A system, comprising:
one or more computing devices;
wherein the one or more computing devices include instructions that upon execution on or across the one or more computing devices:
determine, based at least in part on input received via one or more programmatic interfaces of a cloud computing environment whose resources are distributed among a plurality of data centers in respective geographical regions, a first constraint on a location at which a first portion of a data set can be stored, wherein the first constraint is compliant with a first legal requirement applicable to the first portion of the data set, and wherein the data set is to be used as input for one or more analysis operations performed using computing resources of the cloud computing environment;
store the first portion of the data set at a first set of persistent storage resources selected in accordance with the first constraint, wherein the first set of persistent storage resources is located at a first data center of the plurality of data centers;
store location metadata pertaining to a plurality of portions of the data set at the cloud computing environment, wherein the location metadata indicates that a second portion of the data set is stored at a second set of persistent storage devices at a second data center of the plurality of data centers, wherein the first legal requirement is not applicable to the second portion of the data set;
determine a second constraint on a location at which computations of a particular analysis operation of the one or more analysis operations can be performed;
perform the computations of the particular analysis operation at the first data center in accordance with the second constraint, wherein to perform the computations of the particular analysis operation comprises to:
obtain a replica of the second portion of the data set from the second data center to the first data center; and
use at least the first portion and the replica of the second portion of the data set as input, at a set of computing resources of the cloud computing environment at the first data center, wherein the set of computing resources is selected in accordance with the second constraint; and
provide, via the one or more programmatic interfaces, one or more audit records indicating (a) a location at which the first portion of the data set was stored and (b) a location at which the computations of the particular analysis operation were performed.
|
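
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (`Portion`, `Catalog`, `allowed_sites`, `run_analysis`, and the data center identifiers used with it) is hypothetical, and a legal requirement is reduced here to a plain set of permitted data centers. The sketch shows a storage constraint being enforced, location metadata being kept per portion, a computation site being chosen under a second constraint, an unconstrained portion being replicated to that site, and audit records being produced for both storage and computation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Portion:
    """One portion of the data set (hypothetical type, for illustration only)."""
    name: str
    # Data centers where this portion may reside.
    # None means no legal requirement applies to this portion.
    allowed_sites: Optional[frozenset] = None

    def may_reside_at(self, site: str) -> bool:
        return self.allowed_sites is None or site in self.allowed_sites


@dataclass
class Catalog:
    """Location metadata plus audit records (hypothetical type)."""
    locations: dict = field(default_factory=dict)      # portion name -> primary data center
    replicas: dict = field(default_factory=dict)       # portion name -> set of replica sites
    audit_records: list = field(default_factory=list)  # one dict per storage or compute event

    def store(self, portion: Portion, site: str) -> None:
        # First constraint: refuse any storage location the portion does not permit.
        if not portion.may_reside_at(site):
            raise ValueError(f"{portion.name} may not be stored at {site}")
        self.locations[portion.name] = site
        self.audit_records.append(
            {"event": "store", "subject": portion.name, "location": site})

    def run_analysis(self, operation: str, portions: list, compute_sites: set) -> str:
        # Second constraint: the computation may only run at one of compute_sites.
        for site in sorted(compute_sites):
            # Every input must be allowed to be present at the chosen site.
            if all(p.may_reside_at(site) for p in portions):
                for p in portions:
                    if self.locations[p.name] != site:
                        # Bring a replica to the compute site; the primary copy stays put.
                        self.replicas.setdefault(p.name, set()).add(site)
                self.audit_records.append(
                    {"event": "compute", "subject": operation, "location": site})
                return site
        raise ValueError(f"no permitted site can host all inputs of {operation}")
```

With a constrained first portion stored at a hypothetical `dc-eu-1` and an unconstrained second portion stored at `dc-us-1`, calling `run_analysis` with `compute_sites={"dc-eu-1"}` selects `dc-eu-1`, records a replica of the second portion there, and appends a compute audit record. The constrained portion is never copied out of its permitted location.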