CPC G06F 9/5055 (2013.01) [G06F 16/252 (2019.01)] | 20 Claims |
1. A computing device comprising:
one or more processors; and
memory storing instructions that, when executed by the one or more processors, cause the computing device to:
generate a testing database that corresponds to a state of one or more databases managed by a data sharing platform at a point in time, wherein the data sharing platform enables users to access the one or more databases managed by the data sharing platform, wherein the data sharing platform is configured to provide access to the data stored by the data sharing platform via one or more of a plurality of virtual warehouses, wherein each of the plurality of virtual warehouses comprises a respective set of computing resources configured to:
execute one or more queries with respect to at least a portion of a plurality of data warehouses;
collect results from the one or more queries; and
provide access to the collected results;
determine a log of a plurality of different events executed, via one or more of the plurality of virtual warehouses and after the point in time, with respect to the one or more databases;
predict an optimized virtual warehouse configuration for a first virtual warehouse by:
identifying a first subset of the plurality of different events based on determining that the first subset of the plurality of different events were conducted during a first time period;
identifying a second subset of the plurality of different events based on determining that the second subset of the plurality of different events were conducted during a second time period;
measuring first performance parameters of a first warehouse configuration by replaying, via the first virtual warehouse, the first subset of the plurality of different events at the testing database while the first virtual warehouse is configured in accordance with the first warehouse configuration, wherein the plurality of different warehouse configurations each correspond to a different set of computing resources available to the first virtual warehouse, and wherein the replaying the first subset of the plurality of different events comprises replaying a same event of the first subset of the plurality of different events at least twice;
modifying the testing database by rolling the testing database back to the state of the one or more databases managed by the data sharing platform at the point in time;
measuring second performance parameters of a second warehouse configuration by replaying, via the first virtual warehouse, the second subset of the plurality of different events at the testing database while the first virtual warehouse is configured in accordance with the second warehouse configuration, wherein the replaying the second subset of the plurality of different events comprises inserting one or more random delays between the initiation of at least two of the second subset of the plurality of different events; and
selecting the optimized virtual warehouse configuration based on the first performance parameters and the second performance parameters; and
output the optimized virtual warehouse configuration.
|
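The selection procedure recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (`Event`, `SnapshotDatabase`, `replay`, `predict_configuration`) is hypothetical, a warehouse configuration is reduced to a node count, and the performance parameter is a toy cost model (event cost divided by node count) rather than a measured latency. The sketch follows the recited order: split the log by time period, replay the first subset with one event repeated, roll the testing database back to the snapshot, replay the second subset with random delays between event initiations, then select a configuration from the two measurements.

```python
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # when the logged event originally ran
    cost: float       # hypothetical amount of work the event requires


class SnapshotDatabase:
    """Toy stand-in for the testing database: a dict plus its point-in-time snapshot."""

    def __init__(self, snapshot):
        self._snapshot = dict(snapshot)
        self.state = dict(snapshot)

    def apply(self, event):
        # Assumption: every replayed event mutates the state in some way.
        self.state["rows"] = self.state.get("rows", 0) + 1

    def roll_back(self):
        # Restore the state captured at the point in time.
        self.state = dict(self._snapshot)


def events_in_period(log, start, end):
    """Return the logged events whose timestamps fall in [start, end)."""
    return [e for e in log if start <= e.timestamp < end]


def replay(db, events, nodes, repeat_first=False, max_delay=0.0,
           rng=None, sleep=time.sleep):
    """Replay events against db and return toy performance parameters."""
    rng = rng or random.Random()
    schedule = list(events)
    if repeat_first and schedule:
        schedule.insert(1, schedule[0])  # replay the same event at least twice
    latencies = []
    for i, event in enumerate(schedule):
        if i > 0 and max_delay > 0:
            sleep(rng.uniform(0.0, max_delay))  # random delay between initiations
        db.apply(event)
        latencies.append(event.cost / nodes)  # toy cost model, not a measurement
    mean = sum(latencies) / len(latencies) if latencies else float("inf")
    return {"events": len(schedule), "mean_latency": mean}


def predict_configuration(db, log, first_nodes, first_period,
                          second_nodes, second_period, sleep=time.sleep):
    """Measure two configurations on two event subsets and pick the faster one."""
    first_events = events_in_period(log, *first_period)
    second_events = events_in_period(log, *second_period)

    first_perf = replay(db, first_events, first_nodes,
                        repeat_first=True, sleep=sleep)
    db.roll_back()  # return to the point-in-time state before the second run
    second_perf = replay(db, second_events, second_nodes, max_delay=0.01,
                         rng=random.Random(0), sleep=sleep)

    if first_perf["mean_latency"] <= second_perf["mean_latency"]:
        best = first_nodes
    else:
        best = second_nodes
    return best, first_perf, second_perf


if __name__ == "__main__":
    log = [Event(1, 4.0), Event(2, 4.0), Event(11, 4.0), Event(12, 4.0)]
    db = SnapshotDatabase({"rows": 0})
    best, p1, p2 = predict_configuration(db, log, 2, (0, 10), 4, (10, 20))
    print(best, p1, p2)
```

The `sleep` parameter is injectable so that the random delays can be recorded rather than waited on, which keeps a replay deterministic under test. A real replay would presumably measure wall-clock or resource metrics on the warehouse itself instead of using the cost model assumed here.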