CPC G06F 16/283 (2019.01) [G06F 16/212 (2019.01); G06F 16/2452 (2019.01); G06F 16/248 (2019.01); G06F 9/54 (2013.01)]
20 Claims
1. A system configured to provide access to information in a relational database via Application Programming Interface (API) operations for dataframes, the system comprising:
electronic storage configured to electronically store information, wherein the stored information represents an input dataframe, wherein the input dataframe includes a two-dimensional, ordered table of dataframe table positions, wherein individual ones of the table positions contain dataframe values, wherein the two dimensions include a first dimension of columns and a second dimension of rows, wherein the input dataframe further includes one or more sets of row labels for the rows and a set of column labels for the columns, and wherein the rows are ordered; and
one or more processors configured by machine-readable instructions to:
generate a first relation that represents the input dataframe, the first relation having a first schema that defines a set of attributes and a corresponding set of attribute types, wherein attribute values of individual ones of the set of attributes have a corresponding attribute type from the corresponding set of attribute types, wherein the first relation includes an unordered set of records having the set of attributes, wherein individual records correspond to individual rows of the input dataframe such that the attribute values within the individual records are determined from the dataframe values contained in corresponding rows of the input dataframe;
obtain a dataframe query to be performed on the input dataframe, wherein the dataframe query is in accordance with an API that provides data analysis modalities for dataframes;
translate the dataframe query into a sequence of relational database operations, such translation including:
(i) a determination of the sequence of relational database operations based on the dataframe query and further based on the attribute values of the first relation, such that the sequence of relational database operations is configured to generate output corresponding to the prospective output of performing the dataframe query on the input dataframe,
(ii) a determination of a second schema of a second relation, wherein the determination is based on the dataframe query and on the attribute values of the first relation, wherein the second schema defines a second set of attributes and a second corresponding set of attribute types, wherein attribute values of individual ones of the second set of attributes have a corresponding attribute type from the second corresponding set of attribute types, and
(iii) a determination of one or more relational database operations that populate individual ones of the records of the second relation;
perform the sequence of relational database operations on the first relation to generate the second relation having the second schema, wherein the second relation includes the individual ones of the records with the attribute values of the individual ones of the second set of attributes as populated; and
present at least a portion of the second relation to a user.
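The claimed flow can be illustrated with a minimal sketch. This is not the patented implementation. It assumes Python's standard `sqlite3` module as the relational database, a pandas-style `get_dummies` (one-hot encoding) as the dataframe query, and hypothetical helpers (`load_frame`, `translate_get_dummies`, `run`) and attribute names (`_pos`, `_label`) chosen only for illustration. Because a relation is an unordered set of records, the sketch stores row order and row labels as explicit attributes and restores order with `ORDER BY`. It also shows why the second schema may depend on the attribute values of the first relation: a one-hot encoding produces one output attribute per distinct value, so those values must be read before the output schema can be fixed.

```python
import sqlite3

# Hypothetical mapping from Python value types to SQLite column types.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def quote(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + str(name).replace('"', '""') + '"'


def load_frame(conn, table, row_labels, column_labels, rows):
    """Store an ordered dataframe as an unordered relation (the first relation).

    A relation keeps neither row order nor row labels, so both become explicit
    attributes (_pos, _label). Each column's type is inferred from its values.
    """
    types = []
    for j in range(len(column_labels)):
        sample = next((r[j] for r in rows if r[j] is not None), "")
        types.append(SQL_TYPES.get(type(sample), "TEXT"))
    cols = ", ".join(f"{quote(c)} {t}" for c, t in zip(column_labels, types))
    conn.execute(f"CREATE TABLE {quote(table)} (_pos INTEGER, _label TEXT, {cols})")
    marks = ", ".join("?" * (len(column_labels) + 2))
    conn.executemany(
        f"INSERT INTO {quote(table)} VALUES ({marks})",
        [(i, str(lbl), *row) for i, (lbl, row) in enumerate(zip(row_labels, rows))],
    )


def translate_get_dummies(conn, table, column):
    """Translate a pandas-style get_dummies on one column into SQL.

    Returns (second_schema, sql, params). The schema cannot be fixed from the
    query alone: it has one 0/1 attribute per distinct value in the data.
    """
    # Determine the second schema from the attribute values of the first relation.
    values = [v for (v,) in conn.execute(
        f"SELECT DISTINCT {quote(column)} FROM {quote(table)} "
        f"WHERE {quote(column)} IS NOT NULL ORDER BY {quote(column)}")]
    names = [f"{column}_{v}" for v in values]
    schema = [("_pos", "INTEGER"), ("_label", "TEXT")] + [(n, "INTEGER") for n in names]
    # One CASE expression per output attribute populates each record.
    select = ["_pos", "_label"] + [
        f"CASE WHEN {quote(column)} = ? THEN 1 ELSE 0 END AS {quote(n)}" for n in names]
    # ORDER BY _pos restores the dataframe's row order in the output.
    sql = f"SELECT {', '.join(select)} FROM {quote(table)} ORDER BY _pos"
    return schema, sql, values


def run(conn, schema, sql, params):
    """Perform the relational operations and return the records as dicts."""
    names = [n for n, _ in schema]
    return [dict(zip(names, row)) for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
load_frame(conn, "df", ["r0", "r1", "r2"], ["color", "qty"],
           [("red", 3), ("blue", 5), ("red", 1)])
schema, sql, params = translate_get_dummies(conn, "df", "color")
result = run(conn, schema, sql, params)
for record in result[:2]:  # present a portion of the second relation
    print(record)
```

Here the translation issues one query to learn the distinct values and a second query to build and populate the output relation. A production translator would handle many more query types, label sets, and type rules than this sketch does.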