| CPC G06F 16/35 (2019.01) [G06N 5/04 (2013.01)] | 20 Claims |

|
1. A computer-implemented method for determining features from raw data, the method comprising:
receiving a plurality of data structures of raw data, wherein each data structure of the plurality of data structures comprises a respective plurality of data elements;
determining a data profile for the raw data based on an amount of data elements of the respective plurality of data elements for at least one data structure of the plurality of data structures satisfying a statistical threshold for indicating the data profile; and
for each data structure of the plurality of data structures of the raw data:
determining, based on a semantic rule that describes how to infer a semantic type from a data element of the respective plurality of data elements, the semantic type for each data structure, and
selecting, based on the determined semantic type, an instruction that describes how to calculate an input feature for a machine learning model based on the respective plurality of data elements for each data structure, and
wherein the semantic types for at least a portion of data structures of the raw data are validated based on a determination that the semantic types for at least the portion of data structures correspond to the data profile.
|
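The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. It makes several assumptions that the claim does not state: each data structure is treated as a named column of string values, the semantic rule is a pair of regular expressions, the statistical threshold is a fixed share of date-like elements (0.8), and the instructions are simple aggregate functions. All names (`element_type`, `data_profile`, `PROFILE_TYPES`, `INSTRUCTIONS`, `derive_features`) and the profile labels are hypothetical.

```python
import re
from collections import Counter
from datetime import date
from statistics import mean

# Hypothetical semantic rule: infer a semantic type from a single data element.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
NUM_RE = re.compile(r"^-?\d+(\.\d+)?$")

def element_type(element: str) -> str:
    if DATE_RE.match(element):
        return "date"
    if NUM_RE.match(element):
        return "numeric"
    return "text"

def column_type(elements: list[str]) -> str:
    # Semantic type of a data structure: the most common element-level type.
    return Counter(element_type(e) for e in elements).most_common(1)[0][0]

# Assumed threshold and profiles; the claim does not specify either.
PROFILE_THRESHOLD = 0.8
PROFILE_TYPES = {
    "time_series": {"date", "numeric"},
    "generic": {"date", "numeric", "text"},
}

def data_profile(columns: dict[str, list[str]]) -> str:
    # The profile is indicated when at least one data structure has a share
    # of date-like elements that satisfies the threshold.
    for elements in columns.values():
        if not elements:
            continue
        share = sum(element_type(e) == "date" for e in elements) / len(elements)
        if share >= PROFILE_THRESHOLD:
            return "time_series"
    return "generic"

def date_span_days(elements: list[str]) -> float:
    days = [date.fromisoformat(e).toordinal() for e in elements if DATE_RE.match(e)]
    return float(max(days) - min(days))

def numeric_mean(elements: list[str]) -> float:
    return mean([float(e) for e in elements if NUM_RE.match(e)])

def text_mean_length(elements: list[str]) -> float:
    return mean([len(e) for e in elements])

# Hypothetical instructions: how to calculate an input feature per semantic type.
INSTRUCTIONS = {
    "date": date_span_days,
    "numeric": numeric_mean,
    "text": text_mean_length,
}

def derive_features(columns: dict[str, list[str]]):
    profile = data_profile(columns)
    features, validated = {}, {}
    for name, elements in columns.items():
        sem_type = column_type(elements)        # determine the semantic type
        instruction = INSTRUCTIONS[sem_type]    # select the instruction
        features[name] = instruction(elements)  # calculate the input feature
        # Validate: does the semantic type correspond to the data profile?
        validated[name] = sem_type in PROFILE_TYPES[profile]
    return profile, features, validated

raw = {
    "day": ["2021-01-01", "2021-01-03", "2021-01-08"],
    "amount": ["10", "12.5", "n/a"],
    "note": ["ok", "late", "ok"],
}
profile, features, validated = derive_features(raw)
```

In this sketch the `day` column alone satisfies the threshold, so the profile is `time_series`. The `day` and `amount` columns are validated against that profile, while the free-text `note` column is not.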