US 12,436,985 B2
Feature engineering based on semantic types
Man Chon U, Sunrise, FL (US); Beau Britain, Sunrise, FL (US); Thai Chiun Huang, Palo Alto, CA (US); Yan Yang, Sunrise, FL (US); and Linou Zhu, Palo Alto, CA (US)
Assigned to American Express Travel Related Services Company, Inc., New York, NY (US)
Filed by American Express Travel Related Services Company, Inc., New York, NY (US)
Filed on Jun. 4, 2024, as Appl. No. 18/733,650.
Application 18/733,650 is a continuation of application No. 17/539,557, filed on Dec. 1, 2021, granted, now 12,001,467, issued on May 15, 2024.
Prior Publication US 2024/0320252 A1, Sep. 26, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 7/00 (2006.01); G06F 16/35 (2019.01); G06N 5/04 (2023.01)
CPC G06F 16/35 (2019.01) [G06N 5/04 (2013.01)]
20 Claims
 
1. A computer-implemented method for determining features from raw data, the method comprising:
receiving a plurality of data structures of raw data, wherein each data structure of the plurality of data structures comprises a respective plurality of data elements;
determining a data profile for the raw data based on an amount of data elements of the respective plurality of data elements for at least one data structure of the plurality of data structures satisfying a statistical threshold for indicating the data profile; and
for each data structure of the plurality of data structures of the raw data:
determining, based on a semantic rule that describes how to infer a semantic type from a data element of the respective plurality of data elements, the semantic type for each data structure, and
selecting, based on the determined semantic type, an instruction that describes how to calculate an input feature for a machine learning model based on the respective plurality of data elements for each data structure, and
wherein the semantic types for at least a portion of data structures of the raw data are validated based on a determination that the semantic types for at least the portion of data structures correspond to the data profile.
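
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it is a hypothetical chosen for illustration: the rule table, the instruction table, the profile names "transactional" and "textual", and both thresholds. Each "data structure" is assumed to be a named column of values. The "statistical threshold" is assumed to be a simple share of numeric elements.

```python
import re
from statistics import mean

def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def is_email(v):
    return isinstance(v, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None

# Hypothetical semantic rules: one predicate per semantic type, applied per data element.
SEMANTIC_RULES = {"email": is_email, "amount": is_number}

# Hypothetical instructions: how to calculate one input feature per semantic type.
INSTRUCTIONS = {
    "email": lambda vals: len({v.split("@")[1].lower() for v in vals}),  # distinct domains
    "amount": lambda vals: mean(vals),                                   # average value
}

# Hypothetical mapping from a data profile to the semantic types expected under it.
PROFILE_TYPES = {"transactional": {"amount", "email"}, "textual": {"email"}}

def determine_profile(columns, threshold=0.5):
    """Assumed profile rule: 'transactional' if at least one column has a
    share of numeric elements that meets the threshold, else 'textual'."""
    for values in columns.values():
        if values and sum(map(is_number, values)) / len(values) >= threshold:
            return "transactional"
    return "textual"

def infer_semantic_type(values, min_share=0.8):
    """Return the first semantic type whose rule matches enough elements, else None."""
    for sem_type, rule in SEMANTIC_RULES.items():
        if values and sum(map(rule, values)) / len(values) >= min_share:
            return sem_type
    return None

def engineer_features(columns):
    """Profile the raw data, then type, validate, and featurize each column."""
    profile = determine_profile(columns)
    report = {}
    for name, values in columns.items():
        sem_type = infer_semantic_type(values)
        if sem_type is None:
            report[name] = {"semantic_type": None, "validated": False, "feature": None}
            continue
        # Only elements that satisfy the rule are passed to the instruction.
        matching = [v for v in values if SEMANTIC_RULES[sem_type](v)]
        report[name] = {
            "semantic_type": sem_type,
            # Validation: the inferred type must correspond to the data profile.
            "validated": sem_type in PROFILE_TYPES[profile],
            "feature": INSTRUCTIONS[sem_type](matching),
        }
    return profile, report
```

For example, calling `engineer_features` on a `contact` column of e-mail addresses and a `spend` column of numbers would, under these assumed rules, type them as "email" and "amount" and validate both against the "transactional" profile. A column of mixed values that matches no rule gets no semantic type and no feature.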