US 11,861,459 B2
Automatic determination of suitable hyper-local data sources and features for modeling
Rajendra Rao, Los Gatos, CA (US); Rajesh Phillips, Bangalore (IN); Manisha Sharma Kohli, Subhash Nagar (IN); Puneet Sharma, Bangalore Karnata (IN); and Vijay Ekambaram, Chennai (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jun. 11, 2019, as Appl. No. 16/437,477.
Prior Publication US 2020/0394551 A1, Dec. 17, 2020
Int. Cl. G06N 5/022 (2023.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06F 21/62 (2013.01); G06F 16/901 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01); G06F 21/6209 (2013.01); G06N 5/022 (2013.01); G06N 5/04 (2013.01)]
18 Claims
 
1. A computer-implemented method comprising:
responsive to training each model of a plurality of models using a software platform:
receiving, from a user associated with the model, client data, a use-case description comprising a user-input natural text description of an intended use-case of the model including an expected type of output from the model, and a selection of local data sources to be used in the model, wherein the local data sources are accessible by the software platform;
generating, based on the client data, a client data profile;
determining, for each feature of a plurality of features associated with the selected local data sources, a feature importance, wherein the feature importance indicates an impact that feature had on each of the plurality of models;
generating, based on the use-case description, a use-case profile comprising a vector embedding encoding the respective user-input natural text description;
generating, based on a plurality of determined client data profiles, a plurality of determined feature importances associated with features associated with local data sources, and a plurality of determined use-case profiles, a feature profile relation graph comprising a plurality of client data profile nodes, a plurality of local feature nodes and a plurality of use-case profile nodes, wherein each local feature node of the plurality of local feature nodes is associated with one or more client data profile nodes and one or more use-case profile nodes by a respective edge having an associated edge weight, wherein a given edge is assigned an edge weight that represents a strength of relationship between the respective local feature node and one of the client data profile nodes and the use-case profile nodes;
responsive to receiving a new client data set and a new use-case description comprising a new user-input natural text description of a new intended use-case, determining, based on the new client data set, the new use-case description and the feature profile relation graph, one or more local features as suggested local features for use in building a new model; and
automatically initiating training of the new model based on the new client data set and the suggested local features.
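
Illustrative sketch (not part of the claim). The following Python is one possible reading of the feature profile relation graph and the suggestion step recited above, under stated assumptions. The claim does not specify how edge weights are computed or how a new data set is matched against the graph; here the edge weight is assumed to equal the feature importance observed for that model, and matching is assumed to use cosine similarity between profile vectors. The class name, method names, feature names, and the two-dimensional toy vectors are all hypothetical, and in practice the use-case profile would come from a text embedding model rather than hand-written numbers.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FeatureProfileRelationGraph:
    """Hypothetical graph: local feature nodes linked by weighted edges
    to client data profile nodes and use-case profile nodes."""

    def __init__(self):
        self.profiles = {}              # (kind, model_id) -> profile vector
        self.edges = defaultdict(dict)  # feature -> {(kind, model_id): weight}

    def add_trained_model(self, model_id, client_profile, use_case_embedding,
                          feature_importances):
        client_node = ("client", model_id)
        use_case_node = ("use_case", model_id)
        self.profiles[client_node] = client_profile
        self.profiles[use_case_node] = use_case_embedding
        for feature, importance in feature_importances.items():
            # Assumption: edge weight = feature importance for this model.
            self.edges[feature][client_node] = importance
            self.edges[feature][use_case_node] = importance

    def suggest(self, new_client_profile, new_use_case_embedding, top_k=2):
        scores = {}
        for feature, links in self.edges.items():
            total = 0.0
            for (kind, model_id), weight in links.items():
                query = (new_client_profile if kind == "client"
                         else new_use_case_embedding)
                # Assumption: relevance = edge weight * profile similarity.
                total += weight * cosine(query, self.profiles[(kind, model_id)])
            scores[feature] = total
        # Highest score first; ties broken alphabetically for determinism.
        return sorted(scores, key=lambda f: (-scores[f], f))[:top_k]


graph = FeatureProfileRelationGraph()
graph.add_trained_model("m1", [1.0, 0.0], [1.0, 0.0],
                        {"rainfall": 0.9, "foot_traffic": 0.1})
graph.add_trained_model("m2", [0.0, 1.0], [0.0, 1.0],
                        {"foot_traffic": 0.8, "local_events": 0.6})

# A new client data profile and use-case embedding that resemble model m1.
suggested = graph.suggest([1.0, 0.1], [0.9, 0.2], top_k=2)
print(suggested)
```

In this toy run the new profiles sit close to those of model "m1", so the feature that mattered most to that model ranks first. The final claimed step, automatically initiating training of the new model with the suggested features, is outside the scope of this sketch.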