US 11,858,651 B2
Machine learning model development with interactive feature construction and selection
Seema Chopra, Bengaluru (IN); Akshata Kishore Moharir, Bengaluru (IN); Arvind Sundararaman, Bengaluru (IN); and Kaustubh Kaluskar, Bangalore (IN)
Assigned to The Boeing Company, Arlington, VA (US)
Filed by The Boeing Company, Chicago, IL (US)
Filed on Oct. 25, 2018, as Appl. No. 16/170,887.
Prior Publication US 2020/0134369 A1, Apr. 30, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. B64D 45/00 (2006.01); B64D 43/02 (2006.01); G06N 20/00 (2019.01); B64F 5/60 (2017.01); G06F 18/40 (2023.01); G06F 18/213 (2023.01)
CPC B64D 45/00 (2013.01) [B64D 43/02 (2013.01); B64D 45/0015 (2013.01); B64F 5/60 (2017.01); G06F 18/213 (2023.01); G06F 18/40 (2023.01); G06N 20/00 (2019.01); B64D 2045/0085 (2013.01)]
24 Claims
1. An apparatus for interactive machine learning model development, the apparatus comprising:
a memory storing a plurality of observations of data of a system, each of the plurality of observations of the data including values of a plurality of independent variables, and a value of a dependent variable; and
processing circuitry configured to access the memory, and execute an application to generate a visual environment including a graphical user interface (GUI) for interactive development of a machine learning model, according to an iterative process at least an iteration of which includes the apparatus being caused to at least:
access the memory including the plurality of observations of the data;
visually present, via the GUI, the plurality of independent variables;
receive user input indicating selection of a set of independent variables from the plurality of independent variables;
intelligently perform imputation on the set of independent variables to add values of independent variables of interest to the set of independent variables and/or intelligently perform cleansing on the set of independent variables to remove values of independent variables not of interest from the set of independent variables;
generate a data quality table that tracks values of independent variables that are added via imputation or removed via cleansing at each iteration of the iterative process;
visually present, via the GUI, infographics visually summarizing and comparing independent variables in the set of independent variables, wherein the infographics include the data quality table;
receive user input indicating selection of a refined set of independent variables from the set of independent variables;
perform an interactive feature construction by transforming the refined set of independent variables into a set of features for use in building the machine learning model to predict the dependent variable; and
build the machine learning model using a machine learning algorithm, the set of features generated from the refined set of independent variables, and a training set produced from the set of features and the plurality of observations of the data,
wherein the interactive development of the machine learning model further includes the apparatus being caused to output the machine learning model for deployment to predict and thereby produce predictions of the dependent variable for additional observations of the data that exclude the value of the dependent variable, the predictions produced by the machine learning model being more accurate than produced by a corresponding machine learning model built without the interactive feature construction and selection that include user input via the GUI.
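The data flow of one iteration recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the sample observations, the variable names (temp, pressure, y), the validity rule used for cleansing, mean-value imputation, the scaling used as feature construction, and the one-feature least-squares fit standing in for "a machine learning algorithm". The GUI selections are represented by hard-coded lists, and the accuracy comparison in the final clause is not modeled.

```python
from statistics import mean

# Hypothetical observations: independent variables plus dependent variable "y".
observations = [
    {"temp": 20.0, "pressure": 1.0, "y": 41.0},
    {"temp": None, "pressure": 2.0, "y": 61.0},
    {"temp": 30.0, "pressure": 3.0, "y": 61.0},
    {"temp": 40.0, "pressure": -999.0, "y": 81.0},
]

def cleanse(rows, var, is_valid):
    """Blank out values of `var` that fail `is_valid`; return how many were removed."""
    removed = 0
    for row in rows:
        if row[var] is not None and not is_valid(row[var]):
            row[var] = None
            removed += 1
    return removed

def impute_mean(rows, var):
    """Fill missing values of `var` with the mean of the known values; return how many were added."""
    fill = mean(r[var] for r in rows if r[var] is not None)
    added = 0
    for row in rows:
        if row[var] is None:
            row[var] = fill
            added += 1
    return added

# Stand-in for the user's first GUI selection of independent variables.
selected = ["temp", "pressure"]

# Data quality table: one entry per variable per iteration, tracking
# values added via imputation and removed via cleansing.
data_quality_table = []
iteration = 1
for var in selected:
    removed = cleanse(observations, var, lambda v: v >= 0)  # assumed validity rule
    added = impute_mean(observations, var)
    data_quality_table.append(
        {"iteration": iteration, "variable": var, "added": added, "removed": removed}
    )

# Stand-in for the user's refined selection after reviewing the infographics.
refined = ["temp"]

def construct_feature(row):
    """Assumed feature construction: rescale temp into a single feature."""
    return row["temp"] / 10.0

# Training set produced from the constructed feature and the observations.
xs = [construct_feature(r) for r in observations]
ys = [r["y"] for r in observations]

# One-feature ordinary least squares, used here as the learning algorithm.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

def predict(row):
    """Predict the dependent variable for an observation that lacks it."""
    return slope * construct_feature(row) + intercept

prediction = predict({"temp": 50.0, "pressure": 2.5})
```

In this sketch the data quality table is a plain list of dictionaries, so a further iteration would only need to increment `iteration` and append more entries before the table is shown again.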