US 12,243,624 B2
Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions
Paul Grouchy, Toronto (CA); Timothy Burton, Ottawa (CA); Ali Khosousi, Toronto (CA); Abhinav Doomra, North York (CA); and Sunny Gupta, Toronto (CA)
Assigned to Analytics For Life Inc., Toronto (CA)
Filed by Analytics For Life Inc., Toronto (CA)
Filed on Jun. 25, 2021, as Appl. No. 17/359,145.
Application 17/359,145 is a continuation of application No. 15/653,433, filed on Jul. 18, 2017, granted, now 11,139,048.
Prior Publication US 2022/0093216 A1, Mar. 24, 2022
Int. Cl. G06N 99/00 (2019.01); G06F 16/56 (2019.01); G06K 9/00 (2022.01); G06N 3/04 (2023.01); G06N 3/086 (2023.01); G06N 20/00 (2019.01); G16B 40/00 (2019.01); G16B 40/20 (2019.01); G16H 50/20 (2018.01); G06F 17/00 (2019.01); G16Z 99/00 (2019.01)
CPC G16B 40/20 (2019.02) [G06N 3/04 (2013.01); G06N 3/086 (2013.01); G06N 20/00 (2019.01); G16B 40/00 (2019.02); G06F 17/00 (2013.01); G16Z 99/00 (2019.02)]
17 Claims
1. A method, performed by a computing system having at least one processor and at least one memory, for discovering features for use in a trained machine learning model for diagnosing medical conditions, the method comprising:
for each of a plurality of feature generators,
for each of a plurality of sets of data signals,
extracting values from a particular set of data signals,
transforming the particular set of data signals by using the extracted values to generate a set of normalized values, and
applying a particular feature generator to the set of normalized values to produce a feature value, and
generating a set of feature vectors based on the produced feature values;
for each of a plurality of the generated feature vectors, calculating a novelty score;
identifying one or more feature generators from among the plurality of feature generators whose first calculated novelty score exceeds a novelty threshold;
generating a mutated one or more feature generators, comprising applying at least one of a point mutation, random recombination, sub-tree mutation, or a combination thereof to the one or more feature generators; and
using the mutated one or more feature generators, performing operations comprising:
generating additional training data comprising the mutated one or more feature generators;
processing the additional training data to discard the mutated one or more feature generators where a second novelty score of the mutated one or more feature generators is under the novelty threshold;
adding, to a machine learning pipeline, the processed additional training data; and
causing at least one trained machine learning model in the machine learning pipeline to be incrementally retrained using the processed additional training data.
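
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete choice in it is an assumption made for illustration: a feature generator is represented as a hypothetical (transform, reducer) pair, normalization is min-max scaling from the extracted minimum and maximum, the novelty score is the mean Euclidean distance to the k nearest other feature vectors, and only point mutation is shown. The pipeline and retraining steps are left out.

```python
import math
import random

# Hypothetical building blocks for feature generators (assumption, not from the claim).
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "abs": abs,
}
REDUCERS = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
    "min": min,
}


def normalize(signal):
    """Min-max scale a signal to [0, 1] using its extracted min and max."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in signal]
    return [(v - lo) / span for v in signal]


def apply_generator(gen, values):
    """Apply a (transform, reducer) generator to produce one feature value."""
    transform, reducer = gen
    return REDUCERS[reducer]([TRANSFORMS[transform](v) for v in values])


def feature_vector(gen, signal_sets):
    """One feature value per set of data signals."""
    return [apply_generator(gen, normalize(s)) for s in signal_sets]


def novelty_score(vec, others, k=2):
    """Mean distance to the k nearest other vectors (assumed novelty measure)."""
    nearest = sorted(math.dist(vec, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def point_mutation(gen, rng):
    """Replace exactly one component of the generator with a different one."""
    transform, reducer = gen
    if rng.random() < 0.5:
        transform = rng.choice([t for t in TRANSFORMS if t != transform])
    else:
        reducer = rng.choice([r for r in REDUCERS if r != reducer])
    return (transform, reducer)


def discover(generators, signal_sets, threshold, rng):
    """Return (novel generators, mutated generators that pass the threshold)."""
    vectors = {g: feature_vector(g, signal_sets) for g in generators}

    def first_score(g):
        return novelty_score(vectors[g], [vectors[o] for o in generators if o != g])

    # Keep generators whose first novelty score exceeds the threshold.
    novel = [g for g in generators if first_score(g) > threshold]
    mutated = [point_mutation(g, rng) for g in novel]

    # Discard mutated generators whose second novelty score is under the threshold.
    kept = []
    for m in mutated:
        vec = feature_vector(m, signal_sets)
        if novelty_score(vec, list(vectors.values())) >= threshold:
            kept.append(m)
    return novel, kept
```

A call such as `discover([("identity", "mean"), ("square", "max")], signal_sets, threshold=0.1, rng=random.Random(0))` returns the generators judged novel together with the surviving mutated generators; in the claimed method, the survivors would then form the additional training data used for incremental retraining.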