US 11,868,856 B2
Systems and methods for topological data analysis using nearest neighbors
Ajithkumar Warrier, Fremont, CA (US); Jennifer Kloke, Austin, TX (US); Ryan Hsu, San Francisco, CA (US); and Sudhakar Jonnalagadda, Mountain View, CA (US)
Assigned to SymphonyAI Sensa LLC, Palo Alto, CA (US)
Filed by Ayasdi AI LLC, Redwood City, CA (US)
Filed on Feb. 4, 2022, as Appl. No. 17/650,072.
Application 17/650,072 is a continuation of application No. 16/022,607, filed on Jun. 28, 2018, granted, now 11,244,765.
Claims priority of provisional application 62/526,279, filed on Jun. 28, 2017.
Prior Publication US 2022/0199263 A1, Jun. 23, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 16/901 (2019.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01); G06F 18/2323 (2023.01); G06F 18/2413 (2023.01); G06V 10/764 (2022.01)
CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01); G06F 18/2323 (2023.01); G06F 18/24147 (2023.01); G06V 10/764 (2022.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01)] 18 Claims
1. A non-transitory computer readable medium including executable instructions, the instructions being executable by a processor to perform a method, the method comprising:
receiving initial data points, the initial data points including rows and columns, each row defining a data point of an initial data set and each column defining a feature, the initial data set including an initial number of columns, each column including values associated with a feature of a plurality of features;
selecting a subset of the data points to create a set of selected data points, the selection being based on each open set of a set of open sets within a cover, whereby, for each open set, a number of data points proportional to the number of data points that are members of that particular open set is selected to be members of the set of selected data points;
for each selected data point of the set of selected data points, determining a predetermined number of other data points of the set of selected data points that are closest in distance to that particular selected data point, the distance being determined based on a metric function applied to the vectors of the data points;
grouping the selected data points into a plurality of groups based, at least in part, on the predetermined number of other data points of the set of selected data points that are closest in distance, each group of the plurality of groups including a different subset of data points; and
applying the selected data points and the plurality of groups as training data to create a machine learning module, the selected data points and the plurality of groups maintaining the shape and relationships within the initial data points, the set of selected data points being smaller than the set of initial data points.
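The claimed steps can be illustrated with a minimal, standard-library Python sketch. The claim does not fix any of the concrete choices below, so each one is an assumption made only for illustration. The cover is a hypothetical set of overlapping intervals over a one-dimensional "lens" (here the first feature), as in Mapper-style topological data analysis. Each open set contributes `ceil(ratio * |U|)` randomly chosen members. The metric function is Euclidean distance. The "grouping" step uses connected components of the symmetrised k-nearest-neighbour graph. The "machine learning module" is a simple nearest-centroid classifier. All function names (`interval_cover`, `proportional_sample`, `knn`, `group_by_knn`, `train_nearest_centroid`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def interval_cover(lens_values, n_intervals=4, overlap=0.25):
    """Hypothetical cover: overlapping intervals over a 1-D lens.

    Returns a list of non-empty open sets, each a list of row indices.
    """
    lo, hi = min(lens_values), max(lens_values)
    width = (hi - lo) / n_intervals or 1.0
    pad = width * overlap  # widen each interval so neighbouring sets overlap
    cover = []
    for i in range(n_intervals):
        start = lo + i * width - pad
        end = lo + (i + 1) * width + pad
        members = [j for j, v in enumerate(lens_values) if start <= v <= end]
        if members:
            cover.append(members)
    return cover


def proportional_sample(cover, ratio=0.5, seed=0):
    """From each open set U, select ceil(ratio * |U|) members; return the union."""
    rng = random.Random(seed)
    selected = set()
    for members in cover:
        k = math.ceil(ratio * len(members))
        selected.update(rng.sample(members, k))
    return sorted(selected)


def knn(points, indices, k=2):
    """For each selected index, the k closest other selected indices (Euclidean)."""
    neighbors = {}
    for i in indices:
        others = sorted((j for j in indices if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        neighbors[i] = others[:k]
    return neighbors


def group_by_knn(neighbors):
    """Assumed grouping: connected components of the symmetrised k-NN graph."""
    parent = {i: i for i in neighbors}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, nbrs in neighbors.items():
        for j in nbrs:
            parent[find(i)] = find(j)  # merge the components of i and j
    groups = defaultdict(list)
    for i in neighbors:
        groups[find(i)].append(i)
    return sorted(groups.values())  # disjoint groups, ordered by smallest index


def train_nearest_centroid(points, groups):
    """Hypothetical 'machine learning module': one centroid per group."""
    centroids = []
    for members in groups:
        dim = len(points[members[0]])
        centroids.append([sum(points[m][d] for m in members) / len(members)
                          for d in range(dim)])

    def predict(x):
        # Return the index of the group whose centroid is nearest to x.
        return min(range(len(centroids)), key=lambda g: math.dist(x, centroids[g]))

    return predict


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05]]
    lens = [row[0] for row in data]  # hypothetical lens: the first feature
    cover = interval_cover(lens, n_intervals=4, overlap=0.25)
    selected = proportional_sample(cover, ratio=0.6, seed=1)
    groups = group_by_knn(knn(data, selected, k=2))
    model = train_nearest_centroid(data, groups)
    print(len(selected), "of", len(data), "points kept;", len(groups), "groups")
    print(model([0.02, 0.03]), model([5.02, 5.03]))
```

In this toy example, two well-separated clusters each fall into their own open set. Sampling keeps three points from each, so six of the ten rows survive. The k-NN graph links points only within a cluster, so the two resulting groups preserve the coarse shape of the original data while using fewer points. A real system could substitute any other cover, sampling rule, grouping algorithm, or learner.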