CPC G06F 16/2264 (2019.01) [G06F 16/2272 (2019.01); G06F 18/22 (2023.01)] | 20 Claims
1. A method of indexing a data corpus to a set of multidimensional points, the method comprising:
generating a set of points comprising a Sobol sequence in a multidimensional space;
identifying, for each sample in a plurality of samples in a data corpus, a nearest point in the set of points;
generating an index mapping each sample to the nearest point in the Sobol sequence;
receiving a request for a number of samples from the data corpus;
selecting a subset of points from the Sobol sequence, wherein the subset of points includes a number of points equal to the number of samples, and wherein the subset of points is sequential from a beginning of the Sobol sequence;
providing, in response to the request and based on mappings in the index to the subset of points, a subset of the plurality of samples corresponding to the subset of points; and
generating one or more models by training the one or more models using the subset of the plurality of samples.
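The steps of claim 1 can be illustrated with a minimal, self-contained sketch. This is not the claimed implementation. It assumes a 2-D space with coordinates in [0, 1), an unscrambled Sobol sequence in natural (binary) order built from hand-coded direction numbers (dimension 1 uses m_i = 1; dimension 2 uses the primitive polynomial x + 1), brute-force Euclidean nearest-point search, and the policy of returning the single closest mapped sample for each selected point. The function names `sobol_2d`, `build_index` and `select_samples` are hypothetical, and the model-training step is left to whatever training routine is in use.

```python
import math
import random
from collections import defaultdict

BITS = 30  # fixed-point precision of Sobol coordinates (sketch assumption)


def _direction_numbers(m_values):
    # v_i = m_i / 2^i, stored as BITS-bit integers: m_i << (BITS - i)
    return [m << (BITS - i) for i, m in enumerate(m_values, start=1)]


def sobol_2d(n_points):
    """First n_points of the unscrambled 2-D Sobol sequence, natural (binary) order."""
    if not 0 <= n_points <= (1 << BITS):
        raise ValueError("n_points out of range")
    # Dimension 1: m_i = 1 for all i (base-2 van der Corput sequence).
    m1 = [1] * BITS
    # Dimension 2: polynomial x + 1 gives m_i = 2*m_{i-1} XOR m_{i-1}, with m_1 = 1.
    m2 = [1]
    for _ in range(BITS - 1):
        m2.append((m2[-1] << 1) ^ m2[-1])
    directions = [_direction_numbers(m1), _direction_numbers(m2)]
    scale = float(1 << BITS)
    points = []
    for n in range(n_points):
        coords = []
        for v in directions:
            # x_n is the XOR of v_i over every set bit b_i of n.
            x, bit, k = 0, 0, n
            while k:
                if k & 1:
                    x ^= v[bit]
                k >>= 1
                bit += 1
            coords.append(x / scale)
        points.append(tuple(coords))
    return points


def build_index(samples, points):
    """Map each Sobol point position to its (distance, sample_id) pairs, nearest first."""
    index = defaultdict(list)
    for sample_id, s in enumerate(samples):
        dists = [math.dist(s, p) for p in points]
        j = min(range(len(points)), key=dists.__getitem__)  # nearest point position
        index[j].append((dists[j], sample_id))
    for entries in index.values():
        entries.sort()
    return dict(index)


def select_samples(index, num_samples):
    """Take the first num_samples Sobol points; return one sample id per occupied point."""
    return [index[j][0][1] for j in range(num_samples) if j in index]


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [(rng.random(), rng.random()) for _ in range(1000)]
    points = sobol_2d(256)
    index = build_index(corpus, points)
    batch = [corpus[i] for i in select_samples(index, 16)]
    # `batch` would then be handed to the model-training step.
    print(len(batch), batch[:2])
```

Because every prefix of a Sobol sequence is spread fairly evenly over the space, taking the points from the beginning of the sequence tends to yield a well-spread subset of the corpus for any requested size. Under this sketch's policy, a selected point with no mapped sample contributes nothing, so the result can be smaller than requested. A production version would more likely use a library generator such as `scipy.stats.qmc.Sobol` and a spatial index (for example a k-d tree) instead of the brute-force search.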