| CPC G06F 7/588 (2013.01) | 30 Claims |

|
1. A method for scalable data selection according to an enforced randomization scheme, the method comprising:
receiving a dataset with a plurality of data elements;
generating a plurality of random values;
generating a ranking of the plurality of random values;
selecting a subset of the plurality of data elements based on a comparison between the ranking and a predetermined threshold, wherein the predetermined threshold is based on a predetermined proportion of the plurality of data elements to be included in the subset;
outputting a randomization scheme that includes the ranking and the predetermined proportion, wherein a second selection of a second subset of the plurality of data elements is based on the randomization scheme; and
outputting an indication of the subset and the second subset.
|
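The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `RandomizationScheme`, `make_scheme`, `select_by_scheme`, and the `offset` parameter are hypothetical, and the claim does not say how the second subset is derived from the scheme. Here it is assumed that the second selection reuses the same ranking with a shifted rank window, so the two subsets do not overlap.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RandomizationScheme:
    """Hypothetical container for the 'randomization scheme' of claim 1."""
    ranking: tuple[int, ...]  # ranking[i] is the rank of element i's random value
    proportion: float         # predetermined proportion of elements to select


def make_scheme(n: int, proportion: float, seed: int | None = None) -> RandomizationScheme:
    """Generate n random values, rank them, and package the result as a scheme."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    # Indices sorted by random value; position in this order is the rank (0 = smallest).
    order = sorted(range(n), key=values.__getitem__)
    ranking = [0] * n
    for rank, index in enumerate(order):
        ranking[index] = rank
    return RandomizationScheme(tuple(ranking), proportion)


def select_by_scheme(dataset: list, scheme: RandomizationScheme, offset: int = 0) -> list:
    """Select the elements whose rank falls in [offset, offset + threshold).

    The threshold is derived from the predetermined proportion. The offset
    is an assumption of this sketch, not something recited in the claim.
    """
    threshold = math.floor(scheme.proportion * len(dataset))
    return [
        item
        for item, rank in zip(dataset, scheme.ranking)
        if offset <= rank < offset + threshold
    ]


if __name__ == "__main__":
    data = list("abcdefgh")
    scheme = make_scheme(len(data), proportion=0.25, seed=7)
    first = select_by_scheme(data, scheme)             # subset
    second = select_by_scheme(data, scheme, offset=2)  # second subset
    print({"subset": first, "second_subset": second})
```

Because the scheme stores the ranking itself rather than a seed, any later selection that is given the same scheme reproduces the same subsets without regenerating random values.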