| CPC G06F 21/6254 (2013.01) [G06F 18/22 (2023.01); G06F 21/602 (2013.01)] | 15 Claims |

|
1. A computer-implemented method of preparing an anonymised dataset for use in data analytics, the method including the steps of:
(a) labelling elements of a dataset to be analysed according to a labelling scheme;
(b) selecting a subsample from the dataset and deriving therefrom an accuracy threshold indicative of the distance between elements of data within the subsample;
(c) deriving, from the anonymised dataset, an estimated accuracy of distance measurement between elements of the anonymised dataset and comparing this estimated accuracy to the accuracy threshold;
(d) selecting one or more labelled elements of the dataset to be replaced with a distance preserving hash; and for each selected element:
(e) partitioning a data plane including the selected element into a plurality of channels, each channel covering a different distance space of the data plane;
(f) hashing, using a cryptographic hash, data associated with the channel of the data plane in which the selected element resides, to form the distance preserving hash; and
(g) replacing the selected element with the distance preserving hash.
|
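Steps (e) to (g) of claim 1 can be illustrated with a minimal sketch. It is one possible reading of the claim language under stated assumptions, not the patented implementation. The assumptions are: the data plane is a one-dimensional numeric axis; the channels are contiguous, fixed-width intervals; the "data associated with the channel" is the pair of channel bounds; and the cryptographic hash is SHA-256 over a secret salt followed by those bounds. The names `channel_index`, `distance_preserving_hash`, `anonymise`, `origin`, `width`, and `salt` are all hypothetical and do not appear in the claim.

```python
import hashlib


def channel_index(value: float, origin: float, width: float) -> int:
    """Step (e), assumed form: index of the fixed-width channel containing `value`."""
    return int((value - origin) // width)


def distance_preserving_hash(value: float, salt: bytes,
                             origin: float = 0.0, width: float = 10.0) -> str:
    """Step (f), assumed form: hash the bounds of the channel in which `value` resides.

    Every value inside the same channel maps to the same digest, so equality of
    digests indicates that two values lie within `width` of each other.
    """
    idx = channel_index(value, origin, width)
    lower = origin + idx * width
    upper = lower + width
    # Assumption: the channel is described by its bounds; a secret salt is
    # prepended so the digest cannot be recomputed from the bounds alone.
    message = salt + f"{lower!r}:{upper!r}".encode("utf-8")
    return hashlib.sha256(message).hexdigest()


def anonymise(records, selected_labels, salt: bytes, width: float = 10.0):
    """Step (g): replace each selected labelled element with its hash.

    `records` is a list of dicts keyed by label (the labelling scheme of
    step (a) is assumed to be plain field names).
    """
    result = []
    for record in records:
        replaced = dict(record)  # leave the input records unmodified
        for label in selected_labels:
            replaced[label] = distance_preserving_hash(
                record[label], salt, width=width
            )
        result.append(replaced)
    return result


if __name__ == "__main__":
    data = [
        {"id": 1, "age": 34.0},
        {"id": 2, "age": 37.5},
        {"id": 3, "age": 52.0},
    ]
    for row in anonymise(data, ["age"], salt=b"example-salt"):
        print(row)
```

In this sketch the first two records fall in the same channel and therefore receive the same digest, while the third receives a different one. The channel width sets the trade-off that steps (b) and (c) of the claim evaluate: wider channels hide more detail but give a coarser estimate of distance between anonymised elements.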