| CPC G06F 3/0655 (2013.01) [G06F 3/0604 (2013.01); G06F 3/0679 (2013.01)] | 19 Claims |

|
1. A storage device comprising:
a processing resource; and
a non-transitory machine-readable storage medium comprising instructions executable by the processing resource to:
store machine learning (ML) facet mappings between ML facets and dataset preparation tags in a repository, wherein the ML facets are properties of datasets or ML models for optimizing quality of the datasets;
identify an ML facet of a dataset stored in the storage device;
determine, based on at least one of dataset metrics of the dataset, storage performance metrics of the storage device, and application performance metrics, a first quality score for the dataset, wherein the first quality score indicates an amount of relevant information in the dataset;
identify a dataset preparation tag mapped to the identified ML facet as indicated in the ML facet mappings;
generate a filtered dataset from the dataset based on the dataset preparation tag and determine, based on at least one of dataset metrics of the filtered dataset, the storage performance metrics of the storage device, and the application performance metrics, a second quality score that indicates an amount of relevant information in the filtered dataset; and
in response to a request for the dataset from an ML application and determining that the second quality score is greater than the first quality score, transmit the filtered dataset to the ML application across a bandwidth-limited communication link.
|
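The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation; every name in it (`FACET_MAPPINGS`, `PREPARATIONS`, `identify_facet`, `quality_score`, `serve_request`) is hypothetical, as are the example facet `missing_values` and the tag `drop_incomplete`. The sketch assumes the quality score is derived from dataset metrics alone (the share of complete records), which is one of the alternatives the claim permits, and it assumes the unfiltered dataset is returned when the second score is not greater, a case the claim does not address. Returning a value stands in for transmission across the communication link.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Dataset = List[Record]

# Hypothetical repository of ML facet mappings: ML facet -> dataset preparation tag.
FACET_MAPPINGS: Dict[str, str] = {"missing_values": "drop_incomplete"}

# Hypothetical preparation routines, keyed by dataset preparation tag.
PREPARATIONS: Dict[str, Callable[[Dataset], Dataset]] = {
    "drop_incomplete": lambda ds: [
        r for r in ds if all(v is not None for v in r.values())
    ],
}


def identify_facet(dataset: Dataset) -> Optional[str]:
    """Toy facet detection (assumption): flag datasets with missing values."""
    has_missing = any(v is None for r in dataset for v in r.values())
    return "missing_values" if has_missing else None


def quality_score(dataset: Dataset) -> float:
    """Toy score (assumption): fraction of records with no missing values."""
    if not dataset:
        return 0.0
    complete = sum(all(v is not None for v in r.values()) for r in dataset)
    return complete / len(dataset)


def serve_request(dataset: Dataset) -> Dataset:
    """Return the dataset that would be sent to the requesting ML application."""
    first_score = quality_score(dataset)
    facet = identify_facet(dataset)
    tag = FACET_MAPPINGS.get(facet) if facet else None
    if tag is None:
        # No facet or no mapped tag: nothing to filter (assumed fallback).
        return dataset
    filtered = PREPARATIONS[tag](dataset)
    second_score = quality_score(filtered)
    # Send the filtered dataset only when its score is strictly greater.
    return filtered if second_score > first_score else dataset
```

For example, calling `serve_request` on a two-record dataset in which one record has a missing value yields only the complete record, because filtering raises the toy score from 0.5 to 1.0.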