US 12,450,003 B2
Machine learning facets for dataset preparation in storage devices
Kalapriya Kannan, Karnataka (IN); Chaitra Kallianpur, Andover, MA (US); Bruce Rabe, Andover, MA (US); Suparna Bhattacharya, Karnataka (IN); and Krishnaraju Thangaraju, Karnataka (IN)
Assigned to Hewlett Packard Enterprise Development LP, Spring, TX (US)
Filed by Hewlett Packard Enterprise Development LP, Houston, TX (US)
Filed on Aug. 23, 2022, as Appl. No. 17/821,513.
Prior Publication US 2024/0069787 A1, Feb. 29, 2024
Int. Cl. G06F 3/06 (2006.01)
CPC G06F 3/0655 (2013.01) [G06F 3/0604 (2013.01); G06F 3/0679 (2013.01)]
19 Claims
 
1. A storage device comprising:
a processing resource; and
a non-transitory machine-readable storage medium comprising instructions executable by the processing resource to:
store machine learning (ML) facet mappings between ML facets and dataset preparation tags in a repository, wherein the ML facets are properties of datasets or ML models for optimizing quality of the datasets;
identify a ML facet of a dataset stored in the storage device;
determine, based on at least one of dataset metrics of the dataset, storage performance metrics of the storage device, and application performance metrics, a first quality score for the dataset, wherein the first quality score indicates an amount of relevant information in the dataset;
identify a dataset preparation tag mapped to the identified ML facet as indicated in the ML facet mappings;
generate a filtered dataset from the dataset based on the dataset preparation tag and determine, based on at least one of dataset metrics of the filtered dataset, the storage performance metrics of the storage device, and the application performance metrics, a second quality score that indicates an amount of relevant information in the filtered dataset; and
in response to a request for the dataset from an ML application and determining that the second quality score is greater than the first quality score, transmit the filtered dataset to the ML application across a bandwidth-limited communication link.
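
The sequence recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The facet name `missing_labels`, the tag `drop_unlabeled`, the helper names, and the use of the labeled-record fraction as the quality score are all hypothetical stand-ins chosen for illustration. The claim allows the score to be based on dataset, storage performance, or application performance metrics, and this sketch uses only a simple dataset metric. Returning the original dataset when the score does not improve is also an assumption, since the claim recites only the case where the second score is greater.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Dataset = List[Record]

# Hypothetical repository of ML facet mappings: ML facet -> dataset preparation tag.
FACET_MAPPINGS: Dict[str, str] = {"missing_labels": "drop_unlabeled"}

# Hypothetical filters keyed by dataset preparation tag.
FILTERS: Dict[str, Callable[[Dataset], Dataset]] = {
    "drop_unlabeled": lambda rows: [r for r in rows if r.get("label") is not None],
}


def identify_facet(rows: Dataset) -> Optional[str]:
    """Assumed facet detection: flag datasets that contain unlabeled records."""
    if any(r.get("label") is None for r in rows):
        return "missing_labels"
    return None


def quality_score(rows: Dataset) -> float:
    """Assumed dataset metric: fraction of records that carry a label."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get("label") is not None) / len(rows)


def serve_dataset(rows: Dataset) -> Dataset:
    """Return the dataset to send to the requesting ML application."""
    facet = identify_facet(rows)
    first_score = quality_score(rows)

    tag = FACET_MAPPINGS.get(facet) if facet is not None else None
    if tag is None:
        return rows  # no mapped preparation tag; assumed fallback

    filtered = FILTERS[tag](rows)
    second_score = quality_score(filtered)

    # Send the filtered dataset only when its score exceeds the original's.
    return filtered if second_score > first_score else rows


if __name__ == "__main__":
    sample = [
        {"x": 1, "label": "cat"},
        {"x": 2, "label": None},
        {"x": 3, "label": "dog"},
    ]
    print(len(serve_dataset(sample)))
```

In this sketch the filtering runs before the data leaves the storage side, so the unlabeled record is never sent over the bandwidth-limited link.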