US 11,714,834 B2
Data compression based on co-clustering of multiple parameters for AI training
Ofir Ezrielev, Be'er Sheva (IL); Nadav Azaria, Meitar (IL); Avitan Gefen, Tel Aviv (IL); and Amihai Savir, Newton, MA (US)
Assigned to Dell Products L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Jan. 21, 2022, as Appl. No. 17/581,127.
Application 17/581,127 is a continuation in part of application No. 17/509,759, filed on Oct. 25, 2021.
Prior Publication US 2023/0125308 A1, Apr. 27, 2023
Int. Cl. H03M 7/00 (2006.01); G06F 16/28 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/285 (2019.01) [G06N 20/00 (2019.01)]
20 Claims
 
1. A method, comprising:
accessing single-parameter cluster information for each of two or more parameters of interest, wherein each parameter of interest corresponds to a time series of numeric values sent from one or more internet of things (IoT) units to an edge device, wherein the single-parameter cluster information for each parameter of interest indicates a single-parameter cluster count for the parameter of interest;
determining a co-clustering ratio for each pair of the two or more parameters of interest, wherein each pair includes a first parameter and a second parameter and wherein the co-clustering ratio indicates whether the number of clusters produced by a co-clustering algorithm applied to the first and second parameters is less than the product of the single-parameter cluster counts for the first and second parameters;
identifying one or more co-cluster groups based on the co-clustering ratios, wherein each co-cluster group includes two or more of the parameters of interest;
for each of the one or more co-cluster groups, employing the co-clustering algorithm to produce compressed co-clustered encodings of the tuples; and
transmitting the compressed co-clustered encodings of the tuples to a cloud computing resource;
responsive to receiving, by a decoder, the compressed co-clustered encodings, generating surrogates for the tuples in accordance with a probability distribution applicable to the particular parameter pair; and
providing the surrogates as training data for an artificial intelligence engine of the cloud computing resource.
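
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete choice in it is an assumption made for illustration. Equal-width binning stands in for the single-parameter clustering. The occupied joint cells stand in for the output of the co-clustering algorithm. The ratio is taken as the co-cluster count divided by the product of the single-parameter cluster counts, with a hypothetical threshold of 1.0. Uniform sampling inside a cell stands in for the probability distribution used by the decoder. All function names and the sample data are hypothetical.

```python
import random
from itertools import combinations


def bin_labels(values, k):
    """Stand-in single-parameter clustering (assumption): k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid a zero width for a constant series
    return [min(int((v - lo) / width), k - 1) for v in values]


def co_clustering_ratio(labels_a, labels_b):
    """Occupied joint cells divided by the product of single-parameter counts."""
    count_a, count_b = len(set(labels_a)), len(set(labels_b))
    joint = len(set(zip(labels_a, labels_b)))
    return joint / (count_a * count_b)


def co_cluster_groups(labels, threshold=1.0):
    """Keep pairs whose ratio is below the (assumed) threshold."""
    groups = []
    for a, b in combinations(sorted(labels), 2):
        ratio = co_clustering_ratio(labels[a], labels[b])
        if ratio < threshold:
            groups.append((a, b, ratio))
    return groups


def encode_pair(labels_a, labels_b):
    """Replace each tuple with the index of its occupied joint cell."""
    cells = sorted(set(zip(labels_a, labels_b)))
    index = {cell: i for i, cell in enumerate(cells)}
    return cells, [index[c] for c in zip(labels_a, labels_b)]


def surrogate(code, cells, ranges, k, rng):
    """Decoder side: draw a tuple uniformly inside the cell (assumed distribution)."""
    out = []
    for label, (lo, hi) in zip(cells[code], ranges):
        width = (hi - lo) / k
        out.append(rng.uniform(lo + label * width, lo + (label + 1) * width))
    return tuple(out)


# Hypothetical IoT time series: pressure tracks temperature, humidity does not.
K = 4
temperature = [20 + i * 0.1 for i in range(100)]
series = {
    "temperature": temperature,
    "pressure": [2 * t + 1 for t in temperature],
    "humidity": [(i * 37) % 100 for i in range(100)],
}

labels = {name: bin_labels(vals, K) for name, vals in series.items()}
groups = co_cluster_groups(labels)

# Encode the first group at the edge, then rebuild surrogates at the decoder.
a, b, ratio = groups[0]
cells, codes = encode_pair(labels[a], labels[b])
ranges = [(min(series[n]), max(series[n])) for n in (a, b)]
rng = random.Random(0)
surrogates = [surrogate(c, cells, ranges, K, rng) for c in codes]
```

In this sketch a pair is grouped only when its joint cell count falls below the product of the individual cluster counts, so each tuple in the group is sent as a single small integer code rather than as two separate cluster labels. The decoder needs only the cell list and the value ranges to draw surrogate tuples.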