US 11,907,188 B2
Method, device, and program product for managing data pattern
Weilan Pu, Chengdu (CN); Jian Kang, Chengdu (CN); Chi Chen, Chengdu (CN); and Wen Chen, Sichuan (CN)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Apr. 26, 2021, as Appl. No. 17/239,950.
Claims priority of application No. 202110075083.0 (CN), filed on Jan. 20, 2021.
Prior Publication US 2022/0229823 A1, Jul. 21, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01)
CPC G06F 16/215 (2019.01)
11 Claims
 
1. A method for managing data patterns, including:
acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein the multiple collection devices are located in an edge network in an application environment, and wherein a set of data patterns in the multiple sets of data patterns represent patterns of duplicate data in data from one of the multiple collection devices;
generating, based on the multiple sets of data patterns, multiple pattern features, wherein each one of the pattern features is generated for a respective set of data patterns in the multiple sets of data patterns, and wherein each pattern feature includes a number of occurrences of each individual data pattern in the respective set of data patterns;
dividing the multiple collection devices into multiple groups based on the pattern features;
determining, based on the numbers of occurrences of data patterns included in sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group;
distributing the set of shared data patterns to an edge computing device in the edge network, wherein the edge computing device is connected to a target collection device in the multiple collection devices in the group;
instructing the edge computing device to generate de-duplicated data of target data from the target collection device based on the set of shared data patterns, wherein the de-duplicated data is smaller than the target data;
instructing the edge computing device to transmit the de-duplicated data to a server device that is used to process the target data; and
whereby transmission of the de-duplicated data to the server device reduces overhead of storage resources involved in data storage by the server device.
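The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how devices are grouped or how shared patterns are selected, so the cosine-similarity threshold, the greedy grouping, and the top-k selection below are assumptions made for illustration only. All function names, parameters, and the ("ref", index) / ("raw", chunk) encoding are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple, Union


def pattern_feature(patterns: List[bytes]) -> Counter:
    """Pattern feature: number of occurrences of each data pattern for one device."""
    return Counter(patterns)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two occurrence-count features (assumed metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_devices(features: Dict[str, Counter], threshold: float = 0.5) -> List[List[str]]:
    """Greedy grouping (assumption): a device joins the first group whose
    first member has a feature similar enough to its own, else starts a new group."""
    groups: List[List[str]] = []
    for device in sorted(features):
        for group in groups:
            if cosine(features[device], features[group[0]]) >= threshold:
                group.append(device)
                break
        else:
            groups.append([device])
    return groups


def shared_patterns(group: List[str], features: Dict[str, Counter], top_k: int = 2) -> List[bytes]:
    """Pick the top_k patterns by total occurrences across the group (assumed rule)."""
    total: Counter = Counter()
    for device in group:
        total.update(features[device])
    return [pattern for pattern, _ in total.most_common(top_k)]


Token = Tuple[str, Union[int, bytes]]


def deduplicate(chunks: List[bytes], shared: List[bytes]) -> List[Token]:
    """Edge-side step: replace chunks that match a shared pattern with a short
    reference to its index; keep all other chunks as raw data."""
    index = {pattern: i for i, pattern in enumerate(shared)}
    return [("ref", index[c]) if c in index else ("raw", c) for c in chunks]
```

In this sketch a central component would call `pattern_feature`, `group_devices`, and `shared_patterns`, then distribute the resulting list to the edge computing device, which would call `deduplicate` on data from its collection device before sending the result to the server. The reference tokens are smaller than the chunks they replace only when the shared patterns are longer than an index, which the sketch assumes.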