US 12,339,980 B2
Data replacement apparatus, data replacement method, and program
Satoshi Hasegawa, Musashino (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 17/431,719
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Feb. 20, 2020, PCT No. PCT/JP2020/006710
§ 371(c)(1), (2) Date Aug. 18, 2021,
PCT Pub. No. WO2020/184126, PCT Pub. Date Sep. 17, 2020.
Claims priority of application No. 2019-043663 (JP), filed on Mar. 11, 2019.
Prior Publication US 2022/0138338 A1, May 5, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/28 (2019.01); G06F 21/62 (2013.01)
CPC G06F 21/6218 (2013.01) [G06F 16/285 (2019.01)]
3 Claims
 
1. A data replacement apparatus for replacing attribute values with representative values for each of groups, the data replacement apparatus comprising:
attribute value set retrieval circuitry that retrieves a grouped attribute value set into a primary storage device when a size of the grouped attribute value set is equal to or smaller than a predefined size and retrieves the grouped attribute value set into a secondary storage device when the size of the grouped attribute value set is larger than the predefined size, wherein the primary storage device is physically separate from the secondary storage device, and the secondary storage device is slower than the primary storage device;
median computation circuitry that computes a median of the grouped attribute value set at the primary storage device or at the secondary storage device;
division determination circuitry that, if a size of each of two attribute value sets which are formed by dividing the grouped attribute value set into two parts based on the median is equal to or greater than a predetermined threshold, sets respective ones of the two attribute value sets formed by the division as new groups;
a joined set generation circuitry that generates a joined set which is formed by arranging record numbers associated with the attribute values such that the attribute values in each of the groups which have converged after repeated execution of processing by the attribute value set retrieval circuitry, the median computation circuitry, and the division determination circuitry are consecutive;
a rearrangement circuitry that rearranges the attribute values in the secondary storage device based on the joined set;
a representative value replacement circuitry that sequentially executes processing for retrieving some of the rearranged attribute values from the secondary storage device into the primary storage device, and replaces the attribute values retrieved into the primary storage device with the representative values; and
a re-rearrangement circuitry that moves the representative values to the secondary storage device and rearranges them into an original order.
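
The data flow recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation. The function names `split_groups` and `replace_with_representatives` are hypothetical, the threshold `k` stands in for the "predetermined threshold", and the group mean is assumed as the representative value because the claim does not specify one. The choice between the primary and secondary storage devices is not modeled: plain Python lists stand in for both, and one group at a time is treated as the portion retrieved into primary storage.

```python
from statistics import mean, median


def split_groups(values, k):
    """Divide record numbers at the median until a division would leave
    either half with fewer than k records (hypothetical helper)."""
    if not values:
        return []
    pending = [list(range(len(values)))]  # groups still to be examined
    converged = []                        # groups that can no longer be divided
    while pending:
        group = pending.pop()
        m = median(values[i] for i in group)
        low = [i for i in group if values[i] < m]
        high = [i for i in group if values[i] >= m]
        if len(low) >= k and len(high) >= k:
            pending.extend([low, high])   # both halves become new groups
        else:
            converged.append(group)       # division rejected, group is final
    return converged


def replace_with_representatives(values, k):
    """Replace each attribute value with its group's representative value
    and return the result in the original record order."""
    groups = split_groups(values, k)

    # Joined set: record numbers arranged so each group is consecutive.
    joined = [i for group in groups for i in group]

    # Rearrangement: attribute values laid out in joined-set order
    # (this list stands in for the secondary storage device).
    rearranged = [values[i] for i in joined]

    # Sequential replacement: take one consecutive group at a time
    # (standing in for a retrieval into primary storage) and overwrite it
    # with the representative value, assumed here to be the mean.
    start = 0
    for group in groups:
        end = start + len(group)
        chunk = rearranged[start:end]
        rearranged[start:end] = [mean(chunk)] * len(chunk)
        start = end

    # Re-rearrangement: move the representative values back to the
    # original record order.
    restored = [None] * len(values)
    for position, record_number in enumerate(joined):
        restored[record_number] = rearranged[position]
    return restored


print(replace_with_representatives([5, 1, 9, 3], k=2))  # → [7, 2, 7, 2]
```

In this example the median of the four values is 4, so records holding 1 and 3 form one group and records holding 5 and 9 form the other. Neither group can be divided again with `k=2`, and each value is replaced by its group mean at its original position.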