| CPC G06F 21/6218 (2013.01) [G06F 16/285 (2019.01)] | 3 Claims |

|
1. A data replacement apparatus for replacing attribute values with representative values for each of groups, the data replacement apparatus comprising:
attribute value set retrieval circuitry that retrieves a grouped attribute value set into a primary storage device when a size of the grouped attribute value set is equal to or smaller than a predefined size and retrieves the grouped attribute value set into a secondary storage device when the size of the grouped attribute value set is larger than the predefined size, wherein the primary storage device is physically separate from the secondary storage device, and the secondary storage device is slower than the primary storage device;
median computation circuitry that computes a median of the grouped attribute value set at the primary storage device or at the secondary storage device;
division determination circuitry that, if a size of each of two attribute value sets which are formed by dividing the grouped attribute value set into two parts based on the median is equal to or greater than a predetermined threshold, sets respective ones of the two attribute value sets formed by the division as new groups;
joined set generation circuitry that generates a joined set which is formed by arranging record numbers associated with the attribute values such that the attribute values in each of the groups which have converged after repeated execution of processing by the attribute value set retrieval circuitry, the median computation circuitry, and the division determination circuitry are consecutive;
rearrangement circuitry that rearranges the attribute values in the secondary storage device based on the joined set;
representative value replacement circuitry that sequentially executes processing for retrieving some of the rearranged attribute values from the secondary storage device into the primary storage device, and replaces the attribute values retrieved into the primary storage device with the representative values; and
re-rearrangement circuitry that moves the representative values to the secondary storage device and rearranges them into an original order.
|
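The data flow recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the claimed apparatus: it omits the size-based choice between the primary and secondary storage devices, and it makes several assumptions the claim does not state. The function names `split_groups` and `replace_with_representatives`, the parameter `k` (standing in for the predetermined threshold), the rule that values equal to the median go to the upper part, and the use of each group's median as its representative value are all hypothetical.

```python
from statistics import median


def split_groups(values, k):
    """Split record numbers at the median until a further split would
    leave a part with fewer than k records (assumed threshold rule)."""
    if not values:
        return []
    groups = []
    stack = [list(range(len(values)))]  # start with one group of all records
    while stack:
        idx = stack.pop()
        m = median(values[i] for i in idx)
        # Assumption: ties with the median are placed in the upper part.
        low = [i for i in idx if values[i] < m]
        high = [i for i in idx if values[i] >= m]
        if len(low) >= k and len(high) >= k:
            # Both parts are large enough, so they become new groups.
            stack.append(high)
            stack.append(low)
        else:
            # This group has converged and is not divided further.
            groups.append(idx)
    return groups


def replace_with_representatives(values, k):
    groups = split_groups(values, k)
    # Joined set: record numbers arranged so each group is consecutive.
    joined = [i for g in groups for i in g]
    # Rearrangement: attribute values laid out in joined-set order.
    rearranged = [values[i] for i in joined]
    # Replacement: process one contiguous group at a time.
    replaced = []
    pos = 0
    for g in groups:
        chunk = rearranged[pos:pos + len(g)]
        rep = median(chunk)  # assumption: representative value is the group median
        replaced.extend([rep] * len(chunk))
        pos += len(g)
    # Re-rearrangement: write each value back at its original record number.
    result = [None] * len(values)
    for p, rec in enumerate(joined):
        result[rec] = replaced[p]
    return result


if __name__ == "__main__":
    print(replace_with_representatives([5, 1, 9, 3, 7, 2, 8, 4], 2))
```

Processing one contiguous group at a time loosely mirrors the claim's step of retrieving only some of the rearranged attribute values into the primary storage device; a disk-backed version would read and write those slices from a file instead of a list.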