US 12,265,649 B2
Synthetic data generation apparatus, method for the same, and program
Rina Okada, Musashino (JP); Satoshi Hasegawa, Musashino (JP); Shogo Masaki, Musashino (JP); and Satoshi Tanaka, Musashino (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Chiyoda-ku (JP)
Appl. No. 16/753,037
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Chiyoda-ku (JP)
PCT Filed Oct. 5, 2018, PCT No. PCT/JP2018/037309
§ 371(c)(1), (2) Date Apr. 2, 2020,
PCT Pub. No. WO2019/073912, PCT Pub. Date Apr. 18, 2019.
Claims priority of application No. 2017-199200 (JP), filed on Oct. 13, 2017.
Prior Publication US 2020/0257824 A1, Aug. 13, 2020
Int. Cl. G06F 21/62 (2013.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01)
CPC G06F 21/6254 (2013.01) [G06F 16/258 (2019.01); G06F 16/285 (2019.01)]
3 Claims
 
3. A synthetic data generation method for execution by a synthetic data generation apparatus that includes storage and processing circuitry, the synthetic data generation method comprising:
a coding step in which the processing circuitry codes a value of each of category attributes contained in original data into a value of a numerical attribute in accordance with a coding rule which is stored in the storage and indicates correspondence between a code and a value of a category attribute;
a data formatting step in which the processing circuitry generates first synthetic data from the original data after coding using a synthetic data generation method for numerical attributes;
a conversion step in which, if the value of the numerical attribute which is contained in the first synthetic data and corresponds to the value of one of the category attributes exceeds a range of values that can be assumed by the value of that numerical attribute, the processing circuitry converts the value of that numerical attribute to a value included in the range of values that can be assumed by the value of that numerical attribute; and
a decoding step in which the processing circuitry decodes the value of the numerical attribute which is contained in the first synthetic data after conversion and which corresponds to the value of one of the category attributes to the value of that category attribute in accordance with the coding rule to obtain synthetic data, wherein
the value of the numerical attribute is a value that can be measured numerically, and the value of the category attribute is a value that cannot be measured numerically,
the coding rule is a 1-of-K coding method, and
the synthetic data maintains relationships among all the attributes in the original data, wherein
(i) the relationships are variance-covariance or (ii) the relationships are correlation.
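
The four steps of claim 3 (coding, generation, conversion, decoding) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the attribute names, the `CODING_RULE` table, and every function name are hypothetical. The claim does not specify the synthetic data generation method for numerical attributes, so `generate_numeric` is a Gaussian-noise placeholder that does not by itself preserve variance-covariance or correlation. Decoding by taking the largest of the K codes is likewise an assumption; the claim only requires decoding in accordance with the coding rule.

```python
import random

# Hypothetical coding rule: attribute name -> ordered list of category values.
# Position k in the list is the "1" position of the 1-of-K code.
CODING_RULE = {"blood_type": ["A", "B", "O", "AB"]}
ATTRIBUTES = ["age", "blood_type"]  # hypothetical column order


def encode(record):
    """Coding step: expand each category attribute into K numeric 0/1 columns."""
    row = []
    for name in ATTRIBUTES:
        if name in CODING_RULE:
            row.extend(1.0 if record[name] == c else 0.0 for c in CODING_RULE[name])
        else:
            row.append(float(record[name]))
    return row


def generate_numeric(rows, rng, scale=0.3):
    """Placeholder for a numerical synthetic data generator (assumption).

    Adds Gaussian noise to every value; a real generator would be chosen so
    that variance-covariance or correlation is maintained.
    """
    return [[v + rng.gauss(0.0, scale) for v in row] for row in rows]


def clamp(value, low=0.0, high=1.0):
    """Conversion step: pull an out-of-range code back into [low, high]."""
    return min(max(value, low), high)


def convert_and_decode(row):
    """Conversion and decoding steps for one row of first synthetic data."""
    record, i = {}, 0
    for name in ATTRIBUTES:
        if name in CODING_RULE:
            categories = CODING_RULE[name]
            codes = [clamp(v) for v in row[i:i + len(categories)]]
            # Assumed decoding: the category with the largest code wins
            # (the first one on ties).
            record[name] = categories[codes.index(max(codes))]
            i += len(categories)
        else:
            record[name] = row[i]  # numerical attributes pass through
            i += 1
    return record


def synthesize(records, seed=0):
    """Run coding, generation, conversion, and decoding in sequence."""
    rng = random.Random(seed)
    coded = [encode(r) for r in records]
    first_synthetic = generate_numeric(coded, rng)
    return [convert_and_decode(row) for row in first_synthetic]
```

For example, `encode({"age": 30, "blood_type": "O"})` yields `[30.0, 0.0, 0.0, 1.0, 0.0]`, and a generated row such as `[31.2, -0.2, 0.1, 1.4, 0.3]` has its codes clamped to `[0.0, 0.1, 1.0, 0.3]` before being decoded back to `"O"`.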