US 11,676,069 B2
Synthetic data generation using anonymity preservation in computer-based reasoning systems
Christopher James Hazard, Raleigh, NC (US)
Assigned to Diveplane Corporation, Raleigh, NC (US)
Filed by Diveplane Corporation, Raleigh, NC (US)
Filed on Sep. 30, 2020, as Appl. No. 17/038,955.
Application 17/038,955 is a continuation in part of application No. 17/006,144, filed on Aug. 28, 2020.
Application 17/006,144 is a continuation in part of application No. 16/713,714, filed on Dec. 13, 2019.
Application 16/713,714 is a continuation in part of application No. 16/219,476, filed on Dec. 13, 2018, abandoned.
Claims priority of provisional application 63/036,741, filed on Jun. 9, 2020.
Claims priority of provisional application 63/024,152, filed on May 13, 2020.
Claims priority of provisional application 62/814,585, filed on Mar. 6, 2019.
Prior Publication US 2021/0012246 A1, Jan. 14, 2021
Prior Publication US 2023/0148457 A9, May 11, 2023
Int. Cl. G06F 15/16 (2006.01); G06N 20/00 (2019.01); G06N 5/02 (2023.01)
CPC G06N 20/00 (2019.01) [G06N 5/02 (2013.01)]
20 Claims
 
1. A non-transitory computer readable medium storing instructions which, when executed by one or more computing devices, cause the one or more computing devices to perform a process of:
receiving a request for generation of synthetic data based on a set of training data cases;
determining one or more focal training data cases from among the set of training data cases;
determining a synthetic data case based on the one or more focal training data cases;
determining whether the synthetic data case is overly similar to one or more cases in the set of training data cases;
when the synthetic data case is determined to be overly similar to one or more cases in the set of training data cases, taking corrective action for the synthetic data case to produce a new synthetic data case and using the new synthetic data case for consideration in adding to a set of synthetic training data cases;
when the synthetic data case is determined to not be overly similar to cases in the set of training data cases, adding the synthetic data case to the set of synthetic training data cases.
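The steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not specify how focal cases are chosen, how a synthetic case is derived, how similarity is measured, or what the corrective action is, so every such choice below is an assumption made only for illustration: cases are numeric tuples, a focal case is picked uniformly at random, a candidate is derived by adding Gaussian noise, "overly similar" means a Euclidean distance below a hypothetical `min_distance` threshold, and the corrective action is to resample with wider noise. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Case = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed similarity measure, not from the claim)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_overly_similar(case: Case, training: Sequence[Case], min_distance: float) -> bool:
    """Assumed test: a case is overly similar if it lies within
    min_distance of any training case."""
    return any(distance(case, t) < min_distance for t in training)


def perturb(focal: Case, scale: float, rng: random.Random) -> Case:
    """Derive a candidate from a focal case by adding Gaussian noise (assumed)."""
    return tuple(x + rng.gauss(0.0, scale) for x in focal)


def generate_synthetic_cases(
    training: Sequence[Case],
    n_cases: int,
    min_distance: float = 0.5,
    scale: float = 1.0,
    max_retries: int = 100,
    seed: Optional[int] = None,
) -> List[Case]:
    """Handle a request for n_cases synthetic cases based on the training set."""
    if not training:
        raise ValueError("training set must not be empty")
    rng = random.Random(seed)
    synthetic: List[Case] = []
    while len(synthetic) < n_cases:
        # Determine a focal training case (assumed: uniform random choice).
        focal = rng.choice(training)
        # Determine a synthetic case based on the focal case.
        candidate = perturb(focal, scale, rng)
        noise, retries = scale, 0
        # Corrective action (assumed): resample with wider noise, then
        # reconsider the new candidate against the same similarity test.
        while is_overly_similar(candidate, training, min_distance):
            if retries >= max_retries:
                raise RuntimeError("could not produce a sufficiently distinct case")
            noise *= 1.5
            candidate = perturb(focal, noise, rng)
            retries += 1
        # Not overly similar: add to the set of synthetic cases.
        synthetic.append(candidate)
    return synthetic


if __name__ == "__main__":
    training_cases = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    for case in generate_synthetic_cases(training_cases, n_cases=5, seed=0):
        print(case)
```

In this sketch the new candidate produced by the corrective action is checked again before it is added, which mirrors the claim language that the new synthetic data case is used "for consideration in adding" rather than added unconditionally.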