US 12,373,469 B2
Synthetic generation of data with many to many relationships
Kai Xu, Sevenoaks (GB); Georgi Valentinov Ganev, Sevenoaks (GB); Emile Isak Joubert, Sevenoaks (GB); Rees Stephen Davison, Sevenoaks (GB); Olivier Rene Maurice Van Acker, Sevenoaks (GB); Luke Anthony William Robinson, Sevenoaks (GB); and Sofiane Mahiou, Sevenoaks (GB)
Assigned to SAS Institute Inc., Cary, NC (US)
Filed by SAS Institute Inc., Cary, NC (US)
Filed on Nov. 8, 2024, as Appl. No. 18/941,263.
Application 18/941,263 is a continuation of application No. PCT/GB2023/051318, filed on May 18, 2023.
Claims priority of application No. 2207384 (GB), filed on May 19, 2022.
Prior Publication US 2025/0068658 A1, Feb. 27, 2025
Int. Cl. G06F 16/28 (2019.01)
CPC G06F 16/288 (2019.01)
30 Claims
 
1. A non-transitory computer-readable medium comprising computer-readable instructions stored thereon that when executed by a processor cause the processor to:
generate a synthetic graph from a first plurality of entries in a first table (Table A) and a second plurality of entries in a second table (Table B), wherein the synthetic graph comprises a first set of nodes (U′), a second set of nodes (V′) and a first set of edges (L′), wherein the first table (Table A) and the second table (Table B) form a first data set corresponding to real data, wherein the first plurality of entries in the first table (Table A) and the second plurality of entries in the second table (Table B) have a many-to-many relationship, and wherein generating the synthetic graph comprises generating a graph topology only for the first set of nodes (U′) and the second set of nodes (V′);
determine one or more first attributes associated with each node in the first set of nodes (U′) using a first conditional model (p(U′|L′)) conditioned on the synthetic graph by:
obtaining the first conditional model (p(U|L));
determining a node embedding for each node in the first set of nodes (U′) based on a node embedding model (β) and the synthetic graph;
determining a probability distribution of each attribute associated with a node in the first set of nodes (U′) using the first conditional model (p(U|L)) and the node embedding associated with the node; and
sampling from the probability distribution to obtain each attribute of the one or more first attributes associated with the node; and
determine one or more second attributes associated with each node in the second set of nodes (V′) using a second conditional model (p(V′|L′, U′)) conditioned on the synthetic graph and on the one or more first attributes associated with each node in the first set of nodes (U′);
wherein the one or more first attributes associated with each node in the first set of nodes (U′) correspond to a third plurality of entries in a third table (Table A′);
wherein the one or more second attributes associated with each node in the second set of nodes (V′) correspond to a fourth plurality of entries in a fourth table (Table B′);
wherein the third table (Table A′) and the fourth table (Table B′) comprise synthetic data generated from the real data; and
wherein the third plurality of entries in the third table (Table A′) and the fourth plurality of entries in the fourth table (Table B′) have the many-to-many relationship as the first plurality of entries in the first table (Table A) and the second plurality of entries in the second table (Table B) while protecting privacy of information within the first plurality of entries in the first table (Table A) and the second plurality of entries in the second table (Table B).
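
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical. It makes these simplifying assumptions: the topology generator reuses the real degree sequences and rewires edges at random; node degree stands in for the node embedding model (β); and both conditional models are plain count-based categorical distributions fitted on the real tables. The sketch applies no privacy mechanism, so it does not by itself protect the real entries.

```python
import random
from collections import Counter, defaultdict

def degrees(edges, side):
    """Count edges touching each node on one side (0 = U, 1 = V)."""
    return Counter(e[side] for e in edges)

def synth_topology(real_edges, rng):
    """Assumed stand-in generator: keep the real degree sequences, rewire at random."""
    u_stubs = [f"u{i}" for i, d in enumerate(degrees(real_edges, 0).values()) for _ in range(d)]
    v_stubs = [f"v{i}" for i, d in enumerate(degrees(real_edges, 1).values()) for _ in range(d)]
    rng.shuffle(v_stubs)
    return sorted(set(zip(u_stubs, v_stubs)))  # duplicate edges collapse

def fit_conditional(contexts, values):
    """Estimate a categorical distribution over values for each context by counting."""
    table = defaultdict(Counter)
    for c, v in zip(contexts, values):
        table[c][v] += 1
    return table

def sample(table, context, rng):
    """Draw one value for `context`; pool all contexts if this one was never seen."""
    counts = table.get(context) or sum(table.values(), Counter())
    vals, weights = zip(*sorted(counts.items()))
    return rng.choices(vals, weights=weights)[0]

def neighbour_context(edges, u_attr):
    """Context of each V node: (degree, most common attribute among linked U nodes)."""
    seen = defaultdict(Counter)
    for u, v in edges:
        seen[v][u_attr[u]] += 1
    return {v: (sum(c.values()), max(sorted(c), key=c.get)) for v, c in seen.items()}

def generate(table_a, table_b, links, seed=0):
    rng = random.Random(seed)
    # Fit p(U|L) on the real data, using degree as the (assumed) node embedding.
    du = degrees(links, 0)
    p_u = fit_conditional([du[a] for a in table_a], table_a.values())
    # Fit p(V|L,U) on the real data, using graph context plus neighbouring U attributes.
    ctx = neighbour_context(links, table_a)
    p_v = fit_conditional([ctx[b] for b in ctx], [table_b[b] for b in ctx])
    # Step 1: topology only, no attributes yet.
    edges = synth_topology(links, rng)
    # Step 2: sample U' attributes conditioned on the synthetic graph.
    sdu = degrees(edges, 0)
    synth_a = {u: sample(p_u, sdu[u], rng) for u in sorted(sdu)}
    # Step 3: sample V' attributes conditioned on the graph and on U' attributes.
    sctx = neighbour_context(edges, synth_a)
    synth_b = {v: sample(p_v, sctx[v], rng) for v in sorted(sctx)}
    return synth_a, synth_b, edges

# Toy real data: Table A (customer -> tier), Table B (product -> range), link table.
table_a = {"a1": "gold", "a2": "basic", "a3": "basic"}
table_b = {"b1": "luxury", "b2": "budget", "b3": "budget"}
links = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a3", "b3"), ("a1", "b3")]

synth_a, synth_b, synth_links = generate(table_a, table_b, links, seed=1)
```

Here `synth_a` and `synth_b` play the role of Table A′ and Table B′, and `synth_links` carries the many-to-many relationship between them. The ordering mirrors the claim: topology first, then attributes of U′, then attributes of V′ conditioned on both.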