US 12,380,454 B2
System, method, and computer program product for generating synthetic data
Xiao Tian, Austin, TX (US); Jianhua Huang, Cedar Park, TX (US); Chiranjeet Chetia, Round Rock, TX (US); Shi Cao, Austin, TX (US); Marc Corbalan Vila, London (GB); and Claudia Carolina Barcenas Cardenas, Austin, TX (US)
Assigned to Visa International Service Association, San Francisco, CA (US)
Filed by Visa International Service Association, San Francisco, CA (US)
Filed on Mar. 20, 2023, as Appl. No. 18/123,362.
Application 18/123,362 is a continuation of application No. 17/136,108, filed on Dec. 29, 2020, granted, now 11,640,610.
Prior Publication US 2023/0230089 A1, Jul. 20, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06Q 40/00 (2023.01); G06F 16/901 (2019.01); G06N 20/00 (2019.01); G06Q 20/40 (2012.01)
CPC G06Q 20/4016 (2013.01) [G06F 16/9024 (2019.01); G06N 20/00 (2019.01); G06Q 20/4014 (2013.01)]
14 Claims
1. A computer-implemented method for generating synthetic training data comprising:
receiving, with at least one processor, a plurality of data types associated with an environment to be evaluated;
receiving, with at least one processor, a plurality of correlations, each correlation of the plurality of correlations comprising a dependency of one data type of the plurality of data types on another data type of the plurality of data types;
generating, with at least one processor, a correlation graph of the plurality of data types based on the plurality of correlations;
generating, with at least one processor, a directed acyclic graph of the plurality of data types based on the correlation graph;
repeatedly traversing, with at least one processor, the directed acyclic graph to generate a hierarchical graph of the plurality of data types, the hierarchical graph comprising a plurality of nodes arranged in a plurality of tiers defined by a path length of nodes resulting from the traversals, wherein each node of the plurality of nodes is associated with a data type of the plurality of data types, wherein each node of the hierarchical graph is connected to a subsequent node according to an ordered dependency;
generating, with at least one processor, synthetic training data comprising a plurality of records of data by repeatedly traversing the hierarchical graph in accordance with the ordered dependency, wherein each record is based on a set of values determined at each node of the hierarchical graph, a set of probabilities associated with the set of values, and a set of interdependencies, wherein each set of values is associated with a data type of the plurality of data types, wherein each set of interdependencies is associated with a connected pair of nodes in the hierarchical graph, wherein each set of probabilities is associated with a set of interdependencies between the individual probabilities of one value of one data type and the probabilities of a subsequent connected data type;
training, with at least one processor, at least one machine learning model of a fraud detection system using the synthetic training data;
receiving, with at least one processor, an authorization request associated with a transaction between a merchant system and a payment device; and
during payment processing of the transaction:
generating, with at least one processor, a fraud evaluation of the transaction based on inputting at least a portion of the authorization request to the at least one machine learning model; and
declining, with at least one processor, the authorization request based on the fraud evaluation.
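The graph-building and record-generation steps recited in claim 1 can be illustrated with a small sketch. It is not the patented implementation, and several details are assumptions. The data type names (merchant_category, amount_band, channel) and the probability tables are hypothetical. The directed acyclic graph is derived by dropping any dependency that would close a cycle. A node's tier is taken to be the length of the longest path reaching it from a root. Each node's value is drawn from a table conditioned on its parents' values. Training the fraud detection model and scoring authorization requests are not shown.

```python
import random
from collections import deque

# Hypothetical sketch: names, tables, and the cycle-breaking rule are illustrative assumptions.


def build_dag(data_types, correlations):
    """Directed acyclic graph from (parent, child) dependencies.

    Assumption: a dependency that would close a cycle in the correlation
    graph is simply dropped (first-come, first-kept).
    """
    children = {t: set() for t in data_types}

    def reachable(src, dst):
        # Iterative depth-first search over the edges kept so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        return False

    for parent, child in correlations:
        # parent -> child closes a cycle iff parent is already reachable from child.
        if not reachable(child, parent):
            children[parent].add(child)
    return children


def assign_tiers(children):
    """Tier = length of the longest path from any root (Kahn's topological order)."""
    indegree = {n: 0 for n in children}
    for kids in children.values():
        for c in kids:
            indegree[c] += 1
    tier = {n: 0 for n in children}
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    while queue:
        node = queue.popleft()
        for c in sorted(children[node]):
            tier[c] = max(tier[c], tier[node] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return tier


def generate_records(children, tables, n_records, seed=0):
    """Sample records tier by tier; each node's value is drawn from a
    probability table conditioned on the values already drawn for its parents."""
    rng = random.Random(seed)
    tier = assign_tiers(children)
    parents = {n: sorted(p for p in children if n in children[p]) for n in children}
    order = sorted(children, key=lambda n: (tier[n], n))  # parents always precede children
    records = []
    for _ in range(n_records):
        record = {}
        for node in order:
            key = tuple(record[p] for p in parents[node])  # () for root nodes
            values, weights = zip(*tables[node][key].items())
            record[node] = rng.choices(values, weights=weights, k=1)[0]
        records.append(record)
    return records


# Hypothetical example: three transaction attributes.
data_types = ["merchant_category", "amount_band", "channel"]
correlations = [
    ("merchant_category", "amount_band"),
    ("amount_band", "channel"),
    ("channel", "merchant_category"),  # would close a cycle, so it is dropped
]
tables = {
    "merchant_category": {(): {"grocery": 0.7, "electronics": 0.3}},
    "amount_band": {
        ("grocery",): {"low": 0.9, "high": 0.1},
        ("electronics",): {"low": 0.2, "high": 0.8},
    },
    "channel": {
        ("low",): {"card_present": 0.8, "online": 0.2},
        ("high",): {"card_present": 0.4, "online": 0.6},
    },
}

dag = build_dag(data_types, correlations)
print(assign_tiers(dag))  # {'merchant_category': 0, 'amount_band': 1, 'channel': 2}
for row in generate_records(dag, tables, n_records=3, seed=42):
    print(row)
```

Every parent sits in a strictly lower tier than its children. Sorting nodes by tier therefore guarantees that each conditional table is consulted only after the values it depends on have been drawn.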