US 12,175,308 B2
System, method, and computer-accessible medium for evaluating multi-dimensional synthetic data using integrated variants analysis
Mark Watson, Sedona, AZ (US); Fardin Abdi Taghi Abad, Seattle, WA (US); Anh Truong, Champaign, IL (US); Kenneth Taylor, Champaign, IL (US); Reza Farivar, Champaign, IL (US); Jeremy Goodsitt, Champaign, IL (US); Austin Walters, Savoy, IL (US); and Vincent Pham, Champaign, IL (US)
Assigned to CAPITAL ONE SERVICES, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Jan. 3, 2024, as Appl. No. 18/402,937.
Application 18/402,937 is a continuation of application No. 17/845,786, filed on Jun. 21, 2022, granted, now 11,900,178.
Application 17/845,786 is a continuation of application No. 16/825,040, filed on Mar. 20, 2020, granted, now 11,385,943, issued on Jul. 12, 2022.
Application 16/825,040 is a continuation of application No. 16/152,072, filed on Oct. 4, 2018, granted, now 10,635,939, issued on Apr. 28, 2020.
Claims priority of provisional application 62/694,968, filed on Jul. 6, 2018.
Prior Publication US 2024/0160502 A1, May 16, 2024
Int. Cl. G06F 9/54 (2006.01); G06F 8/71 (2018.01); G06F 11/36 (2006.01); G06F 16/22 (2019.01); G06F 16/242 (2019.01); G06F 16/2455 (2019.01); G06F 16/248 (2019.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01); G06F 16/335 (2019.01); G06F 16/903 (2019.01); G06F 16/9032 (2019.01); G06F 16/9038 (2019.01); G06F 16/906 (2019.01); G06F 16/93 (2019.01); G06F 17/15 (2006.01); G06F 17/16 (2006.01); G06F 17/18 (2006.01); G06F 18/20 (2023.01); G06F 18/21 (2023.01); G06F 18/2115 (2023.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06F 18/23 (2023.01); G06F 18/24 (2023.01); G06F 18/2411 (2023.01); G06F 18/2415 (2023.01); G06F 18/40 (2023.01); G06F 21/55 (2013.01); G06F 21/60 (2013.01); G06F 21/62 (2013.01); G06F 30/20 (2020.01); G06F 40/117 (2020.01); G06F 40/166 (2020.01); G06F 40/20 (2020.01); G06N 3/04 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/06 (2006.01); G06N 3/08 (2023.01); G06N 3/088 (2023.01); G06N 5/00 (2023.01); G06N 5/02 (2023.01); G06N 5/04 (2023.01); G06N 7/00 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01); G06Q 10/04 (2023.01); G06T 7/194 (2017.01); G06T 7/246 (2017.01); G06T 7/254 (2017.01); G06T 11/00 (2006.01); G06V 10/70 (2022.01); G06V 10/98 (2022.01); G06V 30/194 (2022.01); G06V 30/196 (2022.01); H04L 9/40 (2022.01); H04L 67/00 (2022.01); H04L 67/306 (2022.01); H04N 21/234 (2011.01); H04N 21/81 (2011.01)
CPC G06F 9/541 (2013.01) [G06F 8/71 (2013.01); G06F 9/54 (2013.01); G06F 9/547 (2013.01); G06F 11/3608 (2013.01); G06F 11/3628 (2013.01); G06F 11/3636 (2013.01); G06F 16/2237 (2019.01); G06F 16/2264 (2019.01); G06F 16/2423 (2019.01); G06F 16/24568 (2019.01); G06F 16/248 (2019.01); G06F 16/254 (2019.01); G06F 16/258 (2019.01); G06F 16/283 (2019.01); G06F 16/285 (2019.01); G06F 16/288 (2019.01); G06F 16/335 (2019.01); G06F 16/90332 (2019.01); G06F 16/90335 (2019.01); G06F 16/9038 (2019.01); G06F 16/906 (2019.01); G06F 16/93 (2019.01); G06F 17/15 (2013.01); G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06F 18/2115 (2023.01); G06F 18/214 (2023.01); G06F 18/2148 (2023.01); G06F 18/217 (2023.01); G06F 18/2193 (2023.01); G06F 18/22 (2023.01); G06F 18/23 (2023.01); G06F 18/24 (2023.01); G06F 18/2411 (2023.01); G06F 18/2415 (2023.01); G06F 18/285 (2023.01); G06F 18/40 (2023.01); G06F 21/552 (2013.01); G06F 21/60 (2013.01); G06F 21/6245 (2013.01); G06F 21/6254 (2013.01); G06F 30/20 (2020.01); G06F 40/117 (2020.01); G06F 40/166 (2020.01); G06F 40/20 (2020.01); G06N 3/04 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/06 (2013.01); G06N 3/08 (2013.01); G06N 3/088 (2013.01); G06N 5/00 (2013.01); G06N 5/02 (2013.01); G06N 5/04 (2013.01); G06N 7/00 (2013.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01); G06Q 10/04 (2013.01); G06T 7/194 (2017.01); G06T 7/246 (2017.01); G06T 7/248 (2017.01); G06T 7/254 (2017.01); G06T 11/001 (2013.01); G06V 10/768 (2022.01); G06V 10/993 (2022.01); G06V 30/194 (2022.01); G06V 30/1985 (2022.01); H04L 63/1416 (2013.01); H04L 63/1491 (2013.01); H04L 67/306 (2013.01); H04L 67/34 (2013.01); H04N 21/23412 (2013.01); H04N 21/8153 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)] 20 Claims
1. A non-transitory computer-accessible medium having stored thereon computer-executable instructions for evaluating a synthetic dataset, wherein, when a computer hardware arrangement executes the instructions, the computer hardware arrangement is configured to perform procedures comprising:
training a model using an original dataset and a synthetic dataset;
determining a data similarity score including a combined score of exact-match overlap score and fuzzy-match overlap score based on the synthetic dataset and the original dataset;
determining a data quality score including a combined score of row-duplicate score, repeated-value score and schema-preservation score based on the synthetic dataset and the original dataset;
evaluating the synthetic dataset based on the training of the model, the data similarity score, and the data quality score;
determining a region for the synthetic dataset based on evaluating the synthetic dataset, wherein the region defines a status of the synthetic dataset; and
generating a suggestion based on the determined region for building predictive models on the synthetic dataset,
wherein the suggestion includes at least one of (a) indicating that the at least one synthetic dataset is adequate or (b) warning that the at least one synthetic dataset potentially contains information similar to the at least one original dataset.
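The procedures recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how any score is computed or combined, so everything below is an assumption made for illustration only: the function names, the string-ratio fuzzy match, the unweighted averaging, the thresholds, the region labels, and the third "low_utility" region are all hypothetical. The model-training step is not shown; its result is passed in as a hypothetical `model_score` between 0 and 1. Datasets are assumed to be non-empty lists of tuples.

```python
from collections import Counter
from difflib import SequenceMatcher


def exact_match_overlap(original, synthetic):
    """Share of synthetic rows that appear verbatim in the original dataset."""
    original_rows = set(original)
    return sum(row in original_rows for row in synthetic) / len(synthetic)


def fuzzy_match_overlap(original, synthetic, threshold=0.9):
    """Share of synthetic rows whose text form is close to some original row.

    Assumption: closeness is the difflib similarity ratio of the joined row text.
    """
    def as_text(row):
        return "|".join(str(value) for value in row)

    original_texts = [as_text(row) for row in original]

    def is_near(row):
        text = as_text(row)
        return any(
            SequenceMatcher(None, text, other).ratio() >= threshold
            for other in original_texts
        )

    return sum(is_near(row) for row in synthetic) / len(synthetic)


def similarity_score(original, synthetic):
    """Assumed combination: unweighted mean of the two overlap scores."""
    exact = exact_match_overlap(original, synthetic)
    fuzzy = fuzzy_match_overlap(original, synthetic)
    return (exact + fuzzy) / 2


def quality_score(original, synthetic):
    """Assumed combination: unweighted mean of three sub-scores in [0, 1]."""
    n = len(synthetic)
    columns = list(zip(*synthetic))

    # Row-duplicate score: share of distinct rows (1.0 means no duplicates).
    row_duplicate = len(set(synthetic)) / n

    # Repeated-value score: 1 minus the average share held by each
    # column's most common value.
    top_shares = [Counter(col).most_common(1)[0][1] / n for col in columns]
    repeated_value = 1 - sum(top_shares) / len(columns)

    # Schema-preservation score: share of columns whose value types in the
    # synthetic data also occur in the same column of the original data.
    original_types = [{type(v) for v in col} for col in zip(*original)]
    synthetic_types = [{type(v) for v in col} for col in columns]
    if len(original_types) != len(synthetic_types):
        schema = 0.0
    else:
        matches = sum(s <= o for s, o in zip(synthetic_types, original_types))
        schema = matches / len(original_types)

    return (row_duplicate + repeated_value + schema) / 3


def determine_region(model_score, similarity, quality,
                     max_similarity=0.6, min_quality=0.7, min_model_score=0.7):
    """Map the three evaluation inputs to a hypothetical status region."""
    if similarity > max_similarity:
        return "too_similar"
    if quality >= min_quality and model_score >= min_model_score:
        return "adequate"
    return "low_utility"


# Suggestion text per region; wording is illustrative, not from the claim.
SUGGESTIONS = {
    "adequate": "The synthetic dataset is adequate for building predictive models.",
    "too_similar": "Warning: the synthetic dataset potentially contains "
                   "information similar to the original dataset.",
    "low_utility": "The synthetic dataset may be too low in quality to be useful.",
}


def suggest(original, synthetic, model_score):
    """Evaluate the synthetic dataset and return (region, suggestion)."""
    region = determine_region(
        model_score,
        similarity_score(original, synthetic),
        quality_score(original, synthetic),
    )
    return region, SUGGESTIONS[region]
```

For example, `suggest([(1, "a"), (2, "b"), (3, "c")], [(1, "a"), (9, "z")], model_score=0.9)` evaluates a two-row synthetic dataset in which one row duplicates an original row. Under the assumed thresholds, a higher similarity score pushes the dataset into the warning region of option (b), while sufficient quality and model performance place it in the adequate region of option (a).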