US 12,292,818 B2
System and method for generating synthetic test data
Lisa Kay Warfield, Hampton, VA (US); and Amanda Jean Dussault, Wake Forest, NC (US)
Assigned to COGNIZANT TECHNOLOGY SOLUTIONS US CORP., Teaneck, NJ (US)
Filed by Cognizant Technology Solutions US Corp., Teaneck, NJ (US)
Filed on Feb. 4, 2022, as Appl. No. 17/592,560.
Prior Publication US 2023/0251959 A1, Aug. 10, 2023
Int. Cl. G06F 11/3668 (2025.01)
CPC G06F 11/3684 (2013.01) [G06F 11/368 (2013.01); G06F 11/3688 (2013.01); G06F 11/3692 (2013.01)]
28 Claims
 
1. A method for automatically generating synthetic test data for testing healthcare data processing applications, wherein the method is implemented by a processor executing program instructions stored in a memory, the method comprising:
generating, by the processor, a data structure associated with the data processing application, wherein the data structure is a tree structure populated with one or more predefined configurable segments based on a selected operating field of the data processing application, each of the one or more predefined segments further comprising one or more customizable sub-segments, wherein the predefined configurable segments are representative of test data fields required in data records associated with an entity for the selected operating field, and wherein the customizable sub-segments are representative of attributes of the corresponding predefined configurable segments selected based on a practice area associated with the selected operating field;
evaluating, by the processor, most probable and optimized combinations between data values of the customizable sub-segments; and
generating, by the processor, synthetic test data in real-time comprising a plurality of data records based on the generated data structure and the evaluated combinations of the data values of the customizable sub-segments without using confidential protected health information data, wherein a number of the plurality of data records is equivalent to the evaluated number of combinations of the customizable sub-segments which are arranged within each of the plurality of data records and populated with the data values based on one or more parameters that define characteristics of the customizable sub-segments, and wherein dependent data records of the plurality of data records which are representative of dependents of members of the entity are defined by linking one or more member data records with other member data records of the plurality of data records using a unique family link ID.
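
The three steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The segment catalogue, the field name "healthcare_enrollment", the `is_plausible` filter, and the round-robin assignment of dependents to subscribers are all assumptions made for illustration; the claim does not state how the "most probable and optimized" combinations are evaluated, so a plain Cartesian product with an optional filter stands in for that step. All values are made up, so no protected health information is involved.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SubSegment:
    """Customizable sub-segment: one attribute and its candidate data values."""
    name: str
    values: list


@dataclass
class Segment:
    """Predefined configurable segment: a test data field group with sub-segments."""
    name: str
    sub_segments: list = field(default_factory=list)


def build_tree(operating_field):
    """Build a root -> segments -> sub-segments tree for the selected field.

    The catalogue below is hypothetical; a real system would load it from
    configuration for the chosen operating field and practice area.
    """
    catalogue = {
        "healthcare_enrollment": [
            Segment("member", [
                SubSegment("relationship", ["subscriber", "spouse", "child"]),
                SubSegment("gender", ["F", "M"]),
            ]),
            Segment("coverage", [
                SubSegment("plan_type", ["HMO", "PPO"]),
            ]),
        ]
    }
    return {"field": operating_field, "segments": catalogue[operating_field]}


def evaluate_combinations(tree, is_plausible=lambda combo: True):
    """Enumerate sub-segment value combinations, keeping the plausible ones.

    Assumption: the Cartesian product plus a caller-supplied filter stands in
    for the claim's unspecified "most probable and optimized" evaluation.
    """
    subs = [(seg.name, sub) for seg in tree["segments"] for sub in seg.sub_segments]
    keys = [f"{seg_name}.{sub.name}" for seg_name, sub in subs]
    value_lists = [sub.values for _, sub in subs]
    combos = (dict(zip(keys, vals)) for vals in itertools.product(*value_lists))
    return [combo for combo in combos if is_plausible(combo)]


def generate_records(combos):
    """Create one record per combination and link dependents to subscribers."""
    records = [
        {"member_id": f"M{i:05d}", **combo}
        for i, combo in enumerate(combos, start=1)
    ]
    subscribers = [r for r in records if r["member.relationship"] == "subscriber"]
    dependents = [r for r in records if r["member.relationship"] != "subscriber"]
    if dependents and not subscribers:
        raise ValueError("dependent records need at least one subscriber record")
    # Each subscriber record starts its own family.
    for n, rec in enumerate(subscribers, start=1):
        rec["family_link_id"] = f"FAM{n:04d}"
    # Dependents reuse a subscriber's family link ID (round-robin, an assumption).
    for n, rec in enumerate(dependents):
        rec["family_link_id"] = subscribers[n % len(subscribers)]["family_link_id"]
    return records


tree = build_tree("healthcare_enrollment")
combos = evaluate_combinations(tree)
records = generate_records(combos)
```

In this sketch the number of generated records equals the number of evaluated combinations, mirroring the claim language, and every dependent record shares its `family_link_id` with a subscriber record. Passing a stricter `is_plausible` function reduces the record count accordingly.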