US 11,899,692 B2
Database reduction based on geographically clustered data to provide record selection for clinical trials
Stephen Michael Jones, Rickmansworth (GB); Michelle Lsl Jones, Rickmansworth (GB); Michael Brian Garcia, West Windsor, NJ (US); Elizabeth Martina Marshallsay, Las Vegas, NV (US); and Rachael Haig, Wilmslow (GB)
Assigned to Laboratory Corporation of America Holdings, Burlington, NC (US)
Appl. No. 17/602,440
Filed by Laboratory Corporation of America Holdings, Burlington, NC (US)
PCT Filed Apr. 7, 2020, PCT No. PCT/US2020/027025
§ 371(c)(1), (2) Date Oct. 8, 2021,
PCT Pub. No. WO2020/210206, PCT Pub. Date Oct. 15, 2020.
Claims priority of provisional application 62/833,328, filed on Apr. 12, 2019.
Prior Publication US 2022/0208313 A1, Jun. 30, 2022
Int. Cl. G06F 16/28 (2019.01); G16H 10/20 (2018.01); G06F 16/215 (2019.01); G06F 16/2457 (2019.01); G06F 16/24 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/215 (2019.01); G06F 16/24578 (2019.01); G16H 10/20 (2018.01)]
20 Claims
 
1. A system comprising:
a data store;
a non-transitory computer-readable medium including computer program code for database record selection; and
a processing device communicatively coupled to the data store and the non-transitory computer-readable medium, wherein the processing device is configured for executing the computer program code to perform operations comprising:
identifying data sources for geographically clustered data containing corresponding descriptors for unique clinical trial investigators across different sources of information for database records to be written to the data store;
formatting the corresponding descriptors to produce standardized, corresponding descriptors;
matching each standardized, corresponding descriptor of the standardized corresponding descriptors to produce a record score for each standardized, corresponding descriptor;
producing, for each standardized corresponding descriptor, a modified score using the record score and a number of characters in at least one of the corresponding descriptors;
combining the modified scores for the standardized, corresponding descriptors to produce a binary overall score for each database record of the database records; and
selectively writing each database record to the data store based on the binary overall score to compile a deduplicated database of the unique clinical trial investigators.
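
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; every concrete choice is an assumption made for illustration. The descriptor names in FIELDS, the use of Python's difflib.SequenceMatcher as the matcher, the character-count weighting in modified_score, and the THRESHOLD value are all hypothetical, since the claim does not specify them.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical descriptor names and cut-off; the claim names neither.
FIELDS = ("name", "city", "institution")
THRESHOLD = 0.85


def standardize(value: str) -> str:
    """Format a descriptor: lower-case, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def record_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two standardized descriptors (assumed matcher)."""
    return SequenceMatcher(None, a, b).ratio()


def modified_score(score: float, n_chars: int) -> float:
    """Weight the record score by a character count (assumed weighting)."""
    return score * n_chars


def overall_score(candidate: Dict[str, str], existing: Dict[str, str]) -> int:
    """Combine modified scores into a binary result: 1 = same investigator, 0 = different."""
    weighted_sum = 0.0
    total_chars = 0
    for field in FIELDS:
        a = standardize(candidate.get(field, ""))
        b = standardize(existing.get(field, ""))
        n_chars = min(len(a), len(b))  # shorter descriptors carry less weight
        weighted_sum += modified_score(record_score(a, b), n_chars)
        total_chars += n_chars
    if total_chars == 0:
        return 0  # nothing to compare, so treat the records as different
    return int(weighted_sum / total_chars >= THRESHOLD)


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Selectively write each record: keep it only if it matches no stored record."""
    store: List[Dict[str, str]] = []
    for rec in records:
        if not any(overall_score(rec, kept) for kept in store):
            store.append(rec)
    return store


# Made-up records standing in for two data sources.
records = [
    {"name": "Dr. Jane A. Smith", "city": "Burlington, NC",
     "institution": "Example Research Clinic"},
    {"name": "dr jane a smith", "city": "Burlington NC",
     "institution": "Example Research Clinic."},
    {"name": "John Doe", "city": "Las Vegas",
     "institution": "Other Trial Site"},
]
unique_investigators = deduplicate(records)
```

In this sketch the first two records standardize to identical descriptors, so only one of them is written to the store. Weighting by character count means a short descriptor, such as a two-letter abbreviation, contributes less to the overall decision than a long one.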