US 12,475,409 B2
Evaluation of modeling algorithms with continuous outputs
Lefei Liu, Alpharetta, GA (US); Peter Liu, Alpharetta, GA (US); Jiawei Liu, Alpharetta, GA (US); Peter Gao, Alpharetta, GA (US); and Vickey Chang, Suwanee, GA (US)
Assigned to EQUIFAX INC., Atlanta, GA (US)
Filed by EQUIFAX INC., Atlanta, GA (US)
Filed on Nov. 4, 2022, as Appl. No. 18/052,762.
Application 18/052,762 is a continuation of application No. 16/669,959, filed on Oct. 31, 2019, granted, now 11,521,020.
Claims priority of provisional application 62/753,899, filed on Oct. 31, 2018.
Prior Publication US 2023/0100730 A1, Mar. 30, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 20/20 (2019.01); G06F 17/18 (2006.01); G06F 18/21 (2023.01); G06F 18/2415 (2023.01); G06N 5/01 (2023.01); G06N 20/00 (2019.01); G06V 10/75 (2022.01)
CPC G06N 20/20 (2019.01) [G06F 17/18 (2013.01); G06F 18/217 (2023.01); G06F 18/2415 (2023.01); G06N 5/01 (2023.01); G06N 20/00 (2019.01); G06V 10/751 (2022.01)] 17 Claims
1. A system comprising:
a data repository storing data samples having values of variables for input to a machine-learning model for risk assessment for an entity,
an external-facing subsystem configured for preventing a host server system from accessing the data repository via a data network, and
an evaluation system configured for:
accessing (a) an estimated dataset having a set of estimated values of an attribute that is a continuous variable, the estimated dataset generated by applying the machine-learning model to an input dataset of the data samples and (b) a validation dataset having a set of validation values of the attribute, the set of validation values respectively being known values corresponding to the set of estimated values generated by the machine-learning model,
generating, from a comparison of the estimated dataset and the validation dataset to an outcome of interest, a discretized evaluation dataset with data values in multiple categories, the discretized evaluation dataset comprising a set of categories in a classification matrix and a number of instances in each category, the set of categories including a true positive category, a true negative category, a false positive category, and a false negative category,
computing, for the machine-learning model, an evaluation metric based on a comparison of data values from different categories of the discretized evaluation dataset, the evaluation metric indicating an accuracy of the machine-learning model, and
providing the host server system with access to (a) the evaluation metric or (b) a modeling output generated with the machine-learning model which indicates a risk level associated with the entity, causing the host server system to allow or prevent the entity to access to a restricted function of a computing environment, based on the modeling output, wherein generating the discretized evaluation dataset comprises:
identifying a first category for the discretized evaluation dataset indicating a match between estimated attribute values and validation attribute values with respect to the outcome of interest;
identifying a second category for the discretized evaluation dataset indicating a mismatch between estimated attribute values and validation attribute values with respect to the outcome of interest;
determining, from the comparison of the estimated dataset and the validation dataset to the outcome of interest, a number of matches in the first category and a number of mismatches in the second category; and
outputting the discretized evaluation dataset having the first category with the number of matches and the second category with the number of mismatches.
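As an illustration only, the discretization and metric steps recited above can be sketched in Python. The claim does not specify the form of the outcome of interest or the particular evaluation metric, so this sketch assumes a simple threshold test on the attribute and plain accuracy. The function names (`discretize`, `accuracy`), the sample values, and the 40,000 threshold are all hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, Sequence


def discretize(
    estimated: Sequence[float],
    validation: Sequence[float],
    outcome: Callable[[float], bool],
) -> Dict[str, int]:
    """Count matches and mismatches of estimated vs. known values
    with respect to an outcome of interest (hypothetical helper)."""
    if len(estimated) != len(validation):
        raise ValueError("estimated and validation sets must align one-to-one")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for est, known in zip(estimated, validation):
        predicted = outcome(est)   # does the model's estimate satisfy the outcome?
        actual = outcome(known)    # does the known value satisfy the outcome?
        if predicted and actual:
            counts["TP"] += 1      # match: both satisfy the outcome
        elif not predicted and not actual:
            counts["TN"] += 1      # match: neither satisfies the outcome
        elif predicted:
            counts["FP"] += 1      # mismatch: estimate satisfies, known value does not
        else:
            counts["FN"] += 1      # mismatch: known value satisfies, estimate does not
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Share of instances in the match categories (one possible metric)."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


# Hypothetical continuous attribute values and an assumed outcome of interest.
estimated = [52000.0, 38000.0, 61000.0, 45000.0, 30000.0]
validation = [50000.0, 41000.0, 58000.0, 39000.0, 28000.0]
counts = discretize(estimated, validation, lambda v: v >= 40000.0)
print(counts, accuracy(counts))
```

Here the true positive and true negative counts together play the role of the "first category" of matches, and the false positive and false negative counts the "second category" of mismatches. Other metrics that compare counts across categories, such as precision or recall, could be substituted for accuracy without changing the discretization step.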