| CPC G06N 5/01 (2023.01) | 19 Claims |

|
1. A system for facilitating cancer-related prediction accuracy of a trained model, without requiring renormalization of the entirety of a given training dataset for training the model for novel data, by arranging preprocessing vectors and a trained random forest model, that are derived from the same training dataset, to respectively preprocess sample target data and generate predictions with the preprocessed data, the system comprising:
one or more processors and non-transitory machine-readable media storing instructions that, when executed by the one or more processors, cause operations comprising:
accessing a random forest machine learning model comprising features that are derived from a training dataset and selected for the random forest machine learning model;
obtaining target data derived from a target sample;
normalizing, using frozen vectors derived from randomly-selected subsets of the training dataset, the target data to generate processed data; and
after generating the processed data using the frozen vectors, generating, using the random forest machine learning model on the processed data and without requiring renormalization of the entirety of the training dataset, a likelihood score related to cancer occurrence by inputting the processed data into nodes of the random forest machine learning model, wherein the nodes and the frozen vectors are both derived from the training dataset.
|
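The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say what the frozen vectors contain, so this example assumes they are per-feature means and standard deviations averaged over randomly selected training subsets. The forest is a toy ensemble of one-split stumps standing in for a real random forest. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
from statistics import fmean, pstdev


def frozen_vectors(train_rows, n_subsets=5, subset_size=20, seed=0):
    """Assumed form of the frozen vectors: per-feature mean and std,
    averaged over randomly selected subsets of the training data."""
    rng = random.Random(seed)
    n_features = len(train_rows[0])
    means = [0.0] * n_features
    stds = [0.0] * n_features
    for _ in range(n_subsets):
        subset = rng.sample(train_rows, subset_size)
        for j in range(n_features):
            col = [row[j] for row in subset]
            means[j] += fmean(col) / n_subsets
            stds[j] += pstdev(col) / n_subsets
    # Avoid division by zero for constant features.
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds


def normalize(row, vectors):
    """Normalize one sample with the frozen vectors only; the training
    data is not touched again."""
    means, stds = vectors
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def fit_stump(rows, labels, feature):
    """One-split tree: threshold halfway between the two class means."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        return None
    threshold = (fmean(pos) + fmean(neg)) / 2
    sign = 1 if fmean(pos) > fmean(neg) else -1
    return feature, threshold, sign


def fit_forest(rows, labels, n_trees=25, seed=1):
    """Toy forest: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]
        feature = rng.randrange(len(rows[0]))
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        if stump is not None:
            forest.append(stump)
    return forest


def likelihood_score(forest, processed_row):
    """Fraction of stumps voting for the positive class."""
    votes = sum(1 for f, t, s in forest if s * (processed_row[f] - t) > 0)
    return votes / len(forest)


# Synthetic training data (hypothetical): label 1 shifts features 0 and 2.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
train = [
    [data_rng.gauss(10 + 4 * y, 1.0), data_rng.gauss(5.0, 2.0), data_rng.gauss(2 * y, 1.0)]
    for y in labels
]

# Both the frozen vectors and the forest are derived from the same training set.
vectors = frozen_vectors(train)
forest = fit_forest([normalize(r, vectors) for r in train], labels)

# A new target sample is normalized with the frozen vectors, then scored.
target = [14.0, 5.0, 2.0]
score = likelihood_score(forest, normalize(target, vectors))
print(f"likelihood score: {score:.2f}")
```

The point of the design is that `normalize` depends only on the stored vectors, so scoring a new sample never requires recomputing normalization statistics over the full training dataset.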