US 12,378,610 B2
Systems and methods for preprocessing target data and generating predictions using a machine learning model
Christine Buerki, Vancouver (CA); Anamaria Crisan, Vancouver (CA); Elai Davicioni, La Jolla, CA (US); Nicholas George Erho, Vancouver (CA); Mercedeh Ghadessi, New Westminster (CA); Robert B. Jenkins, Rochester, MN (US); and Ismael A. Vergara Correa, West Vancouver (CA)
Assigned to Veracyte SD, Inc., South San Francisco, CA (US); and Mayo Foundation for Medical Education and Research, Rochester, MN (US)
Filed by Veracyte SD, Inc., South San Francisco, CA (US); and Mayo Foundation for Medical Education and Research, Rochester, MN (US)
Filed on Nov. 10, 2023, as Appl. No. 18/506,690.
Application 18/506,690 is a division of application No. 17/346,106, filed on Jun. 11, 2021, abandoned.
Application 17/346,106 is a division of application No. 13/968,838, filed on Aug. 16, 2013, granted, now 11,035,005, issued on Jun. 15, 2021.
Claims priority of provisional application 61/783,124, filed on Mar. 14, 2013.
Claims priority of provisional application 61/764,365, filed on Feb. 13, 2013.
Claims priority of provisional application 61/684,066, filed on Aug. 16, 2012.
Prior Publication US 2025/0156731 A1, May 15, 2025
Int. Cl. G06N 5/01 (2023.01)
CPC G06N 5/01 (2023.01)
19 Claims
 
1. A system for facilitating cancer-related prediction accuracy of a trained model, without requiring renormalization of the entirety of a given training dataset for training the model for novel data, by arranging preprocessing vectors and a trained random forest model, that are derived from the same training dataset, to respectively preprocess sample target data and generate predictions with the preprocessed data, the system comprising:
one or more processors and non-transitory machine-readable media storing instructions that, when executed by the one or more processors, cause operations comprising:
accessing a random forest machine learning model comprising features that are derived from a training dataset and selected for the random forest machine learning model;
obtaining target data derived from a target sample;
normalizing, using frozen vectors derived from randomly-selected subsets of the training dataset, the target data to generate processed data; and
after generating the processed data using the frozen vectors, generating, using the random forest machine learning model on the processed data and without requiring renormalization of the entirety of the training dataset, a likelihood score related to cancer occurrence by inputting the processed data into nodes of the random forest machine learning model, wherein the nodes and the frozen vectors are both derived from the training dataset.
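The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how the frozen vectors are computed, so the sketch assumes a quantile-style reference vector averaged over randomly selected training subsets. A toy forest of one-split trees stands in for a trained random forest. All names (`frozen_reference`, `normalize_with_frozen`, `train_stump`, `forest_score`) and all data values are hypothetical.

```python
import random
from statistics import mean

# Hypothetical training data: 6 samples x 4 features, with binary labels.
TRAIN = [
    [5.1, 2.0, 7.3, 1.2], [4.8, 2.2, 6.9, 1.0], [5.5, 1.8, 7.8, 1.5],
    [2.1, 6.0, 1.4, 7.1], [1.9, 6.4, 1.1, 6.8], [2.4, 5.7, 1.6, 7.5],
]
LABELS = [1, 1, 1, 0, 0, 0]

def frozen_reference(training, n_subsets=5, subset_size=4, seed=0):
    """Assumed form of a frozen vector: the mean sorted profile over
    randomly selected subsets of the training dataset."""
    rng = random.Random(seed)
    n = len(training[0])
    profiles = []
    for _ in range(n_subsets):
        rows = [sorted(r) for r in rng.sample(training, subset_size)]
        profiles.append([mean(r[k] for r in rows) for k in range(n)])
    return [mean(p[k] for p in profiles) for k in range(n)]

def normalize_with_frozen(target, reference):
    """Map each value of one sample to the reference value of the same rank.
    Only the stored reference is needed, not the training dataset."""
    order = sorted(range(len(target)), key=lambda i: target[i])
    out = [0.0] * len(target)
    for rank, idx in enumerate(order):
        out[idx] = reference[rank]
    return out

def train_stump(rows, labels, rng):
    """Toy one-split tree: bootstrap sample, random feature, and a threshold
    midway between the two class means on that feature."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    f = rng.randrange(len(rows[0]))
    pos = [rows[i][f] for i in idx if labels[i] == 1]
    neg = [rows[i][f] for i in idx if labels[i] == 0]
    if not pos or not neg:  # one-class bootstrap: fall back to all rows
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
    return f, (mean(pos) + mean(neg)) / 2, mean(pos) > mean(neg)

def forest_score(forest, x):
    """Fraction of trees voting for the positive class, used as the score."""
    votes = [(x[f] > thr) == pos_high for f, thr, pos_high in forest]
    return sum(votes) / len(votes)

# Frozen vector and forest are both derived from the same training dataset.
FROZEN = frozen_reference(TRAIN)
NORM_TRAIN = [normalize_with_frozen(r, FROZEN) for r in TRAIN]
rng = random.Random(1)
FOREST = [train_stump(NORM_TRAIN, LABELS, rng) for _ in range(25)]

# A new target sample is processed with the frozen vector only, then scored.
target = [5.0, 2.1, 7.0, 1.1]
processed = normalize_with_frozen(target, FROZEN)
score = forest_score(FOREST, processed)
```

In this sketch the frozen vector is computed once and stored, so a new sample is normalized on its own and the training dataset is never renormalized. Because the assumed normalization depends only on ranks, a target measured on a different scale, such as `[50.0, 21.0, 70.0, 11.0]`, yields the same processed data.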