US 12,191,000 B2
Systems and methods for classifying patients with respect to multiple cancer classes
M. Cyrus Maher, San Mateo, CA (US); Anton Valouev, Palo Alto, CA (US); Darya Filippova, Sunnyvale, CA (US); Virgil Nicula, Cupertino, CA (US); Karthik Jagadeesh, San Francisco, CA (US); Oliver Claude Venn, San Francisco, CA (US); Samuel S. Gross, Sunnyvale, CA (US); John F. Beausang, Menlo Park, CA (US); and Robert Abe Paine Calef, Redwood City, CA (US)
Assigned to GRAIL, INC., Menlo Park, CA (US)
Filed by Grail, LLC, Menlo Park, CA (US)
Filed on Jan. 6, 2023, as Appl. No. 18/151,197.
Application 18/151,197 is a continuation of application No. 16/709,537, filed on Dec. 10, 2019, granted, now 11,581,062.
Claims priority of provisional application 62/777,693, filed on Dec. 10, 2018.
Prior Publication US 2023/0170048 A1, Jun. 1, 2023
Int. Cl. G16B 30/00 (2019.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G16B 20/20 (2019.01); G16B 40/00 (2019.01); G16H 10/40 (2018.01); G16H 10/60 (2018.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01); G16H 70/60 (2018.01)

CPC G16B 30/00 (2019.02) [G06N 5/04 (2013.01); G06N 20/00 (2019.01); G16B 20/20 (2019.02); G16B 40/00 (2019.02); G16H 10/40 (2018.01); G16H 10/60 (2018.01); G16H 50/20 (2018.01); G16H 50/70 (2018.01); G16H 70/60 (2018.01)]

27 Claims

1. A method of training an untrained first classifier to classify a test subject of a given species to a cancer class in a plurality of cancer classes using a computer system comprising one or more processors, the method comprising:

obtaining, by the computer system and for each respective reference subject in a first plurality of reference subjects, (i) a cancer class of the respective reference subject and (ii) a sequencing construct for the respective reference subject that includes a first bin count for each respective bin in a plurality of bins that collectively represent all or a portion of a reference genome of the species, wherein each respective first bin count is representative of a number of nucleic acid fragments measured from nucleic acids in a biological sample obtained from the respective reference subject that maps onto a different and non-overlapping portion of the reference genome of the species, wherein, for each respective cancer class in the plurality of cancer classes, the first plurality of reference subjects includes at least one reference subject that has the respective cancer class;

improving a computational efficiency of the computer system by collectively subjecting, by the computer system, the first bin count of each bin in the plurality of bins for each reference subject in the first plurality of reference subjects to a dimensionality reduction method thereby obtaining a feature set, wherein the feature set consists of a number of features that is fewer than the number of bins in the plurality of bins;

resampling, using the computer system, the feature set a plurality of times, wherein the resampling comprises forming, for each respective training iteration in a plurality of training iterations, a trained component classifier via:

omitting, from the feature set, a subset of values for features in the feature set for the first plurality of reference subjects; and

forming the trained component classifier by inputting, in conjunction with the cancer class of respective reference subjects in the first plurality of reference subjects as ground truth, remaining values for the features in the feature set as collective input to a respective untrained component classifier; and

constructing, as a result of the resampling and by collectively leveraging output generated by the trained component classifier formed in each of the plurality of training iterations, a trained first classifier having an improved cancer class recognition ability over the untrained first classifier.
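
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not name any particular algorithm. The sketch makes these assumptions: principal component analysis stands in for the dimensionality reduction method, a nearest-centroid model stands in for each component classifier, a majority vote stands in for "collectively leveraging" the component outputs, and "omitting a subset of values" is read as dropping a random subset of reference subjects in each training iteration. All function names and parameters (bin_counts, reduce_dimensions, train_ensemble, omit_fraction, and so on) are hypothetical.

```python
import numpy as np


def bin_counts(fragment_starts, bin_size, n_bins):
    """Count fragments per fixed-width, non-overlapping genomic bin."""
    idx = np.asarray(fragment_starts) // bin_size
    return np.bincount(idx, minlength=n_bins)[:n_bins]


def reduce_dimensions(X, n_features):
    """Project a subjects-by-bins matrix onto its top principal components.

    PCA is an assumed stand-in for the claimed dimensionality reduction.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_features]
    return (X - mean) @ components.T, mean, components


def train_component(F, y, keep_rows):
    """Fit a nearest-centroid component classifier on the retained rows."""
    F_kept, y_kept = F[keep_rows], y[keep_rows]
    classes = np.unique(y_kept)
    centroids = np.stack([F_kept[y_kept == c].mean(axis=0) for c in classes])
    return classes, centroids


def train_ensemble(F, y, n_iterations=25, omit_fraction=0.3, seed=0):
    """Resample the feature set: each iteration omits a random subset of
    subjects (an assumed reading of the claim) and trains one component."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_keep = max(1, int(round(n * (1.0 - omit_fraction))))
    members = []
    for _ in range(n_iterations):
        keep_rows = rng.choice(n, size=n_keep, replace=False)
        members.append(train_component(F, y, keep_rows))
    return members


def predict(members, F_new):
    """Combine component outputs by majority vote (an assumed aggregation)."""
    votes = []
    for classes, centroids in members:
        dist = np.linalg.norm(F_new[:, None, :] - centroids[None, :, :], axis=2)
        votes.append(classes[dist.argmin(axis=1)])
    votes = np.stack(votes)  # shape: (n_members, n_samples)
    winners = []
    for sample_votes in votes.T:
        labels, counts = np.unique(sample_votes, return_counts=True)
        winners.append(labels[counts.argmax()])
    return np.array(winners)
```

In this sketch, a new subject's bin counts would be centered with the stored mean and projected with the stored components before being passed to predict, so that the test subject is described by the same reduced feature set as the reference subjects.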