US 11,929,148 B2
Systems and methods for enriching for cancer-derived fragments using fragment size
Darya Filippova, Sunnyvale, CA (US); Matthew H. Larson, San Francisco, CA (US); M. Cyrus Maher, San Mateo, CA (US); Monica Portela dos Santos Pimentel, San Jose, CA (US); and Robert Abe Paine Calef, Redwood City, CA (US)
Assigned to GRAIL, LLC, Menlo Park, CA (US)
Filed by GRAIL, LLC, Menlo Park, CA (US)
Filed on Mar. 12, 2020, as Appl. No. 16/816,918.
Claims priority of provisional application 62/854,888, filed on May 30, 2019.
Claims priority of provisional application 62/818,013, filed on Mar. 13, 2019.
Prior Publication US 2020/0294624 A1, Sep. 17, 2020
Int. Cl. G16B 30/00 (2019.01); C12Q 1/6886 (2018.01); G06N 20/00 (2019.01); G16B 20/10 (2019.01); G16H 10/40 (2018.01); G16H 10/60 (2018.01); G16H 50/20 (2018.01); G16H 50/50 (2018.01); G16H 50/70 (2018.01)
CPC G16B 30/00 (2019.02) [C12Q 1/6886 (2013.01); G06N 20/00 (2019.01); G16B 20/10 (2019.02); G16H 10/40 (2018.01); G16H 10/60 (2018.01); G16H 50/20 (2018.01); G16H 50/50 (2018.01); G16H 50/70 (2018.01); C12Q 2600/112 (2013.01)]
15 Claims
 
1. A method of determining a cancer class of a subject, comprising:
extracting a plurality of cell-free DNA molecules in a biological sample acquired from a subject;
removing, from the plurality of cell-free DNA molecules, cell-free DNA molecules longer than a first threshold length to obtain a pool of size-selected cell-free DNA molecules, wherein the first threshold length is less than 160 nucleotides;
sequencing the biological sample based on the pool of size-selected cell-free DNA molecules to obtain a plurality of size-selected sequence reads, wherein the plurality of size-selected sequence reads comprise at least 60,000 sequence reads;
identifying, from the plurality of size-selected sequence reads, a relative copy number at each respective genomic location in at least fifty genomic locations in the genome of the subject; and
applying the identified relative copy numbers into a machine learning model trained to determine the cancer class for the subject based on the relative copy number at each respective genomic location, wherein the machine learning model is trained with a training dataset labeled by cancer class.
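
The computational steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions made only for illustration: the claim removes long molecules physically before sequencing, whereas the sketch filters fragment records by length in software; the 150-nucleotide threshold is a hypothetical value chosen to fall below the claimed 160-nucleotide limit; relative copy number is approximated as a log2 ratio against an assumed baseline read fraction per genomic bin; and a nearest-centroid classifier stands in for the machine learning model, whose type the claim does not specify. All function names and the class labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical values chosen to satisfy the limits in claim 1.
LENGTH_THRESHOLD = 150   # the claim requires a threshold below 160 nucleotides
MIN_READS = 60_000       # at least 60,000 size-selected sequence reads
MIN_LOCATIONS = 50       # at least fifty genomic locations


def size_select(fragments, threshold=LENGTH_THRESHOLD):
    """Drop fragments longer than the threshold; a fragment is (bin_index, length)."""
    return [(b, n) for b, n in fragments if n <= threshold]


def relative_copy_number(fragments, baseline_fraction):
    """Log2 ratio of the observed read fraction in each bin to an assumed baseline."""
    counts = Counter(b for b, _ in fragments)
    total = len(fragments)
    return [
        # A 0.5 pseudocount keeps bins with no reads finite.
        math.log2(((counts[b] + 0.5) / total) / expected)
        for b, expected in enumerate(baseline_fraction)
    ]


def train_centroids(profiles, labels):
    """Average the copy-number profiles of each cancer class (assumed model)."""
    sums, sizes = {}, Counter(labels)
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, profile):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, profile))
    return min(centroids, key=lambda label: distance(centroids[label]))


def classify_sample(fragments, baseline_fraction, centroids):
    """Size-select, check the claimed minimums, then classify the sample."""
    selected = size_select(fragments)
    if len(selected) < MIN_READS:
        raise ValueError("fewer than 60,000 size-selected reads")
    if len(baseline_fraction) < MIN_LOCATIONS:
        raise ValueError("fewer than fifty genomic locations")
    return predict(centroids, relative_copy_number(selected, baseline_fraction))
```

In this sketch the training set passed to train_centroids is a list of copy-number profiles paired with class labels, mirroring the claim's training dataset labeled by cancer class. A real pipeline would derive fragment lengths and genomic bins from aligned sequence reads rather than from the simplified (bin_index, length) tuples used here.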