| CPC C12Q 1/6886 (2013.01) [C12Q 1/6851 (2013.01); G16B 20/00 (2019.02); G16B 20/30 (2019.02); G16B 30/10 (2019.02); G16B 40/00 (2019.02); G16B 40/20 (2019.02); G16B 50/00 (2019.02); C12Q 1/6806 (2013.01); C12Q 2600/154 (2013.01)] | 26 Claims |
|
1. A method of analyzing a biological sample of a subject to determine a level of a pathology, wherein the pathology is a cancer, in the biological sample of the subject, the biological sample including cell-free DNA, the method comprising performing, by a computer system:
receiving, over a network connection or from a computer-readable medium, sequence reads obtained from an assay performed on a plurality of cell-free DNA molecules from the biological sample to obtain sequence reads, wherein the sequence reads include ending sequences corresponding to ends of the plurality of cell-free DNA molecules;
for each of the plurality of cell-free DNA molecules, determining a sequence motif for each of one or more ends of the cell-free DNA molecule, wherein an end of a cell-free DNA molecule has a first position at an outermost position, a second position that is next to the first position, and a third position that is next to the second position, wherein the plurality of cell-free DNA molecules includes at least 10,000 cell-free DNA molecules;
determining a first set of amounts of a first set of end sequence motifs of the plurality of cell-free DNA molecules, wherein:
each of the first set of end sequence motifs has C at the first position and G at the second position, or
each of the first set of end sequence motifs has C at the second position and G at the third position;
generating a feature vector including the first set of amounts, the feature vector generated using end sequence motifs only selected from a group consisting of (1) end sequence motifs having C at the first position and G at the second position and (2) end sequence motifs having C at the second position and G at the third position;
inputting the feature vector into a machine learning model, wherein the machine learning model is trained using cell-free DNA molecules in training samples having known classifications;
determining, using the machine learning model and the feature vector, a probability for the level of the pathology;
determining a classification of the level of the cancer for the subject based on a comparison of the probability to a cutoff value, wherein the classification is that the subject has the cancer; and
administering a treatment to the subject, wherein the treatment includes radiation therapy, immunotherapy, chemotherapy, hormone therapy, stem cell transplant, or surgery to treat the cancer.
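The motif-determining, amount-determining, and feature-vector steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes each read is a plain string whose first base is the outermost position of a fragment end, it uses both CG-containing motif groups (CGN and NCG) together, and it expresses each amount as a fraction of all ends examined. The names `CG_MOTIFS`, `end_motif`, and `cg_feature_vector` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

# Hypothetical motif list: 3-mers with C,G at positions 1-2 (CGN)
# or at positions 2-3 (NCG). This yields 8 distinct motifs.
CG_MOTIFS = sorted({"CG" + b for b in BASES} | {b + "CG" for b in BASES})


def end_motif(read, k=3):
    """Return the first k bases of a read (positions 1..k of the fragment end).

    Assumes the read starts at the outermost base of the fragment end.
    """
    return read[:k].upper()


def cg_feature_vector(reads):
    """Build a feature vector of CG-containing end-motif amounts.

    Each entry is the fraction of examined ends carrying that motif.
    Normalizing by the total number of ends is an assumption of this sketch;
    the claim only recites "amounts".
    """
    # Reads shorter than 3 bases cannot supply a 3-position motif; skip them.
    counts = Counter(end_motif(r) for r in reads if len(r) >= 3)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CG_MOTIFS)
    return [counts[m] / total for m in CG_MOTIFS]


# Toy usage with made-up reads; the claim recites at least 10,000 molecules.
example_reads = ["CGATT", "ACGTT", "TTTAA", "cgaCC"]
features = cg_feature_vector(example_reads)
```

Only the CG-containing motifs contribute entries to the vector, while ends with other motifs still count toward the denominator. That is one possible reading of "amounts"; raw counts would be another.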
|
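The model-training, probability, and cutoff-comparison steps of claim 1 can be sketched as follows. The claim does not name a model type, so the logistic regression used here is an assumption chosen for brevity. The two-feature training vectors, the labels, and the 0.5 cutoff are made-up illustrative values, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    X: feature vectors from training samples with known classifications.
    y: labels (1 = cancer, 0 = non-cancer).
    """
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_probability(w, b, x):
    """Probability that the sample belongs to the cancer class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def classify(prob, cutoff=0.5):
    """Compare the probability to a cutoff value (0.5 is an assumed default)."""
    return "cancer" if prob >= cutoff else "non-cancer"


# Made-up training data: two motif amounts per sample, for illustration only.
X_train = [[0.10, 0.02], [0.12, 0.03], [0.02, 0.10], [0.03, 0.12]]
y_train = [0, 0, 1, 1]
w, b = train_logistic(X_train, y_train)
label = classify(predict_probability(w, b, [0.02, 0.11]))
```

In practice the cutoff would be selected on held-out samples to trade sensitivity against specificity, rather than fixed at 0.5.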