US 12,272,431 B2
Detecting false positive variant calls in next-generation sequencing
Mark Andrew DePristo, Palo Alto, CA (US); and Ryan Poplin, Palo Alto, CA (US)
Assigned to Verily Life Sciences LLC, Dallas, TX (US)
Filed by Verily Life Sciences LLC, Mountain View, CA (US)
Filed on May 13, 2022, as Appl. No. 17/744,387.
Application 17/744,387 is a continuation of application No. 15/490,572, filed on Apr. 18, 2017, granted, now Pat. No. 11,335,438.
Claims priority of provisional application 62/333,130, filed on May 6, 2016.
Prior Publication US 2022/0277811 A1, Sep. 1, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G01N 33/48 (2006.01); G01N 33/50 (2006.01); G06N 3/04 (2023.01); G06N 3/088 (2023.01); G06N 7/01 (2023.01); G16B 40/00 (2019.01)

CPC G16B 40/00 (2019.02) [G06N 3/04 (2013.01); G06N 3/088 (2013.01); G06N 7/01 (2023.01)]

18 Claims

1. A method comprising:

obtaining a neural network that has been trained to determine a likelihood that read pileup windows provided as input are representative of variants, wherein the neural network is produced by:

obtaining a plurality of read pileup windows associated with a first sample genome,

wherein each read pileup window of the plurality of read pileup windows is associated with a different reference nucleotide position within the first sample genome,

wherein each read pileup window of the plurality of read pileup windows includes sequence reads generated using a particular read process, and

wherein a given read pileup window of the plurality of read pileup windows includes a plurality of sequence reads that each include a nucleotide aligned at a given reference nucleotide position, within the first sample genome, that is associated with the given read pileup window;

obtaining, for each reference nucleotide position that is associated with a read pileup window within the plurality of read pileup windows, a label that indicates whether the reference nucleotide position is either (i) a known variant or (ii) a non-variant; and

training the neural network, based on data indicative of the plurality of read pileup windows and the labels;

receiving, as input, a read pileup window that is associated with a second sample genome and that includes sequence reads generated using the particular read process; and

applying the neural network to the read pileup window to produce an output that is representative of a likelihood that the read pileup window associated with the second sample genome is representative of a variant.
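
The steps recited in claim 1 (labeled pileup windows from a first sample, training, then scoring a window from a second sample) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the per-column mismatch-fraction encoding, the one-hidden-layer network, the simulated training data, and the names encode_window, simulate_window, TinyNet, train and demo are all hypothetical.

```python
import math
import random

BASES = "ACGT"


def encode_window(ref, reads):
    """Assumed encoding: per-column fraction of reads that differ from the reference."""
    depth = len(reads)
    return [sum(r[i] != ref[i] for r in reads) / depth for i in range(len(ref))]


def simulate_window(rng, is_variant, width=5, depth=20, error_rate=0.02):
    """Hypothetical stand-in for a labeled pileup window from a first sample genome."""
    ref = "".join(rng.choice(BASES) for _ in range(width))
    center = width // 2
    alt = rng.choice([b for b in BASES if b != ref[center]])
    reads = []
    for _ in range(depth):
        # Occasional random sequencing errors in every column.
        read = [rng.choice(BASES) if rng.random() < error_rate else b for b in ref]
        # Simulated heterozygous variant: about half the reads carry the alt base.
        if is_variant and rng.random() < 0.5:
            read[center] = alt
        reads.append("".join(read))
    return ref, reads


class TinyNet:
    """One tanh hidden layer and a sigmoid output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        dz = p - y  # gradient of cross-entropy loss w.r.t. the pre-sigmoid value
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # uses w2[j] before its update
            self.w2[j] -= lr * dz * hj
            self.b1[j] -= lr * dh
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
        self.b2 -= lr * dz


def train(net, examples, rng, epochs=200, lr=0.2):
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            net.train_step(x, y, lr)


def demo():
    rng = random.Random(0)
    # "First sample": labeled windows (1.0 = known variant, 0.0 = non-variant).
    examples = []
    for k in range(200):
        label = k % 2
        ref, reads = simulate_window(rng, bool(label))
        examples.append((encode_window(ref, reads), float(label)))
    net = TinyNet(n_in=5, n_hidden=4, rng=rng)
    train(net, examples, rng)
    # "Second sample": two hand-written windows scored by the trained network.
    ref = "ACGTA"
    variant_reads = ["ACGTA"] * 10 + ["ACTTA"] * 10
    clean_reads = ["ACGTA"] * 20
    return (net.predict(encode_window(ref, variant_reads)),
            net.predict(encode_window(ref, clean_reads)))


if __name__ == "__main__":
    p_variant, p_clean = demo()
    print(f"variant-like window: {p_variant:.3f}, clean window: {p_clean:.3f}")
```

The sketch keeps the claim's separation between the training sample and the scored sample, and it assumes both are produced by the same read process by reusing one encoding for both. A practical system would likely use a richer window representation and a larger network.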