| CPC G16B 30/00 (2019.02) [G06N 3/08 (2013.01); G06N 20/00 (2019.01); G16B 5/00 (2019.02); G16B 20/20 (2019.02); G16B 40/00 (2019.02); G16B 40/20 (2019.02)] | 14 Claims |
|
1. A computer-implemented method for detecting variations in a biopolymer sequence relative to a reference sequence comprising:
generating embeddings of at least one million sequence reads obtained by sequencing a sample of the biopolymer at less than 10× coverage, each read corresponding to a region of at least 200 bp of biopolymer sequence, wherein each embedding of a sequence read is concatenated with an embedding of a corresponding reference sequence; and
detecting one or more candidate variations in the biopolymer sequence based at least in part on the embeddings of the at least one million sequence reads, wherein detecting comprises processing the embeddings with a deep learning model comprising a series of 1-dimensional convolution layers, wherein a mean pooling of one or more outputs from at least one of the 1-dimensional convolution layers is added back to the one or more outputs from the same at least one 1-dimensional layers before it is input into the subsequent 1-dimensional convolution layer, and max and mean pooling the output of the series of 1-dimensional convolution layers.
|
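The data flow recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the patented implementation. Several details are assumptions made only for illustration, because the claim does not specify them: the one-hot embedding, the ReLU activation, the zero "same" padding, the random untrained weights, the reading of "mean pooling" as a per-channel mean over all sequence positions, and every function name (`embed_pair`, `conv1d_relu`, `add_mean_back`, `forward`, `random_layer`). The toy inputs are far shorter than the reads of at least 200 bp recited in the claim.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Assumed embedding: one 4-channel one-hot vector per nucleotide."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def embed_pair(read, reference):
    """Concatenate read and reference embeddings channel-wise at each position."""
    assert len(read) == len(reference), "sketch assumes aligned, equal-length inputs"
    return [r + f for r, f in zip(one_hot(read), one_hot(reference))]

def conv1d_relu(x, weights, bias):
    """1-D convolution (zero 'same' padding, odd kernel) followed by ReLU.

    x: [length][c_in]; weights: [kernel][c_in][c_out]; bias: [c_out].
    """
    kernel, c_in, c_out = len(weights), len(weights[0]), len(bias)
    half = kernel // 2
    out = []
    for pos in range(len(x)):
        acc = list(bias)
        for k in range(kernel):
            src = pos + k - half
            if 0 <= src < len(x):  # positions outside the sequence count as zero
                for i in range(c_in):
                    for o in range(c_out):
                        acc[o] += x[src][i] * weights[k][i][o]
        out.append([max(0.0, v) for v in acc])
    return out

def mean_pool(x):
    """Per-channel mean over all sequence positions."""
    return [sum(col) / len(x) for col in zip(*x)]

def max_pool(x):
    """Per-channel maximum over all sequence positions."""
    return [max(col) for col in zip(*x)]

def add_mean_back(x):
    """Add the pooled mean of a layer's output back to every position of that output."""
    pooled = mean_pool(x)
    return [[v + m for v, m in zip(row, pooled)] for row in x]

def forward(x, layers):
    """Run the convolution series, then max- and mean-pool the final output."""
    for idx, (weights, bias) in enumerate(layers):
        x = conv1d_relu(x, weights, bias)
        if idx < len(layers) - 1:  # only where a subsequent convolution layer exists
            x = add_mean_back(x)
    return max_pool(x) + mean_pool(x)  # list concatenation of both pooled vectors

def random_layer(rng, kernel, c_in, c_out):
    """Hypothetical helper: random weights and zero bias for one layer."""
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(c_out)]
                for _ in range(c_in)] for _ in range(kernel)]
    return weights, [0.0] * c_out

rng = random.Random(0)
layers = [random_layer(rng, 3, 8, 6), random_layer(rng, 3, 6, 6)]
features = forward(embed_pair("ACGTAC", "ACGTTC"), layers)
print(len(features))  # 12: six max-pooled plus six mean-pooled channels
```

In this sketch the add-back step gives every position a summary of the whole read before the next convolution, and the final max and mean pooling reduce a read of any length to a fixed-size feature vector. A classifier that turns that vector into candidate variations is outside the scope of the sketch.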