| CPC C12Q 1/6869 (2013.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01); G06N 20/20 (2019.01); G16B 30/00 (2019.02); G16B 40/10 (2019.02)] | 24 Claims |

|
1. A method of high rate sequencing of polymers using a nanopore measurement and analysis system, the method comprising:
placing a polymer into the nanopore measurement and analysis system; and
sequencing the polymer using the nanopore measurement and analysis system at least in part by:
translocating at least a portion of the polymer through a nanopore of the nanopore measurement and analysis system at a sequencing rate in the range of 10-1000 polymer units per second, wherein the sequencing rate comprises a rate at which the polymer translocates through the nanopore;
measuring, using the nanopore measurement and analysis system and at a sampling rate between 100 Hz and 30 kHz, electrical signals generated by the translocating of the polymer through the nanopore, wherein the sampling rate is greater than or equal to the sequencing rate;
generating a time-ordered series of measurements based on the measuring of the electrical signals generated by the translocating;
organizing the time-ordered series of measurements into a plurality of overlapping subsets of measurements;
generating a plurality of feature vectors from the plurality of overlapping subsets of measurements by processing the subsets of measurements using a convolutional neural network;
generating a plurality of sets of transition weights from the plurality of feature vectors using a recurrent neural network;
generating, using the plurality of sets of transition weights, a Hidden Markov Model (HMM);
determining, using the HMM, an estimate of a sequence of polymer units in the polymer; and
outputting the estimated sequence of polymer units,
wherein the recurrent neural network comprises a bidirectional recurrent layer, the bidirectional recurrent layer comprising:
a first unidirectional recurrent layer comprising a first plurality of long short-term memory (LSTM) units connected in a first direction; and
a second unidirectional recurrent layer comprising a second plurality of LSTM units connected in a second direction, opposite the first direction, wherein outputs of the first unidirectional recurrent layer are inputs into the second unidirectional recurrent layer,
wherein generating the plurality of sets of transition weights comprises:
updating state vectors of multiple LSTM units in the first unidirectional recurrent layer based on the feature vectors and state vectors of LSTM units preceding the multiple LSTM units in the first unidirectional recurrent layer; and
updating state vectors of multiple LSTM units in the second unidirectional recurrent layer based on outputs of the first unidirectional recurrent layer and state vectors of LSTM units preceding the multiple LSTM units in the second unidirectional recurrent layer; and
wherein each weight of a particular set of transition weights of the plurality of sets of transition weights is associated with respective first and second labels and is indicative of the likelihood that a transition between a polymer unit having the first label and a polymer unit having the second label occurred within a measurement period represented by a subset of measurements, of the plurality of overlapping subsets, associated with the particular set of transition weights.
|
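
Two steps recited in claim 1 can be illustrated with a minimal sketch: organizing the time-ordered measurements into overlapping subsets, and determining a label sequence from per-subset sets of transition weights. The sketch is illustrative only. The function names, the window `width` and `stride`, the use of log-domain weights, and the choice of a Viterbi-style best-path search are all assumptions; the claim does not specify how the HMM is decoded, and the convolutional and recurrent networks that would produce the weights are omitted here.

```python
def overlapping_windows(samples, width, stride):
    """Split a time-ordered series into overlapping subsets.

    Hypothetical helper: consecutive windows share (width - stride) samples.
    """
    if not 0 < stride < width:
        raise ValueError("stride must be positive and smaller than width")
    return [samples[i:i + width]
            for i in range(0, len(samples) - width + 1, stride)]


def best_path(weights, n_labels):
    """Best-scoring label path through per-window transition weights.

    weights[t][i][j] is an assumed log-domain weight for a transition from
    label i to label j within the measurement period of window t.
    Returns a list of len(weights) + 1 labels.
    """
    score = [0.0] * n_labels          # no prior preference over start labels
    backpointers = []
    for w in weights:
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best predecessor label i for arriving at label j in this window.
            i_best = max(range(n_labels), key=lambda i: score[i] + w[i][j])
            new_score.append(score[i_best] + w[i_best][j])
            ptr.append(i_best)
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    windows = overlapping_windows(list(range(10)), width=4, stride=2)
    print(len(windows), windows[1])

    HI, LO = 0.0, -5.0                # toy log-weights for two labels
    toy_weights = [
        [[LO, HI], [LO, LO]],         # window 0 favors 0 -> 1
        [[LO, LO], [LO, HI]],         # window 1 favors 1 -> 1
        [[LO, LO], [HI, LO]],         # window 2 favors 1 -> 0
    ]
    print(best_path(toy_weights, n_labels=2))
```

In this sketch each set of transition weights is a square matrix indexed by a first and a second label, mirroring the final wherein clause of the claim; a practical system would have one such matrix per overlapping subset, emitted by the recurrent network.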