CPC G16B 40/20 (2019.02) [G06F 16/907 (2019.01); G06F 18/213 (2023.01); G06F 18/214 (2023.01); G06F 18/217 (2023.01); G06F 18/23 (2023.01); G06F 18/23211 (2023.01); G06F 18/24 (2023.01); G06F 18/2415 (2023.01); G06F 18/2431 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06N 7/01 (2023.01); G06V 10/454 (2022.01); G06V 10/763 (2022.01); G06V 10/764 (2022.01); G06V 10/7715 (2022.01); G06V 10/7784 (2022.01); G06V 10/82 (2022.01); G06V 10/993 (2022.01); G16B 40/00 (2019.02); G06N 5/046 (2013.01)]
20 Claims
1. A computer-implemented method of quality scoring for a set of base calls, including:
processing input data for one or more analytes through a neural network-based base caller and producing an alternative representation of the input data;
processing the alternative representation through an output layer to produce an output, wherein the output identifies likelihoods of a base incorporated in a particular one of the analytes being A, C, T, and G;
calling bases for one or more of the analytes based on the output; and
determining quality scores of the called bases based on the likelihoods identified by the output by:
quantizing classification scores of base calls produced by the neural network-based base caller in response to processing training data during training;
selecting a set of quantized classification scores;
for each quantized classification score in the set of quantized classification scores, determining a base calling error rate by comparing predicted base calls for the set of quantized classification scores to corresponding ground truth base calls;
determining a fit between the set of quantized classification scores and corresponding base calling error rates; and
correlating the quality scores to the set of quantized classification scores based on the fit.
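
For illustration only, the quality-scoring steps recited in claim 1 (quantizing, selecting, computing error rates, fitting, and correlating) can be sketched in Python. The sketch rests on assumptions that the claim does not specify: equal-width bins for quantization, a minimum observation count as the selection rule, an ordinary least-squares line fitted to log10 of the error rate, and the Phred convention Q = -10 log10(error rate). The function names quantize, error_rates, fit_line, and quality_table are hypothetical.

```python
import math
from collections import defaultdict


def quantize(score: float, n_bins: int = 10) -> float:
    """Map a classification score in [0, 1] to the midpoint of an equal-width bin (assumed scheme)."""
    idx = min(int(score * n_bins), n_bins - 1)
    return (idx + 0.5) / n_bins


def error_rates(scores, predicted, truth, n_bins=10, min_count=1):
    """For each selected quantized score, return the fraction of predicted
    base calls that disagree with the ground truth base calls."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for s, p, t in zip(scores, predicted, truth):
        q = quantize(s, n_bins)
        totals[q] += 1
        errors[q] += int(p != t)
    # Selection rule (an assumption): keep bins with at least min_count calls.
    return {q: errors[q] / totals[q] for q in sorted(totals) if totals[q] >= min_count}


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct quantized scores to fit")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def quality_table(rates, floor=1e-6):
    """Fit log10(error rate) against quantized score, then map each quantized
    score to a Phred-style quality score using the fitted error rate."""
    xs = list(rates)
    # Floor the error rate so that bins with zero observed errors stay finite.
    ys = [math.log10(max(r, floor)) for r in rates.values()]
    a, b = fit_line(xs, ys)
    return {x: -10.0 * (a * x + b) for x in xs}
```

At inference time, under the same assumptions, the highest likelihood in the output for a called base would be quantized with quantize and looked up in the table returned by quality_table to obtain that call's quality score.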