US 12,217,832 B2
Deep learning-based variant classifier
Ole Schulz-Trieglaff, Cambridge (GB); Anthony James Cox, Cambridge (GB); and Kai-How Farh, San Mateo, CA (US)
Assigned to Illumina, Inc., San Diego, CA (US); and Illumina Cambridge Limited, Cambridge (GB)
Filed by Illumina, Inc., San Diego, CA (US); and Illumina Cambridge Limited, Cambridge (GB)
Filed on May 9, 2023, as Appl. No. 18/314,638.
Application 18/314,638 is a continuation of application No. 16/247,487, filed on Jan. 14, 2019, granted, now 11,705,219.
Claims priority of provisional application 62/617,552, filed on Jan. 15, 2018.
Prior Publication US 2023/0386611 A1, Nov. 30, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G16B 40/20 (2019.01); G06F 9/38 (2018.01); G06F 18/214 (2023.01); G06F 18/2431 (2023.01); G06N 3/04 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2023.01); G16B 20/00 (2019.01); G16B 20/20 (2019.01); G16B 40/00 (2019.01)
CPC G16B 40/20 (2019.02) [G06F 9/3877 (2013.01); G06F 18/2148 (2023.01); G06F 18/2431 (2023.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01); G16B 20/00 (2019.02); G16B 20/20 (2019.02); G16B 40/00 (2019.02)] 20 Claims
 
1. A system comprising:
at least one processor; and
a non-transitory computer readable medium storing a convolutional neural network and instructions that, when executed by the at least one processor, cause the system to:
identify a group of reads aligned with a reference genome and spanning a candidate variant at a target base position;
provide, to the convolutional neural network, an array of input features generated from a text file comprising sequencing data output by a sequencer instrument, the array of input features encoding:
bases from the group of reads in the text file at the target base position,
bases flanking each side of the target base position in the text file, and
corresponding base features for bases within the group of reads; and
generate, based on an analysis of the array of input features by the convolutional neural network, classification scores indicating likelihoods that the candidate variant at the target base position is a variant.
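Claim 1 recites this pipeline without fixing a tensor layout or a network architecture. The Python sketch below shows one possible reading: a read pileup window is turned into an array of input features, and a small convolutional network maps that array to class scores. Several choices here are assumptions made only for illustration and are not the patented implementation. These include the names `encode_reads` and `TinyVariantCNN`, the five-channel layout (a one-hot base plus base quality scaled by an assumed maximum Phred score of 40), the kernel size, the two output classes and the random, untrained weights. The sketch also omits the step of parsing the sequencer's text file. It assumes that reads arrive as gap-free strings that are already aligned to reference coordinates.

```python
import math
import random

BASES = "ACGT"  # one-hot channel order (assumed)


def encode_reads(reads, qualities, target, flank):
    """Build a hypothetical [reads][window][channels] input array.

    Each read is assumed to be a gap-free string already aligned to reference
    coordinates. The window covers the target position plus `flank` bases on
    each side. Channels: 4 one-hot base channels plus 1 base-quality channel
    scaled by an assumed maximum Phred score of 40.
    """
    window = range(target - flank, target + flank + 1)
    array = []
    for read, quals in zip(reads, qualities):
        rows = []
        for pos in window:
            onehot = [1.0 if read[pos] == base else 0.0 for base in BASES]
            rows.append(onehot + [quals[pos] / 40.0])
        array.append(rows)
    return array


def conv1d_relu(rows, kernels, bias):
    """Valid 1-D convolution along the window axis, followed by ReLU."""
    size = len(kernels[0])
    out = []
    for i in range(len(rows) - size + 1):
        feats = []
        for kernel, b in zip(kernels, bias):
            s = b + sum(
                kernel[j][c] * rows[i + j][c]
                for j in range(size)
                for c in range(len(rows[i + j]))
            )
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyVariantCNN:
    """Hypothetical, untrained CNN: conv -> max-pool over positions ->
    mean over reads -> dense layer -> softmax over 2 assumed classes
    (index 0: not a variant, index 1: variant)."""

    def __init__(self, channels=5, filters=8, kernel_size=3, classes=2, seed=0):
        rng = random.Random(seed)  # random weights stand in for trained ones
        self.kernels = [
            [[rng.gauss(0.0, 0.5) for _ in range(channels)] for _ in range(kernel_size)]
            for _ in range(filters)
        ]
        self.bias = [0.0] * filters
        self.dense = [[rng.gauss(0.0, 0.5) for _ in range(filters)] for _ in range(classes)]
        self.dense_bias = [0.0] * classes

    def scores(self, array):
        pooled = [0.0] * len(self.kernels)
        for rows in array:
            fmap = conv1d_relu(rows, self.kernels, self.bias)
            for f in range(len(pooled)):
                # max over window positions, averaged across reads
                pooled[f] += max(pos[f] for pos in fmap) / len(array)
        logits = [
            sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(self.dense, self.dense_bias)
        ]
        return softmax(logits)


# Usage: three toy reads spanning target position 4; two show T there, one shows A.
reads = ["ACGTTCGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
quals = [[30] * len(r) for r in reads]
array = encode_reads(reads, quals, target=4, flank=3)
scores = TinyVariantCNN().scores(array)
print(scores)  # two class scores summing to 1; meaningless until trained
```

The weights are random, so the scores carry no information. In practice such a network would be trained on labeled candidate sites before its outputs could be read as likelihoods. The pooling choice, which takes the maximum over positions and then the mean over reads, is one simple way to make the output independent of the number of reads in the group. The claim does not specify it.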