| CPC G16B 40/20 (2019.02) [G06N 3/04 (2013.01); G06N 20/00 (2019.01); G16B 20/20 (2019.02); G16B 30/00 (2019.02)] | 30 Claims |

|
1. A computer-implemented method, comprising:
accessing a three-dimensional structure of a reference amino acid sequence of a protein;
mapping atoms of the three-dimensional structure to respective voxels of a three-dimensional voxel grid by associating each atom with an individual voxel of the three-dimensional voxel grid;
defining, for each voxel in the three-dimensional voxel grid on an amino acid basis, amino acid-wise distance channels comprising three-dimensional distance values specifying distances from each respective voxel to corresponding atoms mapped to each respective voxel;
encoding an alternative allele channel to each voxel in the three-dimensional voxel grid, wherein the alternative allele channel is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide;
encoding an evolutionary conservation channel to each sequence of three-dimensional distance values across the amino acid-wise distance channels on a voxel position basis, wherein the evolutionary conservation channel is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to a corresponding voxel;
applying three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele channel and respective evolutionary conservation channels; and
determining a pathogenicity of the variant nucleotide based at least in part on the tensor.
|
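The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and everything below is an assumption made for illustration: the 20-letter channel order, the helper names (`voxel_centers`, `distance_channels`, `alt_allele_channels`, `conservation_channels`, `conv3d_valid`), the nearest-atom distance with a cap of `max_dist`, the nearest-residue rule for picking conservation frequencies, and the single untrained random kernel standing in for a trained network.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed channel order, one channel per residue type

def voxel_centers(grid_size, voxel_width, center):
    """Return a (G, G, G, 3) array with the coordinates of every voxel center."""
    offsets = (np.arange(grid_size) - (grid_size - 1) / 2.0) * voxel_width
    gx, gy, gz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1) + np.asarray(center, dtype=float)

def distance_channels(atoms, centers, max_dist):
    """Per amino acid: distance from each voxel center to the nearest atom of that
    amino acid, capped at max_dist (assumed interpretation of the distance values)."""
    g = centers.shape[0]
    channels = np.full((len(AMINO_ACIDS), g, g, g), max_dist, dtype=float)
    for _, aa, xyz in atoms:
        d = np.linalg.norm(centers - np.asarray(xyz, dtype=float), axis=-1)
        idx = AMINO_ACIDS.index(aa)
        channels[idx] = np.minimum(channels[idx], d)
    return channels

def alt_allele_channels(variant_aa, grid_size):
    """Broadcast the one-hot encoding of the variant amino acid to every voxel."""
    one_hot = np.zeros(len(AMINO_ACIDS))
    one_hot[AMINO_ACIDS.index(variant_aa)] = 1.0
    shape = (len(AMINO_ACIDS), grid_size, grid_size, grid_size)
    return np.broadcast_to(one_hot[:, None, None, None], shape).copy()

def conservation_channels(atoms, centers, profiles):
    """Give each voxel the conservation frequencies of the residue whose atom is
    nearest to it (assumed proximity rule). profiles has shape (n_residues, 20)."""
    coords = np.array([xyz for _, _, xyz in atoms], dtype=float)
    res_idx = np.array([r for r, _, _ in atoms])
    d = np.linalg.norm(centers[..., None, :] - coords, axis=-1)  # (G, G, G, N)
    nearest = d.argmin(axis=-1)                                   # (G, G, G)
    return np.moveaxis(profiles[res_idx[nearest]], -1, 0)         # (20, G, G, G)

def conv3d_valid(tensor, kernel):
    """Naive 'valid' 3D cross-correlation of a (C, G, G, G) tensor with a (C, k, k, k) kernel."""
    k = kernel.shape[1]
    n = tensor.shape[1] - k + 1
    out = np.zeros((n, n, n))
    for i in range(n):
        for j in range(n):
            for l in range(n):
                out[i, j, l] = np.sum(tensor[:, i:i + k, j:j + k, l:l + k] * kernel)
    return out

# Toy input: (residue index, amino acid, xyz) for two atoms; coordinates are made up.
atoms = [(0, "A", (0.0, 0.0, 0.0)), (1, "G", (2.0, 0.0, 0.0))]
# Hypothetical per-residue conservation frequencies across species.
profiles = np.vstack([np.full(20, 0.05), np.eye(20)[AMINO_ACIDS.index("G")]])

centers = voxel_centers(grid_size=3, voxel_width=2.0, center=(0.0, 0.0, 0.0))
dist = distance_channels(atoms, centers, max_dist=6.0)
alt = alt_allele_channels("V", grid_size=3)
cons = conservation_channels(atoms, centers, profiles)
tensor = np.concatenate([dist, alt, cons], axis=0)  # (60, 3, 3, 3)

# Untrained random kernel: the resulting score is a placeholder, not a prediction.
kernel = 0.01 * np.random.default_rng(0).normal(size=(tensor.shape[0], 3, 3, 3))
score = float(1.0 / (1.0 + np.exp(-conv3d_valid(tensor, kernel).mean())))
```

In this sketch the three channel groups are stacked along the first axis, so a single 3D kernel sees the distances, the variant amino acid, and the conservation frequencies of each voxel neighborhood together. A practical model would replace the single random kernel with trained convolutional layers.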