US 12,217,829 B2
Artificial intelligence-based analysis of protein three-dimensional (3D) structures
Tobias Hamp, Essex (GB); Hong Gao, Palo Alto, CA (US); and Kai-How Farh, San Mateo, CA (US)
Assigned to Illumina, Inc., San Diego, CA (US)
Filed by Illumina, Inc., San Diego, CA (US)
Filed on Apr. 15, 2021, as Appl. No. 17/232,056.
Prior Publication US 2022/0336054 A1, Oct. 20, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G16B 40/20 (2019.01); G06N 3/04 (2023.01); G06N 20/00 (2019.01); G16B 20/20 (2019.01); G16B 30/00 (2019.01)
CPC G16B 40/20 (2019.02) [G06N 3/04 (2013.01); G06N 20/00 (2019.01); G16B 20/20 (2019.02); G16B 30/00 (2019.02)]
30 Claims
 
1. A computer-implemented method, comprising:
accessing a three-dimensional structure of a reference amino acid sequence of a protein;
mapping atoms of the three-dimensional structure to respective voxels of a three-dimensional voxel grid by associating each atom with an individual voxel of the three-dimensional voxel grid;
defining, for each voxel in the three-dimensional voxel grid on an amino acid basis, amino acid-wise distance channels comprising three-dimensional distance values specifying distances from each respective voxel to corresponding atoms mapped to each respective voxel;
encoding an alternative allele channel to each voxel in the three-dimensional voxel grid, wherein the alternative allele channel is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide;
encoding an evolutionary conservation channel to each sequence of three-dimensional distance values across the amino acid-wise distance channels on a voxel position basis, wherein the evolutionary conservation channel is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to a corresponding voxel;
applying three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele channel and respective evolutionary conservation channels; and
determining a pathogenicity of the variant nucleotide based at least in part on the tensor.
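The steps of claim 1 can be illustrated with the toy, plain-Python sketch below. It is not the patented implementation. Every concrete choice is a hypothetical assumption made only for illustration: coordinates in angstroms, a 3 × 3 × 3 grid with 2 Å spacing, assigning each atom to the voxel with the nearest center, a 10 Å fill value when a voxel has no mapped atom of a given amino acid, taking conservation frequencies from the residue whose atom lies closest to each voxel, and scoring with one hand-set 3D filter followed by ReLU, average pooling, and a sigmoid. The helper names (`build_tensor`, `conv3d_valid`, `pathogenicity`, and so on) are also hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; fixes channel order
MAX_DIST = 10.0  # hypothetical fill value when no atom of a type is mapped to a voxel


def voxel_centers(n, spacing, origin):
    """Centers of an n x n x n voxel grid, keyed by integer index (i, j, k)."""
    return {(i, j, k): tuple(origin[d] + (idx + 0.5) * spacing
                             for d, idx in enumerate((i, j, k)))
            for i in range(n) for j in range(n) for k in range(n)}


def map_atoms(atoms, centers):
    """Associate each atom (aa, residue_index, xyz) with the single nearest voxel."""
    mapping = {v: [] for v in centers}
    for atom in atoms:
        nearest = min(centers, key=lambda v: math.dist(centers[v], atom[2]))
        mapping[nearest].append(atom)
    return mapping


def distance_channels(mapping, centers):
    """Per voxel, one distance per amino acid type to its nearest mapped atom."""
    feats = {}
    for v, mapped in mapping.items():
        feats[v] = [min((math.dist(centers[v], xyz) for a, _, xyz in mapped if a == aa),
                        default=MAX_DIST)
                    for aa in AMINO_ACIDS]
    return feats


def one_hot(aa):
    """One-hot encoding of the variant (alternative) amino acid."""
    return [1.0 if a == aa else 0.0 for a in AMINO_ACIDS]


def conservation_channel(atoms, centers, conservation):
    """Per voxel, conservation frequencies of the residue nearest the voxel center."""
    out = {}
    for v, c in centers.items():
        _, res_idx, _ = min(atoms, key=lambda atom: math.dist(c, atom[2]))
        out[v] = conservation[res_idx]
    return out


def build_tensor(atoms, variant_aa, conservation, n=3, spacing=2.0, origin=(0.0, 0.0, 0.0)):
    centers = voxel_centers(n, spacing, origin)
    dist = distance_channels(map_atoms(atoms, centers), centers)
    alt = one_hot(variant_aa)  # same alt-allele vector broadcast to every voxel
    cons = conservation_channel(atoms, centers, conservation)
    # 60 channels per voxel: 20 distances + 20 alt allele + 20 conservation
    return {v: dist[v] + alt + cons[v] for v in centers}, n


def conv3d_valid(tensor, n, kernel, k, bias=0.0):
    """Naive single-filter 3D convolution without padding, followed by ReLU."""
    out_n = n - k + 1
    out = {}
    for x in range(out_n):
        for y in range(out_n):
            for z in range(out_n):
                acc = bias
                for dx in range(k):
                    for dy in range(k):
                        for dz in range(k):
                            w = kernel[(dx, dy, dz)]
                            vals = tensor[(x + dx, y + dy, z + dz)]
                            acc += sum(wc * vc for wc, vc in zip(w, vals))
                out[(x, y, z)] = max(acc, 0.0)
    return out


def pathogenicity(tensor, n, kernel, k, bias=0.0):
    fmap = conv3d_valid(tensor, n, kernel, k, bias)
    pooled = sum(fmap.values()) / len(fmap)  # global average pooling
    return 1.0 / (1.0 + math.exp(-pooled))  # sigmoid maps the score into (0, 1)


# Toy usage with made-up inputs: two residues, one atom each.
atoms = [("G", 0, (1.0, 1.0, 1.0)), ("L", 1, (4.2, 3.0, 5.0))]
uniform = [1.0 / 20] * 20
conservation = {0: uniform, 1: uniform}
tensor, n = build_tensor(atoms, "P", conservation)
kernel = {(a, b, c): [1e-4] * 60 for a in range(3) for b in range(3) for c in range(3)}
score = pathogenicity(tensor, n, kernel, k=3)
print(len(tensor), len(tensor[(0, 0, 0)]), score)
```

With a 3 × 3 × 3 kernel over a 3 × 3 × 3 grid the unpadded convolution yields a single output voxel, so pooling is trivial here; a practical model would learn many stacked filters over a much larger grid, typically in a deep learning framework rather than nested Python loops.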