CPC G16B 50/00 (2019.02) [G06F 16/24578 (2019.01); G06F 16/9024 (2019.01); G16B 30/00 (2019.02)] | 20 Claims |
1. A parallel-processing graph-database system for protein-sequence analytics to determine a viable therapeutic for a given condition, comprising:
at least one processor; and
memory including instructions that when executed cause the at least one processor to:
receive, from a user, a query including at least a segment of a protein sequence of the given condition, wherein the query comprises one or more domain specific functions,
build a sequence database to compare a query sequence of the protein sequence of the given condition with sequences of other known proteins in the sequence database by ingesting data from a file system and converting the data for sequence mapping,
use the sequence database to determine a similarity of the query sequence with sequences of the other known proteins in the sequence database by performing a first domain specific function of the query to conduct protein similarity analysis,
perform a second domain specific function of the query to:
determine respective similarity scores based on the similarity of the sequences of the other known proteins with the query sequence, and
identify proteins of the sequences of the other known proteins having a similarity score above a determined threshold, and
identify one or more therapeutics associated with the identified proteins by querying a parallel-processing graph database that comprises potential therapeutics, associated with the identified proteins, that could have an inhibitory effect on the given condition, wherein the query of the parallel-processing graph database includes the identified proteins, and
return a sorted list of at least a subset of the identified therapeutics associated with the identified proteins, wherein the subset of the identified therapeutics of the sorted list are sorted according to the similarity scores of the identified proteins,
wherein the one or more domain specific functions are executed in parallel.
|
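The flow recited in claim 1 (score known sequences against a query, keep those above a threshold, look up associated therapeutics, and return them sorted by similarity score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the claim: the k-mer Jaccard measure stands in for the unspecified protein similarity analysis, a plain dictionary stands in for the parallel-processing graph database, a thread pool stands in for parallel execution of the domain specific functions, and the names `kmers`, `similarity`, `rank_therapeutics`, `sequence_db`, and `therapeutic_graph` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k=3):
    """Return the set of overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(query, other, k=3):
    """Jaccard similarity of k-mer sets (an assumed stand-in measure)."""
    a, b = kmers(query, k), kmers(other, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_therapeutics(query, sequence_db, therapeutic_graph, threshold=0.3):
    """Score, filter, look up, and sort, mirroring the claimed steps.

    sequence_db:       hypothetical dict, protein name -> sequence
    therapeutic_graph: hypothetical dict, protein name -> list of therapeutics
    """
    names = list(sequence_db)

    # Score every known sequence against the query in parallel.
    with ThreadPoolExecutor() as pool:
        scores = list(
            pool.map(lambda name: similarity(query, sequence_db[name]), names)
        )

    # Keep only proteins whose score is above the threshold.
    hits = {name: s for name, s in zip(names, scores) if s > threshold}

    # Look up therapeutics linked to each identified protein.
    ranked = [
        (drug, protein, score)
        for protein, score in hits.items()
        for drug in therapeutic_graph.get(protein, [])
    ]

    # Sort by the similarity score of the associated protein, highest first.
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked
```

Calling `rank_therapeutics` with a query sequence, a small dictionary of known sequences, and a dictionary of protein-to-therapeutic links returns `(therapeutic, protein, score)` tuples in descending score order. A production system would replace the dictionary lookup with an actual graph-database query and the k-mer measure with an established alignment method.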