CPC G16B 5/20 (2019.02) [C12N 15/10 (2013.01); C12N 15/1089 (2013.01); C12Q 1/68 (2013.01); G16B 5/00 (2019.02); G16B 15/00 (2019.02); G16B 20/00 (2019.02); G16B 20/20 (2019.02); G16B 20/30 (2019.02); G16B 20/50 (2019.02); G16B 40/00 (2019.02); G16B 40/10 (2019.02); G16B 99/00 (2019.02)] | 15 Claims |
1. A computer-implemented method for identifying a first natural product of interest that is producible by an organism from a polynucleotide sequence resident in the organism, the method comprising automatically executing, using one or more digital processors of a computing platform, a bioinformatics tool for:
(a) receiving as input into the bioinformatics tool the polynucleotide sequence to:
(i) identify a given gene cluster defined within the polynucleotide sequence; and
(ii) determine a set of gene cluster-encoded chemical monomers encoded by said given gene cluster;
(b) aligning, using a molecular alignment engine in the bioinformatics tool, said set of gene cluster-encoded chemical monomers with a set of deconstructed chemical monomers determined by deconstruction of a plurality of known second natural products;
(c) calculating a similarity score based on a computed molecular similarity between said gene cluster-encoded chemical monomers and said deconstructed chemical monomers, given said aligning, for each of the known second natural products, wherein said similarity score is based at least in part on a chemical structure most closely correlated to said given gene cluster;
(d) associating a highest-scoring one of said known second natural products having a known molecular structure of interest with said given gene cluster manifesting a most similar chemical structure; and
(e) directing, as an output, synthesis of the first natural product from said set of gene cluster-encoded chemical monomers encoded by said given gene cluster and investigation of the first natural product to determine that the first natural product is of interest for having chemical properties similar to those of said highest-scoring one of said known second natural products;
wherein the bioinformatics tool defines a digital interface to receive as input the polynucleotide sequence and output said output.
|
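Steps (b) through (d) of claim 1 (aligning, scoring, and associating the highest-scoring known product) can be illustrated with a minimal sketch. The claim does not specify the alignment algorithm or the similarity measure, so everything below is an assumption made for illustration: each monomer is modeled as a set of hypothetical substructure feature labels, molecular similarity is taken to be the Tanimoto (Jaccard) coefficient, monomers are treated as an ordered sequence, and the alignment is a Needleman-Wunsch style global alignment with a fixed gap penalty. The names `Monomer`, `tanimoto`, `align_score`, and `best_match` are hypothetical and do not come from the patent.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

# Assumption: a monomer is represented as a set of substructure feature labels.
Monomer = FrozenSet[str]


def tanimoto(a: Monomer, b: Monomer) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def align_score(
    predicted: Sequence[Monomer],
    known: Sequence[Monomer],
    gap: float = -0.5,  # assumed gap penalty, not taken from the claim
) -> float:
    """Global alignment score between gene cluster-encoded monomers
    and the deconstructed monomers of one known natural product."""
    n, m = len(predicted), len(known)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + tanimoto(predicted[i - 1], known[j - 1]),
                dp[i - 1][j] + gap,  # predicted monomer left unmatched
                dp[i][j - 1] + gap,  # known monomer left unmatched
            )
    return dp[n][m]


def best_match(
    predicted: Sequence[Monomer],
    library: Dict[str, Sequence[Monomer]],
) -> Tuple[str, float]:
    """Score every known product and return the highest-scoring one."""
    scores = {name: align_score(predicted, mons) for name, mons in library.items()}
    top = max(scores, key=scores.get)
    return top, scores[top]


if __name__ == "__main__":
    # Toy data with invented feature labels, for illustration only.
    predicted = [
        frozenset({"amine", "carboxyl", "methyl"}),
        frozenset({"amine", "carboxyl", "hydroxyl"}),
    ]
    library = {
        "product_A": list(predicted),
        "product_B": [frozenset({"ketide", "methyl"})],
    }
    print(best_match(predicted, library))
```

In this sketch the score for each known product is the optimal alignment total, so a product whose deconstructed monomers match the predicted monomers one for one scores highest. A practical tool would presumably replace the toy feature sets with real chemical fingerprints and tune the gap penalty; neither choice is stated in the claim.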