CPC C12N 15/8202 (2013.01) | 14 Claims
1. A method for making a synthetic promoter for controlling transgene expression, the method comprising:
(a) accessing a database including:
(i) genomic data representative of multiple genes, wherein the multiple genes are genes from at least one plant, and wherein the genomic data include a gene sequence for each of the multiple genes, and
(ii) gene expression property data indicative of the presence of one or more gene expression property for ones of the multiple genes;
(b) selecting, by a processor, a first set of gene sequences from the multiple genes, based on the gene expression property data indicating the one or more gene expression property is present for each gene in the first set of gene sequences;
(c) extracting, by the processor, from each gene sequence in the first set of gene sequences, a promoter sequence A;
(d) selecting, by the processor, a second set of gene sequences from the multiple genes, based on the gene expression property data indicating the one or more gene expression property is absent for each gene in the second set of gene sequences;
(e) extracting, by the processor, from each gene sequence in the second set of gene sequences, a promoter sequence B;
(f) aligning, by the processor, the promoter sequences A and B based on a landmark into a sequence alignment, the landmark including one of a TATA box and a transcription start site (TSS);
(g) selecting, by the processor, a test promoter sequence S;
(h) calculating, by the processor, a score, using the sequence alignment, for the test promoter sequence S, based on a scoring function, as:
[equation image not reproduced in the source] wherein G is a union of the first and second sets of promoter sequences A and B, and for each position i relative to the landmark and each k-mer k, $G_{k,i}$ are sequences in G that contain k-mer k at position i;
(i) modifying, by the processor, the test promoter sequence S, at random, to form a modified test promoter sequence S′;
(j) calculating, by the processor, a score for the modified test promoter sequence S′, based on said scoring function; and
(k) in response to the calculated score for the modified test promoter sequence S′ being improved over the calculated score for the test promoter sequence S, storing, by the processor, in a memory, the modified test promoter sequence S′, thereby permitting synthesis of the modified test promoter sequence S′.
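The flow of steps (f) through (k) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the claimed scoring function is not reproduced in this text, so `toy_score` is a hypothetical stand-in that rewards k-mers found, at the same landmark-relative position, mostly in set A promoters. All function names, the landmark offsets, and the single-base substitution used for the random modification are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def kmer_index(promoters, landmark_offsets, k):
    """Build G_{k,i}: map (k-mer, position relative to landmark) to the
    ids of promoters in G containing that k-mer at that position."""
    index = defaultdict(set)
    for pid, seq in promoters.items():
        offset = landmark_offsets[pid]  # index of the landmark (e.g. TATA box) in seq
        for start in range(len(seq) - k + 1):
            index[(seq[start:start + k], start - offset)].add(pid)
    return index


def toy_score(seq, offset, index, set_a_ids, k):
    """Hypothetical stand-in for the claimed scoring function (assumption).
    Each k-mer of seq contributes (A members - B members) / |G_{k,i}|."""
    total = 0.0
    for start in range(len(seq) - k + 1):
        members = index.get((seq[start:start + k], start - offset))
        if members:
            in_a = len(members & set_a_ids)
            total += (2 * in_a - len(members)) / len(members)
    return total


def mutate(seq, rng):
    """Step (i), assumed here to be one random single-base substitution."""
    pos = rng.randrange(len(seq))
    base = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return seq[:pos] + base + seq[pos + 1:]


def optimize(seq, offset, index, set_a_ids, k, steps, rng):
    """Steps (i)-(k): keep a modified sequence S' only if its score improves."""
    best, best_score = seq, toy_score(seq, offset, index, set_a_ids, k)
    for _ in range(steps):
        candidate = mutate(best, rng)
        score = toy_score(candidate, offset, index, set_a_ids, k)
        if score > best_score:
            best, best_score = candidate, score  # "store" S'
    return best, best_score


if __name__ == "__main__":
    # Made-up toy data: one promoter with the property (A), one without (B).
    promoters = {"a1": "TATAAAGG", "b1": "CCTATAAA"}
    offsets = {"a1": 0, "b1": 2}  # TATA box start in each sequence
    index = kmer_index(promoters, offsets, k=4)
    print(optimize("CCTATAAA", 2, index, {"a1"}, 4, 200, random.Random(1)))
```

Keeping the index keyed by (k-mer, relative position) mirrors the definition of $G_{k,i}$ in step (h), so a different scoring function could be swapped in without changing the alignment or the search loop.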