| CPC G16B 20/40 (2019.02) [G16B 30/10 (2019.02); G16B 45/00 (2019.02)] | 14 Claims |
|
1. A method for identification of a key driver that is responsible for bringing a change in a microbial population, comprising:
receiving a sample comprising a microbiome from a first set of individuals and a second set of individuals, wherein the first set of individuals is in a reference state and the second set of individuals is in a perturbed state;
extracting Deoxyribonucleic acid (DNA) from the received sample of the first set of individuals and the second set of individuals;
sequencing the extracted DNA corresponding to the first set of individuals and the second set of individuals to generate a plurality of DNA sequences;
filtering and processing the generated plurality of DNA sequences corresponding to the received sample of the first set of individuals and the second set of individuals, wherein the plurality of DNA sequences is processed to remove low quality DNA sequences and non-essential DNA fragments of the generated plurality of DNA sequences;
creating two matrices of a microbial abundance profile of the generated plurality of DNA sequences corresponding to the received sample of the first set of individuals and the second set of individuals, wherein
the microbial abundance profile contains abundance values of each of a plurality of microbes present in the sample of the first set of individuals and the second set of individuals,
each matrix of the two matrices of the microbial abundance profile includes abundances of the plurality of microbes corresponding to the sample of the individuals belonging to the first set of individuals and the second set of individuals,
the microbial abundance profile comprises abundance values of a plurality of individual taxonomic groups in the generated plurality of DNA sequences corresponding to the plurality of microbes of the sample of the first set of individuals and the second set of individuals,
each matrix includes a plurality of rows and a plurality of columns,
the plurality of rows represents the plurality of individual taxonomic groups,
the plurality of columns represents a presence of the plurality of individual taxonomic groups in the corresponding sample, and
the creation of the two matrices corresponds to identification of counts of all potential microbes across the first set of individuals and the second set of individuals using marker gene survey data or whole genome sequence data;
filtering the created two matrices to retain information of microbes which are common to the created two matrices corresponding to the first set of individuals and the second set of individuals, wherein the filtration of the created two matrices corresponds to exclusion of microbial data which is not present in the first set of individuals and the second set of individuals;
generating a first network and a second network by representing the plurality of microbes in each matrix of the created two matrices as a network of a plurality of nodes corresponding to the sample of the first set of individuals and the second set of individuals;
identifying distinct microbial communities from the generated first network and the generated second network;
filtering the first network and the second network to retain a set of nodes common to both the generated first network and the generated second network;
calculating a Jaccard edge index between the generated first network and the generated second network, wherein the Jaccard edge index is calculated using:
$\text{Jaccard edge index} = \dfrac{|A_E \cap B_E|}{|A_E \cup B_E|}$ where $A_E$ and $B_E$ represent the edge sets in the first network and the second network respectively;
constructing a community shuffling plot using the identified distinct microbial communities, wherein the community shuffling plot highlights changes in the identified distinct microbial communities between the first network and the second network;
computing a scaled change in betweenness from the first network to the second network for the plurality of nodes common to both the generated first network and the generated second network, wherein the scaled change in betweenness is computed using the following formula:
$\Delta B = B_{scaled}(B) - B_{scaled}(A)$
where,
$B_{scaled} = \dfrac{B_{calculated} - B_{min}}{B_{max} - B_{min}}$, and $B_{calculated}$, $B_{min}$ and $B_{max}$ correspond to the calculated, minimum and maximum betweenness values;
calculating a value of coreness for each of the plurality of nodes corresponding to the first network and the second network, wherein the value of coreness indicates an importance of a node of the plurality of nodes in the network;
quantifying the community shuffling and network rewiring based on the community shuffling plot and the calculated Jaccard edge index respectively;
calculating a neighbor shift score for each of the plurality of nodes common to the first network and the second network using a predefined formula, wherein the predefined formula is:
![]() where A and B correspond to the first network and second network generated from each of first and second set of individuals respectively,
[Neighbors]A and [Neighbors]B represent the set of first neighbors of the considered node corresponding to A and B respectively;
identifying whether the filtered network pair:
have undergone community shuffling based on a predefined split in the communities between the first network and the second network using the community shuffling plot and individual community members based on change in the value of coreness, and
have undergone rewiring based on the value of Jaccard edge index; and
identifying a specific node of the plurality of nodes as the key driver from the first network to the second network, wherein
the identification is based on a predefined condition on the values of the neighbor shift score and the scaled change in betweenness, and
the key driver brings a change in the microbial population.
|
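The Jaccard edge index and the scaled change in betweenness recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes undirected, unweighted networks given as edge lists, computes betweenness with a plain Brandes-style traversal, and applies min-max scaling within each network. The function names (`edge_set`, `jaccard_edge_index`, `betweenness`, `scaled_betweenness_change`) and the toy taxa `t1` to `t4` are hypothetical.

```python
from collections import deque


def edge_set(edges):
    """Normalize undirected edges so (u, v) and (v, u) compare equal."""
    return {frozenset(e) for e in edges}


def jaccard_edge_index(edges_a, edges_b):
    """|A_E intersect B_E| / |A_E union B_E| for two edge lists."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def betweenness(nodes, edges):
    """Unnormalized shortest-path betweenness (unweighted, undirected)."""
    nodes = list(nodes)
    adj = {n: set() for n in nodes}
    for e in edge_set(edges):
        if len(e) == 2:  # skip self-loops
            u, v = tuple(e)
            if u in adj and v in adj:  # keep only edges between common nodes
                adj[u].add(v)
                adj[v].add(u)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack, preds = [], {n: [] for n in nodes}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {n: b / 2 for n, b in bc.items()}  # each pair was counted twice


def min_max_scale(values):
    """(B_calculated - B_min) / (B_max - B_min); 0.0 if all values are equal."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {n: (v - lo) / span if span else 0.0 for n, v in values.items()}


def scaled_betweenness_change(common_nodes, edges_a, edges_b):
    """Delta B = B_scaled(B) - B_scaled(A) for nodes common to both networks."""
    common_nodes = list(common_nodes)
    sa = min_max_scale(betweenness(common_nodes, edges_a))
    sb = min_max_scale(betweenness(common_nodes, edges_b))
    return {n: sb[n] - sa[n] for n in common_nodes}


# Toy example (hypothetical taxa): a chain in the reference state,
# a star centred on t1 in the perturbed state.
nodes = ["t1", "t2", "t3", "t4"]
edges_a = [("t1", "t2"), ("t2", "t3"), ("t3", "t4")]
edges_b = [("t1", "t2"), ("t1", "t3"), ("t1", "t4")]

jei = jaccard_edge_index(edges_a, edges_b)
delta_b = scaled_betweenness_change(nodes, edges_a, edges_b)
```

In this toy pair only one of five distinct edges is shared, so the Jaccard edge index is low, and `t1` shows the largest gain in scaled betweenness. The thresholds that would qualify a node as a key driver are the predefined conditions of the claim and are not modelled here.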