US 12,142,349 B2
Method and system for identification of key driver organisms from microbiome / metagenomics studies
Sharmila Shekhar Mande, Pune (IN); and Kuntal Kumar Bhusan, Pune (IN)
Assigned to Tata Consultancy Services Limited, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on May 14, 2019, as Appl. No. 16/411,849.
Claims priority of application No. 201823018040 (IN), filed on May 14, 2018.
Prior Publication US 2019/0348150 A1, Nov. 14, 2019
This patent is subject to a terminal disclaimer.
Int. Cl. G16B 20/40 (2019.01); G16B 30/10 (2019.01); G16B 45/00 (2019.01)

CPC G16B 20/40 (2019.02) [G16B 30/10 (2019.02); G16B 45/00 (2019.02)]

14 Claims

1. A method for identification of a key driver that is responsible for bringing a change in a microbial population, comprising:

receiving a sample comprising a microbiome from a first set of individuals and a second set of individuals, wherein the first set of individuals is in a reference state and the second set of individuals is in a perturbed state;

extracting Deoxyribonucleic acid (DNA) from the received sample of the first set of individuals and the second set of individuals;

sequencing the extracted DNA corresponding to the first set of individuals and the second set of individuals to generate a plurality of DNA sequences;

filtering and processing the generated plurality of DNA sequences corresponding to the received sample of the first set of individuals and the second set of individuals, wherein the plurality of DNA sequences is processed to remove low quality DNA sequences and non-essential DNA fragments of the generated plurality of DNA sequences;

creating two matrices of a microbial abundance profile of the generated plurality of DNA sequences corresponding to the received sample of the first set of individuals and the second set of individuals, wherein

the microbial abundance profile contains abundance values of each of a plurality of microbes present in the sample of the first set of individuals and the second set of individuals,

each matrix of the two matrices of the microbial abundance profile includes abundances of the plurality of microbes corresponding to the sample of the individuals belonging to the corresponding one of the first set of individuals and the second set of individuals,

the microbial abundance profile comprises abundance values of a plurality of individual taxonomic groups in the generated plurality of DNA sequences corresponding to the plurality of microbes of the sample of the first set of individuals and the second set of individuals,

each matrix includes a plurality of rows and a plurality of columns,

the plurality of rows represents the plurality of individual taxonomic groups,

the plurality of columns represents a presence of the plurality of individual taxonomic groups in the corresponding sample, and

the creation of the two matrices corresponds to identification of counts of all potential microbes across the first set of individuals and the second set of individuals using marker gene survey data or whole genome sequence data;

filtering the created two matrices to retain information of microbes which are common to the created two matrices corresponding to the first set of individuals and the second set of individuals, wherein the filtration of the created two matrices corresponds to exclusion of microbial data which is not present in the first set of individuals and the second set of individuals;

generating a first network and a second network by representing the plurality of microbes in each matrix of the created two matrices as a network of a plurality of nodes corresponding to the sample of the first set of individuals and the second set of individuals;

identifying distinct microbial communities from the generated first network and the generated second network;

filtering the first network and the second network to retain a set of nodes common to both the generated first network and the generated second network;

calculating a Jaccard edge index between the generated first network and the generated second network, wherein the Jaccard edge index is calculated using:

Jaccard edge index = |A_E ∩ B_E| / |A_E ∪ B_E|

where A_E and B_E represent the edge sets in the first network and the second network, respectively;

constructing a community shuffling plot using the identified distinct microbial communities, wherein the community shuffling plot highlights changes in the identified distinct microbial communities between the first network and the second network;

computing a scaled change in betweenness from the first network to the second network for the plurality of nodes common to both the generated first network and the generated second network, wherein computing the scaled change in betweenness is done using the following formula:

ΔB = B_scaled(B) − B_scaled(A)

where,

B_scaled = (B_calculated − B_min) / (B_max − B_min), and

B_calculated, B_min and B_max correspond to the calculated, minimum and maximum betweenness values;

calculating a value of coreness for each of the plurality of nodes corresponding to the first network and the second network, wherein the value of coreness indicates an importance of a node of the plurality of nodes in the network;

quantifying the community shuffling and network rewiring based on the community shuffling plot and the calculated Jaccard edge index respectively;

calculating a neighbor shift score for each of the plurality of nodes common to the first network and the second network using a predefined formula, wherein the predefined formula is:

where A and B correspond to the first network and the second network generated from the first set of individuals and the second set of individuals, respectively,

[Neighbors]^A and [Neighbors]^B represent the sets of first neighbors of the considered node corresponding to A and B, respectively;

identifying whether the filtered network pair:

has undergone community shuffling based on a predefined split in the communities between the first network and the second network using the community shuffling plot and individual community members based on a change in the value of coreness, and

has undergone rewiring based on the value of the Jaccard edge index; and

identifying a specific node of the plurality of nodes as the key driver from the first network to the second network, wherein

the identification is based on a predefined condition on the values of the neighbor shift score and the scaled change in betweenness, and

the key driver brings a change in the microbial population.
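
The matrix-filtering step of the claim (retaining only microbes common to both abundance matrices) might be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout (taxon name mapped to a list of per-sample abundances), the function name `retain_common_taxa`, and the example taxa and counts are all assumptions made for the example, and "present" is read here simply as "has a row in the matrix".

```python
def retain_common_taxa(reference, perturbed):
    """Keep only the taxa (rows) that have a row in both abundance matrices.

    Each matrix is modeled as a dict: taxon -> list of abundances, one per sample.
    This layout is an assumption for illustration only.
    """
    common = sorted(set(reference) & set(perturbed))
    filtered_reference = {taxon: reference[taxon] for taxon in common}
    filtered_perturbed = {taxon: perturbed[taxon] for taxon in common}
    return filtered_reference, filtered_perturbed


# Hypothetical abundance profiles (rows = taxa, columns = samples).
reference = {
    "Bacteroides": [12, 30, 7],
    "Prevotella": [0, 4, 9],
    "Roseburia": [5, 5, 2],
}
perturbed = {
    "Bacteroides": [3, 1, 8],
    "Prevotella": [14, 22, 6],
    "Escherichia": [9, 0, 11],
}

ref_common, per_common = retain_common_taxa(reference, perturbed)
print(sorted(ref_common))  # taxa retained in both matrices
```

Taxa found in only one of the two groups (here "Roseburia" and "Escherichia") are dropped, so both downstream networks are built over the same candidate node set.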
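
The Jaccard edge index recited in the claim can be illustrated with a short sketch. It assumes the standard Jaccard definition over the two edge sets (size of the intersection divided by size of the union) and treats edges as undirected; the function names, the handling of two empty networks, and the example edges are assumptions for illustration.

```python
def normalize_edges(edges):
    """Store each undirected edge as a frozenset so (u, v) and (v, u) compare equal."""
    return {frozenset(edge) for edge in edges}


def jaccard_edge_index(edges_a, edges_b):
    """Jaccard index of two edge sets: |A_E ∩ B_E| / |A_E ∪ B_E|."""
    a_e = normalize_edges(edges_a)
    b_e = normalize_edges(edges_b)
    union = a_e | b_e
    if not union:
        # Assumption: two networks with no edges are treated as identical.
        return 1.0
    return len(a_e & b_e) / len(union)


# Hypothetical edge lists over the same four nodes.
edges_a = [("m1", "m2"), ("m2", "m3"), ("m3", "m4")]
edges_b = [("m2", "m1"), ("m3", "m4"), ("m1", "m4"), ("m2", "m4")]

jei = jaccard_edge_index(edges_a, edges_b)
print(jei)  # 2 shared edges out of 5 distinct edges
```

A value near 1 means the two networks share most of their edges, while a value near 0 indicates extensive rewiring between the reference and perturbed states.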
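
The scaled change in betweenness can be sketched as below. The sketch assumes unweighted, undirected networks, computes betweenness with Brandes' algorithm, and assumes that scaling is min-max scaling within each network, (B_calculated − B_min) / (B_max − B_min), as suggested by the quantities named in the claim. All function names and the example networks are hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:                  # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def min_max_scale(values):
    """Scale to [0, 1]; assumption: all-equal values map to 0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {v: 0.0 for v in values}
    return {v: (b - lo) / (hi - lo) for v, b in values.items()}


def scaled_betweenness_change(adj_a, adj_b):
    """Delta B = B_scaled(B) - B_scaled(A) for nodes common to both networks."""
    scaled_a = min_max_scale(betweenness(adj_a))
    scaled_b = min_max_scale(betweenness(adj_b))
    return {v: scaled_b[v] - scaled_a[v] for v in adj_a.keys() & adj_b.keys()}


# Hypothetical networks: a path in state A, a star centered on m1 in state B.
net_a = adjacency([("m1", "m2"), ("m2", "m3"), ("m3", "m4")])
net_b = adjacency([("m1", "m2"), ("m1", "m3"), ("m1", "m4")])
delta_b = scaled_betweenness_change(net_a, net_b)
print(delta_b)
```

In this toy pair, m1 moves from the periphery of the path to the hub of the star, so its scaled betweenness rises, while m2 and m3 lose their bridging role.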
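
The claim does not define how the value of coreness is computed. One common graph-theoretic reading, assumed here purely for illustration, is the k-core number: the largest k such that the node belongs to a subgraph in which every node has at least k neighbors. The sketch below uses a simple peeling procedure; the function name and the example network are hypothetical.

```python
def coreness(adj):
    """k-core number of every node, by repeatedly peeling the lowest-degree node.

    adj maps each node to the set of its neighbors (undirected network).
    Interpreting the claim's "coreness" as the k-core number is an assumption.
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)  # node with the lowest remaining degree
        k = max(k, degree[v])               # core number never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical network: a triangle (m1, m2, m3) with a pendant node m4 attached to m3.
network = {
    "m1": {"m2", "m3"},
    "m2": {"m1", "m3"},
    "m3": {"m1", "m2", "m4"},
    "m4": {"m3"},
}
core_values = coreness(network)
print(core_values)
```

Computing this value for the same node in both networks and comparing the two gives one way to track whether an individual community member has moved toward or away from the densely connected core.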