US 11,749,412 B2
Distributed data analytics
Patricia Gomes Soares Florissi, Briarcliff Manor, NY (US); Michal Ziv Ukelson, Lehavim (IL); Ran Dach, Kiryat Yam (IL); Arnon Benshahar, Tel Aviv (IL); and Ehud Gudes, Beer-Sheva (IL)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Jul. 6, 2020, as Appl. No. 16/921,303.
Application 16/921,303 is a continuation of application No. 15/719,231, filed on Sep. 28, 2017, granted, now 10,706,970.
Application 15/719,231 is a continuation in part of application No. 15/281,248, filed on Sep. 30, 2016, granted, now 10,528,875, issued on Jan. 7, 2020.
Application 15/281,248 is a continuation in part of application No. 14/983,932, filed on Dec. 30, 2015, granted, now 10,311,363, issued on Jun. 4, 2019.
Claims priority of provisional application 62/400,767, filed on Sep. 28, 2016.
Claims priority of provisional application 62/143,685, filed on Apr. 6, 2015.
Claims priority of provisional application 62/143,404, filed on Apr. 6, 2015.
Prior Publication US 2020/0335223 A1, Oct. 22, 2020
This patent is subject to a terminal disclaimer.
Int. Cl. G01N 33/48 (2006.01); G01N 33/50 (2006.01); G16H 50/70 (2018.01); G16H 50/20 (2018.01); G06F 16/182 (2019.01); G06F 16/435 (2019.01); G06Q 50/22 (2018.01)
CPC G16H 50/70 (2018.01) [G06F 16/182 (2019.01); G06F 16/436 (2019.01); G16H 50/20 (2018.01); G06Q 50/22 (2013.01)]
20 Claims
 
1. A method comprising:
obtaining reads of biological samples of respective sample sources wherein each of the biological samples contains genomic material from a plurality of distinct microorganisms within an environment of a corresponding one of the sample sources; and
performing distributed data analytics to characterize an actual or potential outbreak of at least one of a disease, an infection and a contamination that involves genomic material from multiple ones of the distinct microorganisms in one or more of the sample sources;
wherein performing distributed data analytics comprises:
performing local analytics in respective ones of a plurality of data zones; and
performing global analytics utilizing results of the local analytics performed in the respective data zones;
wherein each of the data zones comprises one or more sequencing centers utilized to generate a corresponding subset of the reads within that data zone;
wherein the local analytics performed in a given one of the data zones utilize reads of one or more of the biological samples sequenced in the one or more sequencing centers of the given data zone;
wherein the local analytics performed in the given data zone comprise analyzing the reads of the one or more biological samples against a local set of known gene units;
wherein at least one result of the global analytics comprises a graph in which nodes correspond to respective biological samples and edges between the nodes characterize epidemiological relationships between the biological samples, the edges being weighted by sample-to-sample comparison scores of metagenomics sequencing results for the biological samples; and
wherein the method is implemented by a distributed data processing system comprising a plurality of processing devices configured to communicate with one another over at least one network.
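
The two-stage flow recited in claim 1 (local analytics per data zone, then global analytics that builds a weighted sample graph) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `KNOWN_GENE_UNITS`, `local_analytics`, `comparison_score`, and `global_analytics` are hypothetical, substring matching stands in for real read alignment, and a weighted Jaccard similarity is assumed as the sample-to-sample comparison score, since the claim does not specify one.

```python
from itertools import combinations

# Hypothetical local set of known gene units: name -> marker subsequence.
KNOWN_GENE_UNITS = {"geneA": "ACGT", "geneB": "GGCC", "geneC": "TTAA"}


def local_analytics(zone_reads, gene_units=KNOWN_GENE_UNITS):
    """Run inside one data zone.

    zone_reads maps a sample id to the list of reads sequenced in that zone.
    Returns, per sample, how many reads contain each known gene unit.
    Substring matching is an assumed stand-in for real alignment.
    """
    profiles = {}
    for sample_id, reads in zone_reads.items():
        profiles[sample_id] = {
            gene: sum(1 for read in reads if marker in read)
            for gene, marker in gene_units.items()
        }
    return profiles


def comparison_score(p, q):
    """Weighted Jaccard similarity of two gene-unit profiles (assumed metric)."""
    genes = set(p) | set(q)
    shared = sum(min(p.get(g, 0), q.get(g, 0)) for g in genes)
    total = sum(max(p.get(g, 0), q.get(g, 0)) for g in genes)
    return shared / total if total else 0.0


def global_analytics(zone_results, threshold=0.0):
    """Combine per-zone profiles into a graph of samples.

    Only the compact profiles leave each zone, not the raw reads.
    Nodes are sample ids; each edge is weighted by the comparison score
    and is kept only when the score exceeds the threshold.
    """
    merged = {}
    for profiles in zone_results:
        merged.update(profiles)
    graph = {"nodes": sorted(merged), "edges": {}}
    for a, b in combinations(sorted(merged), 2):
        score = comparison_score(merged[a], merged[b])
        if score > threshold:
            graph["edges"][(a, b)] = score
    return graph


# Illustrative data only: two data zones with made-up reads.
zone_1 = {"s1": ["ACGTTT", "GGCCAA"], "s2": ["ACGTAC"]}
zone_2 = {"s3": ["TTAAGG", "GGCCTT"]}

graph = global_analytics([local_analytics(zone_1), local_analytics(zone_2)])
print(graph["nodes"])
print(graph["edges"])
```

In this sketch, samples s1 and s2 share a gene unit and are linked, while s2 and s3 share none and receive no edge. A production system would replace the matching and scoring steps with whatever alignment and metagenomic comparison methods the deployment actually uses.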