US 11,816,118 B2
Collaborative dataset consolidation via distributed computer networks
Bryon Kristen Jacob, Austin, TX (US); Jon Loyens, Austin, TX (US); David Lee Griffith, Austin, TX (US); Brett A. Hurt, Austin, TX (US); Triet Minh Le, Austin, TX (US); Shad William Reynolds, Austin, TX (US); Arthur Albert Keen, Austin, TX (US); Joseph Boutros, Austin, TX (US); and Alexander John Zelenak, Austin, TX (US)
Assigned to data.world, Inc., Austin, TX (US)
Filed by data.world, Inc., Austin, TX (US)
Filed on Aug. 22, 2022, as Appl. No. 17/893,100.
Application 17/893,100 is a continuation of application No. 17/037,005, filed on Sep. 29, 2020, granted, now 11,423,039.
Application 17/037,005 is a continuation of application No. 16/120,057, filed on Aug. 31, 2018, granted, now 10,853,376.
Application 16/120,057 is a continuation of application No. 15/186,514, filed on Jun. 19, 2016, granted, now 10,102,258.
Prior Publication US 2023/0153312 A1, May 18, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/2458 (2019.01); G06F 16/25 (2019.01); G06N 5/04 (2023.01); G06F 16/215 (2019.01)

CPC G06F 16/2471 (2019.01) [G06F 16/215 (2019.01); G06F 16/2465 (2019.01); G06F 16/252 (2019.01); G06F 16/256 (2019.01); G06F 16/258 (2019.01); G06N 5/04 (2013.01)]

20 Claims

1. A method comprising:

formatting a dataset to form a first atomized dataset including graph-based data associated with metadata including attributes of the dataset, the first atomized dataset being a first version;

forming a second atomized dataset including the first atomized dataset, the second atomized dataset including changed data as a second version from the first version;

receiving data representing a query;

re-writing the query to generate one or more sub-queries configured to access the second version of the dataset including at least a portion of data stored in at least one of different data repositories to perform a federated query;

classifying query portions of the one or more sub-queries to identify a classification type for a query portion;

applying the one or more sub-queries to the different data repositories; and

retrieving data responsive to the data representing the query.
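
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that "atomized" data can be modeled as subject/predicate/object triples, that a query is a list of triple patterns with None as a wildcard, and that a repository is an in-memory set of triples. Every name in it (atomize, Version, next_version, classify, rewrite, run_federated, repo_a, repo_b) and all sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


def atomize(rows, dataset_id):
    """Format tabular rows as graph-style triples plus dataset-level metadata.

    Assumption: a triple (subject, predicate, object) stands in for an
    atomized data point; the claim does not prescribe this layout.
    """
    triples = set()
    for i, row in enumerate(rows):
        subject = f"{dataset_id}/row/{i}"
        for column, value in row.items():
            triples.add((subject, column, str(value)))
    # Metadata describing attributes of the dataset itself.
    columns = sorted({c for row in rows for c in row})
    triples.add((dataset_id, "meta:columns", ",".join(columns)))
    triples.add((dataset_id, "meta:rowCount", str(len(rows))))
    return frozenset(triples)


@dataclass(frozen=True)
class Version:
    number: int
    triples: frozenset


def next_version(prev, changed):
    """Form the next version: the previous triples plus the changed data."""
    return Version(prev.number + 1, prev.triples | frozenset(changed))


def classify(pattern):
    """Assign a (hypothetical) classification type to one query portion."""
    predicate = pattern[1]
    if predicate is not None and predicate.startswith("meta:"):
        return "metadata"
    return "data"


def rewrite(query, repositories):
    """Rewrite a query into sub-queries, one per repository that can answer a pattern."""
    sub_queries = []
    for pattern in query:
        for name, triples in repositories.items():
            if pattern[1] is None or any(t[1] == pattern[1] for t in triples):
                sub_queries.append((name, pattern, classify(pattern)))
    return sub_queries


def matches(pattern, triple):
    """A pattern term of None matches anything; otherwise terms must be equal."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def run_federated(query, repositories):
    """Apply each sub-query to its repository and collect the responsive triples."""
    results = set()
    for name, pattern, _kind in rewrite(query, repositories):
        results |= {t for t in repositories[name] if matches(pattern, t)}
    return results


# Usage with made-up sample data.
rows = [{"name": "alpha", "kind": "x"}]
v1 = Version(1, atomize(rows, "ds1"))
v2 = next_version(v1, {("ds1/row/0", "score", "7")})

# The second version is split across two hypothetical repositories.
repositories = {
    "repo_a": {t for t in v2.triples if t[1] in ("name", "kind")},
    "repo_b": {t for t in v2.triples if t[1] not in ("name", "kind")},
}

query = [(None, "name", None), (None, "score", None)]
sub_queries = rewrite(query, repositories)
result = run_federated(query, repositories)
```

In this sketch each pattern is routed only to a repository that holds the matching predicate, so the two-pattern query becomes one sub-query against repo_a and one against repo_b, and the union of their answers is returned as the data responsive to the query. A production system would more plausibly rely on a graph query language with federation support rather than in-memory sets.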