US 11,704,299 B1
Fully managed repository to create, version, and share curated data for machine learning development
Tanya Bansal, Seattle, WA (US); Vidhi Kastuar, Fremont, CA (US); Saurabh Gupta, Sammamish, WA (US); Alex Tang, Shoreline, WA (US); Lakshmi Naarayanan Ramakrishnan, Redmond, WA (US); Stefano Stefani, Issaquah, WA (US); Xingyuan Wang, Seattle, WA (US); and Mukesh Karki, Bellevue, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Mar. 18, 2021, as Appl. No. 17/205,373.
Claims priority of provisional application 63/119,259, filed on Nov. 30, 2020.
Int. Cl. G06F 7/00 (2006.01); G06F 16/22 (2019.01); G06F 21/60 (2013.01); G06F 16/21 (2019.01); G06F 16/25 (2019.01); G06N 20/00 (2019.01); G06F 18/214 (2023.01)
CPC G06F 16/2291 (2019.01) [G06F 16/219 (2019.01); G06F 16/252 (2019.01); G06F 18/214 (2023.01); G06F 21/602 (2013.01); G06N 20/00 (2019.01)]
20 Claims
1. A computer-implemented method comprising:
receiving, from a customer account registered with a service provider, a request to create a feature group comprising one or more features, the one or more features including at least an identifier feature for identifying the feature group;
creating a first database associated with the customer account, the first database configured to store data representing a most recent record including values associated with the one or more features in the feature group;
creating a second database associated with the customer account, the second database configured to store data representing one or more records including values associated with the one or more features in the feature group;
receiving, from the customer account at a first time, first data including the identifier feature and first values associated with the one or more features;
storing, in the first database, the first values associated with the one or more features as the most recent record at the first time;
storing, in the second database, the first values associated with the one or more features as a first record at the first time;
receiving, from the customer account at a second time, second data including the identifier feature and second values associated with the one or more features;
determining that the second time is more recent than the first time;
storing, in the first database and based at least in part on determining that the second time is more recent than the first time, the second values associated with the one or more features as the most recent record at the second time, wherein storing the second values comprises overwriting the first values in the first database; and
storing, in the second database, the second values as a second record at the second time, wherein storing the second record comprises appending the second record to the second database.
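The write path recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class name `FeatureGroupStore`, the method names `put_record` and `get_latest`, and the feature names in the usage example are all hypothetical. Two assumptions go beyond the claim text: the first store is keyed on the value of the identifier feature, and a record that is not more recent than the stored one is still appended to the second store but does not overwrite the first.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


@dataclass
class FeatureGroupStore:
    """Hypothetical in-memory stand-in for the two databases of claim 1."""

    identifier_feature: str
    # "First database": holds only the most recent record, with its time.
    # Keying on the identifier value is an assumption of this sketch.
    first_db: Dict[Any, Tuple[float, Record]] = field(default_factory=dict)
    # "Second database": append-only history of every record received.
    second_db: List[Tuple[float, Record]] = field(default_factory=list)

    def put_record(self, event_time: float, record: Record) -> None:
        key = record[self.identifier_feature]
        current = self.first_db.get(key)
        # Overwrite the most recent record only when the incoming time is
        # more recent than the stored one (the determining step).
        if current is None or event_time > current[0]:
            self.first_db[key] = (event_time, dict(record))
        # Always append to the history. Appending stale records too is an
        # assumption; the claim only covers the more-recent case.
        self.second_db.append((event_time, dict(record)))

    def get_latest(self, key: Any) -> Optional[Record]:
        entry = self.first_db.get(key)
        return None if entry is None else dict(entry[1])


store = FeatureGroupStore(identifier_feature="customer_id")
store.put_record(1.0, {"customer_id": "c1", "spend": 10})  # first time
store.put_record(2.0, {"customer_id": "c1", "spend": 25})  # second time
latest = store.get_latest("c1")
history_length = len(store.second_db)
```

After the two calls, `latest` holds the second values because they overwrote the first, and `history_length` is 2 because both records were appended to the second store.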