US 12,353,445 B2
Feature store with integrated tracking
Mani Parkhe, San Jose, CA (US); Clemens Mewald, Lafayette, CA (US); Matei Zaharia, Palo Alto, CA (US); and Avesh Singh, San Francisco, CA (US)
Assigned to Databricks, Inc., San Francisco, CA (US)
Filed by Databricks, Inc., San Francisco, CA (US)
Filed on Oct. 29, 2021, as Appl. No. 17/514,997.
Claims priority of provisional application 63/191,705, filed on May 21, 2021.
Prior Publication US 2022/0374457 A1, Nov. 24, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/28 (2019.01); G06F 30/27 (2020.01)
CPC G06F 16/288 (2019.01) [G06F 30/27 (2020.01)] 20 Claims
OG exemplary drawing
 
1. A system, comprising:
one or more processors; and
one or more memories comprising stored instructions that when executed by the one or more processors cause the system to:
access one or more datasets stored on a data store;
determine a feature based at least in part on the one or more datasets;
store the feature in a feature store, wherein the feature store stores a first set of features and a second set of features;
store, in the feature store, metadata in association with the feature, the metadata including a mapping from the feature to upstream lineage data indicating the one or more datasets used to determine the feature;
receive a request to access the first set of features for training a first model, wherein the first set of features includes the feature;
provide the first set of features to the first model at a first latency;
determine a model serving endpoint that deploys the first model, wherein the model serving endpoint is a web service;
store, in the feature store, the metadata in association with the feature, the metadata further including a mapping from the feature to downstream lineage data indicating the first model and the model serving endpoint that deploys the first model;
receive an update indication that the feature is to be updated;
determine, responsive to receipt of the update indication and based on the metadata including the mapping from the feature to the downstream lineage data, the model serving endpoint that deploys the first model trained on the feature;
transmit a notification to the model serving endpoint, the notification indicating that the feature is updated;
receive a request to access the second set of features for performing inference using a trained second model; and
provide the second set of features to the trained second model at a second latency, wherein the second latency is lower than the first latency.
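The claim above describes a feature store that records bidirectional lineage metadata (upstream source datasets, downstream models and serving endpoints) and uses the downstream mapping to notify serving endpoints when a feature changes, while serving features at two latencies: an offline path for training and a lower-latency online path for inference. The following is a minimal, self-contained sketch of that flow; it is not Databricks' implementation, and all class, method, and endpoint names are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class FeatureRecord:
    # feature values keyed by entity id
    values: Dict[str, float]
    # upstream lineage: datasets the feature was computed from
    upstream_datasets: List[str]
    # downstream lineage: (model, serving endpoint) pairs trained on this feature
    downstream: List[Tuple[str, str]] = field(default_factory=list)

class FeatureStore:
    """Toy feature store tracking lineage metadata alongside each feature."""

    def __init__(self):
        self._offline: Dict[str, FeatureRecord] = {}  # full records, training path
        self._online: Dict[str, Dict[str, float]] = {}  # materialized, low-latency path

    def write_feature(self, name, values, upstream_datasets):
        # store the feature plus upstream lineage metadata
        rec = FeatureRecord(dict(values), list(upstream_datasets))
        self._offline[name] = rec
        self._online[name] = rec.values  # materialize for low-latency reads

    def get_training_features(self, names, model, endpoint):
        # training-time access: record downstream lineage (model + its endpoint)
        for n in names:
            self._offline[n].downstream.append((model, endpoint))
        return {n: self._offline[n].values for n in names}

    def get_online_features(self, names):
        # inference-time access from the low-latency online store
        return {n: self._online[n] for n in names}

    def update_feature(self, name, new_values):
        # apply the update, then use downstream lineage to find every
        # serving endpoint that deploys a model trained on this feature
        rec = self._offline[name]
        rec.values.update(new_values)
        self._online[name] = rec.values
        return [ep for _model, ep in rec.downstream]  # endpoints to notify
```

In this sketch, `update_feature` returns the endpoints to notify rather than transmitting notifications itself; a real system would push a message (e.g. over HTTP, since the claim's endpoint is a web service) to each returned endpoint.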