US 12,314,286 B2
Distributed approximate nearest neighbor service architecture for retrieving items in an embedding space
Guanghua Shu, Sunnyvale, CA (US); Taesik Na, Issaquah, WA (US); Zhihong Xu, Sunnyvale, CA (US); Wideet Shende, San Francisco, CA (US); Manmeet Singh, Santa Clara, CA (US); Tejaswi Tenneti, Fremont, CA (US); and Reza Sadri, Irvine, CA (US)
Assigned to Maplebear Inc., San Francisco, CA (US)
Filed by Maplebear Inc., San Francisco, CA (US)
Filed on Feb. 28, 2022, as Appl. No. 17/682,187.
Prior Publication US 2023/0273940 A1, Aug. 31, 2023
Int. Cl. G06F 16/28 (2019.01); G06F 11/34 (2006.01); G06F 16/22 (2019.01); G06F 16/24 (2019.01); G06F 16/245 (2019.01); G06F 16/2455 (2019.01)

CPC G06F 16/283 (2019.01) [G06F 11/3409 (2013.01); G06F 16/2228 (2019.01); G06F 16/24556 (2019.01); G06F 16/285 (2019.01)]

17 Claims

1. A method comprising:

generating item embeddings for each of a plurality of items maintained in an item database by an online system, each item embedding representing an item in a latent space of a neural network, each item embedding being a latent space vector generated by the neural network;

storing values of a specific attribute in the item database, wherein each value of the specific attribute is associated with one of the plurality of items maintained in the item database;

generating a plurality of indices, each index corresponding to a particular value of the specific attribute and including item embeddings for items having the particular value for the specific attribute, different indices corresponding to different particular values;

distributing the plurality of indices across a plurality of shards to increase scalability of storing the item embeddings, wherein distributing the plurality of indices across the plurality of shards comprises:

determining frequencies with which the plurality of indices are accessed by the online system in response to item queries received by the online system;

selecting a shard to include an index of the plurality of indices based on the frequencies of the plurality of indices to load balance accesses to the plurality of shards, wherein selecting the shard to include the index of the plurality of indices comprises:

determining an aggregate frequency of access to the indices by combining frequencies with which each index was accessed,

determining a target frequency of access for each shard as a ratio of the aggregate frequency of access to a number of shards, and

selecting the shard to include the index of the plurality of indices so a combination of frequencies with which the online system accesses indices within the shard is within a threshold amount of the target frequency of access; and

storing the index of the plurality of indices in the selected shard;

storing the plurality of shards; and

receiving requests for retrieving a plurality of the item embeddings based on the plurality of indices, wherein retrieving the plurality of the item embeddings comprises retrieving the plurality of shards that are load balanced based on the frequencies of the plurality of indices.
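
The shard-selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify a placement algorithm, so a simple greedy strategy (most frequently accessed index first, placed on the least-loaded shard) is assumed here. The function names, the attribute values such as "store_a", and the access counts are all hypothetical.

```python
from collections import defaultdict


def build_indices(items):
    """Group item embeddings by attribute value, giving one index per value.

    `items` is an iterable of (item_id, attribute_value, embedding) tuples;
    this tuple layout is an assumption made for illustration.
    """
    indices = defaultdict(list)
    for item_id, attribute_value, embedding in items:
        indices[attribute_value].append((item_id, embedding))
    return dict(indices)


def assign_indices_to_shards(access_counts, num_shards):
    """Assign indices to shards so that per-shard access load is balanced.

    `access_counts` maps each index key to how often that index was accessed.
    Returns (shards, loads, target), where `target` is the aggregate access
    frequency divided by the number of shards, as recited in the claim.
    The greedy placement itself is an assumed strategy, not claim language.
    """
    aggregate = sum(access_counts.values())
    target = aggregate / num_shards

    shards = [[] for _ in range(num_shards)]
    loads = [0] * num_shards

    # Visit indices from most to least frequently accessed (ties broken by key).
    ordered = sorted(access_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for index_key, count in ordered:
        # Pick the shard with the smallest combined access frequency so far.
        shard_id = min(range(num_shards), key=lambda s: loads[s])
        shards[shard_id].append(index_key)
        loads[shard_id] += count

    return shards, loads, target


def within_threshold(loads, target, threshold):
    """Check that every shard's combined frequency is within `threshold` of target."""
    return all(abs(load - target) <= threshold for load in loads)


if __name__ == "__main__":
    # Hypothetical access counts for four per-attribute-value indices.
    counts = {"store_a": 40, "store_b": 30, "store_c": 20, "store_d": 10}
    shards, loads, target = assign_indices_to_shards(counts, num_shards=2)
    print(shards, loads, target, within_threshold(loads, target, threshold=5))
```

With the hypothetical counts above, the aggregate frequency is 100 and the target per shard is 50. The greedy pass places "store_a" and "store_d" on one shard and "store_b" and "store_c" on the other, so each shard carries a combined frequency of 50. A greedy pass does not guarantee that every shard lands within a given threshold for arbitrary inputs, which is why the threshold check is kept as a separate step.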