US 11,947,551 B2
Automated sampling of query results for training of a query engine
Nicholas Cooley, Oakland, CA (US)
Assigned to Maplebear Inc., San Francisco, CA (US)
Filed by Maplebear Inc., San Francisco, CA (US)
Filed on May 27, 2022, as Appl. No. 17/826,162.
Prior Publication US 2024/0004886 A1, Jan. 4, 2024
Int. Cl. G06F 16/2455 (2019.01); G06F 16/2453 (2019.01); G06F 16/2457 (2019.01)

CPC G06F 16/24578 (2019.01) [G06F 16/24542 (2019.01); G06F 16/2455 (2019.01)]

20 Claims

1. A method, comprising:

at a computer system comprising at least one processor and memory:

receiving a plurality of search queries from a plurality of users directed at an item query engine of an online system, each search query including a search phrase used by a user to conduct the search query;

monitoring the item query engine for a period of time;

storing the plurality of search queries received within the period of time as a set of historical query records associated with the item query engine, the set of historical query records comprising a plurality of search phrases, wherein each historical query record of the set of historical query records is associated with a search phrase and a list of items returned by the item query engine;

determining search frequencies of the search phrases used in the historical query records for the period of time, wherein a search frequency for a given search phrase is a number of times the given search phrase is used by the plurality of users within the period of time;

stratifying the set of historical query records into a plurality of bins according to the search frequencies of the search phrases, wherein each bin of the plurality of bins includes a subset of historical query records, wherein each bin defines a range of numbers of times a search phrase is used by the plurality of users within the period of time;

sampling the historical query records from the plurality of bins to collect a representative set of historical query records;

outputting the representative set of historical query records for rating; and

adjusting the item query engine based on the representative set.
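
The determining, stratifying, and sampling steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not specify bin boundaries, sample sizes, or a sampling rule, so the names `stratified_sample`, `bin_edges`, and `per_bin`, the representation of a record as a (search phrase, item list) pair, and the choice to draw an equal number of records uniformly at random from each bin are all assumptions made for illustration. The rating and engine-adjustment steps are omitted.

```python
import bisect
import random
from collections import Counter, defaultdict


def stratified_sample(records, bin_edges, per_bin, seed=0):
    """Sample historical query records from frequency-based bins.

    records   -- list of (search_phrase, returned_items) pairs (assumed shape)
    bin_edges -- ascending lower bounds of the frequency ranges; the first
                 edge must be 1 so that every phrase falls into some bin
                 (hypothetical parameter, not taken from the claim)
    per_bin   -- maximum number of records drawn from each bin (assumption)
    """
    rng = random.Random(seed)

    # Search frequency: how many times each phrase occurs in the period.
    frequency = Counter(phrase for phrase, _ in records)

    # Stratify: each bin covers a range of search frequencies.
    bins = defaultdict(list)
    for record in records:
        count = frequency[record[0]]
        # Index of the last lower bound that is <= count.
        bin_index = bisect.bisect_right(bin_edges, count) - 1
        bins[bin_index].append(record)

    # Sample from every non-empty bin to build the representative set.
    representative = []
    for bin_index in sorted(bins):
        members = bins[bin_index]
        k = min(per_bin, len(members))
        representative.extend(rng.sample(members, k))
    return representative
```

With three bins whose lower bounds are 1, 2, and 5 uses, for example, a phrase searched once, a phrase searched twice, and a phrase searched five times fall into separate bins, so rarely used phrases are represented in the output alongside popular ones rather than being crowded out by them.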