US 12,265,590 B1
Systems and methods for enhancing performance of search engines
Hotaka Shiokawa, Waltham, MA (US); Hui Su, West Roxbury, MA (US); and Hamed Shahbazi, Norwood, MA (US)
Assigned to FMR LLC, Boston, MA (US)
Filed by FMR LLC, Boston, MA (US)
Filed on Jun. 13, 2024, as Appl. No. 18/742,418.
Int. Cl. G06F 7/00 (2006.01); G06F 16/957 (2019.01); G06F 40/295 (2020.01)

CPC G06F 16/9574 (2019.01) [G06F 40/295 (2020.01)]

22 Claims

1. A system for enhancing performance of a search engine, the system comprising a server computing device having a memory for storing computer executable instructions and a processor that executes the computer executable instructions to:

perform preprocessing on one or more chunks in a chunk database, the preprocessing comprising:

dividing a total number of sentences in each large chunk into two or more sentence sets, wherein each large chunk is a chunk that includes a total number of sentences that exceeds a predetermined threshold, and wherein each sentence set includes one or more sentences; and

generating, for each large chunk, a modified sentence set for each sentence set in the large chunk, wherein the modified sentence set is generated by modifying the respective sentence set according to an alteration technique associated with one or more alteration techniques;

receive search results from a search engine based on a query, in which the search results include chunk identifiers and a first weight score associated with each chunk identifier, wherein each chunk identifier corresponds to a chunk and one or more modified chunks, and wherein each of the one or more modified chunks is modified according to a different alteration technique;

determine a subset of search results, which includes a first predetermined number of top-ranked chunk identifiers in the search results that are determined by the search engine to be the most relevant to the query;

generate, for each chunk identifier in the subset of search results, one or more second weight scores based on an embedding of the query and embeddings of the one or more modified chunks corresponding to the chunk identifier, wherein a second weight score is generated for each modified chunk associated with the chunk, and wherein, in a case that the chunk associated with the chunk identifier is a large chunk, the one or more second weight scores are generated based on the embedding of the query and embeddings of one or more modified sentence sets corresponding to the large chunk;

generate, via a machine learning model, ensemble scores for the chunk identifiers, wherein each ensemble score for a chunk identifier is determined based on the first weight score, the one or more second weight scores, and trainable weights of the machine learning model; and

determine a second predetermined number of top-ranked chunk identifiers based on the ensemble scores, wherein the second predetermined number is less than the first predetermined number.
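
For illustration only, the steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the alteration techniques (lowercasing, punctuation removal), the bag-of-words embedding, the use of cosine similarity, the maximum taken over sentence sets, and the linear combination standing in for the machine learning model are all hypothetical choices, as are every function and parameter name.

```python
import math
import re
from collections import Counter

# Hypothetical alteration techniques; the claim does not name specific ones.
ALTERATIONS = {
    "lowercase": str.lower,
    "no_punct": lambda s: re.sub(r"[^\w\s]", "", s),
}

def sentence_sets(text, threshold=4, set_size=2):
    """Split a large chunk into sentence sets; return [] if the chunk is not large."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) <= threshold:  # not a "large chunk"
        return []
    return [" ".join(sentences[i:i + set_size])
            for i in range(0, len(sentences), set_size)]

def embed(text):
    """Toy bag-of-words embedding standing in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def second_scores(query, text):
    """One second weight score per alteration technique.

    For a large chunk the score is computed against its modified sentence
    sets; taking the maximum over the sets is an assumption of this sketch.
    """
    q = embed(query)
    pieces = sentence_sets(text) or [text]
    return [max(cosine(q, embed(alter(p))) for p in pieces)
            for alter in ALTERATIONS.values()]

def rerank(query, results, chunks, weights, first_n, second_n):
    """Re-rank (chunk_id, first_score) pairs and return the top second_n ids.

    `weights` plays the role of the trainable weights; a plain weighted sum
    is assumed here in place of a trained model.
    """
    assert second_n < first_n
    top = sorted(results, key=lambda r: r[1], reverse=True)[:first_n]
    scored = []
    for chunk_id, first in top:
        features = [first] + second_scores(query, chunks[chunk_id])
        ensemble = sum(w * f for w, f in zip(weights, features))
        scored.append((chunk_id, ensemble))
    scored.sort(key=lambda r: r[1], reverse=True)
    return [cid for cid, _ in scored[:second_n]]
```

As a usage example, given three chunks whose first weight scores rank them "a", "b", "c", calling rerank with the query "index funds fees" and weights [0.1, 1.0, 1.0] promotes chunk "b" above chunk "a", because the embedding-based second weight scores outweigh the search engine's original ordering in this toy setup.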