US 12,130,790 B1
Method for accelerated long document search using Hilbert curve mapping
Muhammad Saleh Saeed Al-Qurishi, Riyadh (SA); and Riad Souissi, Riyadh (SA)
Assigned to ELM, Riyadh (SA)
Filed by ELM, Riyadh (SA)
Filed on Jan. 22, 2024, as Appl. No. 18/418,948.
Claims priority of provisional application 63/514,579, filed on Jul. 20, 2023.
Int. Cl. G06F 16/22 (2019.01); G06F 16/93 (2019.01); G06F 16/953 (2019.01)
CPC G06F 16/2237 (2019.01) [G06F 16/2264 (2019.01); G06F 16/93 (2019.01); G06F 16/953 (2019.01)]
15 Claims
 
1. A method for a textual document search engine, comprising:
initializing the textual document search engine, by
inputting, into a memory, a plurality of documents,
wherein each document of the plurality of documents has a plurality of sentences, and each sentence of each document has m semantic embedding vectors, where m is an integer greater than 1;
mapping, via a processing circuitry, the m semantic embedding vectors for each document of the plurality of documents to 1-dimensional vectors of Hilbert numbers using a Hilbert curve transformation;
constructing, via the processing circuitry, an index table with the plurality of 1-dimensional vectors; and
storing the index table in the memory,
wherein, in the mapping, the processing circuitry performs the Hilbert curve transformation by converting m embedding vectors into a Hilbert number and the Hilbert numbers are search keys in the index table,
the method further comprising:
performing a search using the initialized textual document search engine, by
receiving a query document, which has query embedding vectors;
searching, via the processing circuitry, the search keys in the index table; and
performing, via the processing circuitry, a filtration stage,
wherein the searching, via the processing circuitry, includes:
mapping the query embedding vectors into Hilbert numbers for the query document; and
performing a binary search based on the Hilbert numbers, and
wherein the performing the filtration stage, via the processing circuitry, includes:
outputting a predetermined number of candidate documents that are similar to the query document.
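
For illustration only, the steps recited in claim 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation. The names hilbert_number, quantize, build_index and search are hypothetical. The claim does not say how embeddings are discretized, how many neighbouring keys are inspected, or how candidates are ranked. The sketch assumes embedding components lie in [-1, 1], quantizes each component to a fixed number of bits, maps each sentence embedding to one Hilbert number with Skilling's transform, and ranks candidate documents by how often they appear near the query keys in the sorted index table.

```python
from bisect import bisect_left
from collections import Counter


def hilbert_number(coords, bits):
    """Map integer coordinates (each in [0, 2**bits)) to a Hilbert number.

    Uses Skilling's axes-to-transpose transform, then interleaves the bits.
    Python integers are unbounded, so any number of dimensions is allowed.
    """
    x = list(coords)
    n = len(x)
    top = 1 << (bits - 1)
    q = top
    while q > 1:  # undo the excess rotations/reflections, level by level
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of axis 0
            else:
                t = (x[0] ^ x[i]) & p  # exchange low bits of axis 0 and axis i
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    for i in range(1, n):  # Gray encode
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    x = [v ^ t for v in x]
    h = 0
    for bit in range(bits - 1, -1, -1):  # interleave, most significant bit first
        for v in x:
            h = (h << 1) | ((v >> bit) & 1)
    return h


def quantize(vec, bits, lo=-1.0, hi=1.0):
    """Assumed discretization: clamp each component to [lo, hi], scale to a grid cell."""
    cells = (1 << bits) - 1
    return [round((min(max(v, lo), hi) - lo) / (hi - lo) * cells) for v in vec]


def build_index(docs, bits=4):
    """docs: {doc_id: [sentence embedding, ...]} -> sorted list of (Hilbert number, doc_id)."""
    table = [(hilbert_number(quantize(v, bits), bits), doc_id)
             for doc_id, vecs in docs.items() for v in vecs]
    table.sort()  # sorted search keys make binary search possible
    return table


def search(table, query_vecs, bits=4, window=2, k=2):
    """Binary-search each query key, then keep the k most frequent nearby documents."""
    keys = [h for h, _ in table]
    votes = Counter()
    for v in query_vecs:
        pos = bisect_left(keys, hilbert_number(quantize(v, bits), bits))
        for _, doc_id in table[max(0, pos - window):pos + window]:
            votes[doc_id] += 1  # assumed filtration rule: vote counting
    return [doc_id for doc_id, _ in votes.most_common(k)]


# Toy data: two documents with two 2-dimensional "sentence embeddings" each.
docs = {"a": [[-1.0, -1.0], [-0.9, -0.9]], "b": [[1.0, 1.0], [0.9, 0.9]]}
table = build_index(docs)
print(search(table, [[-1.0, -1.0]], k=1))
```

Because the index table is kept sorted by Hilbert number, each lookup costs a binary search rather than a scan over all stored embeddings. The window and k parameters stand in for the unspecified filtration details and would need tuning in any real system.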