US 11,734,285 B2
System and method for top-k searching using parallel processing
Edward Bortnikov, Haifa (IL); David Carmel, Haifa (IL); Gali Sheffi, Kiryat Bialik (IL); Idit Keidar, Haifa (IL); and Dmitry Basin, Haifa (IL)
Assigned to Verizon Patent and Licensing Inc., Basking Ridge, NJ (US)
Filed by VERIZON PATENT AND LICENSING INC., Basking Ridge, NJ (US)
Filed on Mar. 22, 2018, as Appl. No. 15/928,723.
Prior Publication US 2019/0294691 A1, Sep. 26, 2019
Int. Cl. G06F 16/2457 (2019.01); G06F 16/93 (2019.01); G06F 16/248 (2019.01); G06F 16/951 (2019.01)
CPC G06F 16/24578 (2019.01) [G06F 16/248 (2019.01); G06F 16/93 (2019.01); G06F 16/951 (2019.01)]
20 Claims
 
1. A method for retrieving documents for a search, the method being implemented on a computing device comprising a plurality of processors, memory, and a communication platform connected to a network, the method comprising:
receiving a query comprising a plurality of terms;
obtaining, for each of the plurality of terms, a posting list of one or more content items ranked based on term scores associated with a corresponding one or more content items, wherein a term score is indicative of a level of relevance between a corresponding content item in the posting list and the term;
generating a candidate list of content items based on the plurality of posting lists, wherein the step of generating the candidate list comprises:
selecting, from each of the posting lists, a first content item having a rank with a same first value in each of the posting lists,
retrieving, for each of the first content items, from the respective posting lists, an associated term score of the first content item, and
creating the candidate list with the first content items that are ranked based on their corresponding term scores;
updating the candidate list by:
selecting, from each of the posting lists, a next content item having a next rank with a value lower than a previous rank value,
for each of the next content items,
if the next content item is not in the candidate list, inserting a new entry in the candidate list for the next content item with its content item identifier and its corresponding term score, and
if the next content item is in the candidate list,
summing all available term scores associated with the next content item in all of the posting lists, wherein the rank values associated with the all available term scores in all of the posting lists are equal or higher than the next rank value, and
re-ranking the candidate list based on the summed term scores of the next content item, and
repeating the step of selecting, inserting, summing, and re-ranking until the candidate list has been updated based on all of the one or more content items in each of the posting lists;
determining, based on the candidate list, the candidate content items; and
providing, based on the candidate content items, a response to the query.
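The steps recited in claim 1 can be illustrated with a minimal, single-threaded sketch. It is one reading of the claim language, not the patented implementation, and it omits the parallel processing named in the title. The names `retrieve_top_k`, `posting_lists`, `scores`, and `candidate_list`, the parameter `k`, and the sample data are hypothetical. The sketch assumes each posting list is already sorted by descending term score, that rank 0 is the highest rank, and that "all available term scores" means the scores already encountered at the current rank or higher.

```python
from itertools import zip_longest


def retrieve_top_k(posting_lists, k):
    """Return the k best (doc_id, summed_score) pairs.

    posting_lists maps each query term to a list of (doc_id, term_score)
    pairs sorted by descending term score (assumed layout).
    """
    scores = {}          # doc_id -> sum of the term scores seen so far
    candidate_list = []  # (doc_id, summed_score), best first

    # Walk every posting list in lockstep, one rank at a time.
    for entries in zip_longest(*posting_lists.values()):
        for entry in entries:
            if entry is None:
                # This posting list is shorter and already exhausted.
                continue
            doc_id, term_score = entry
            if doc_id not in scores:
                # New candidate: insert it with its term score.
                scores[doc_id] = term_score
            else:
                # Known candidate: add the newly available term score.
                scores[doc_id] += term_score
        # Re-rank after each rank; ties are broken by doc_id (an assumption).
        candidate_list = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    return candidate_list[:k]


# Hypothetical posting lists for the two-term query "parallel search".
example_posting_lists = {
    "parallel": [("d1", 9), ("d2", 5), ("d3", 1)],
    "search": [("d2", 8), ("d3", 7), ("d1", 2)],
}

print(retrieve_top_k(example_posting_lists, 2))
```

Re-sorting the whole candidate list after every rank keeps the sketch close to the wording of the claim; a practical system would more likely maintain a bounded heap and stop early, which the claim as recited does not require.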