US 11,809,423 B2
Method and system for interactive keyword optimization for opaque search engines
Rami Puzis, Ashdod (IL); Aviad Elyashar, Beer Sheva (IL); and Maor Reuben, Haifa (IL)
Assigned to B. G. Negev Technologies and Applications Ltd., at Ben-Gurion University, Beer Sheva (IL)
Filed by B. G. NEGEV TECHNOLOGIES AND APPLICATIONS LTD., AT BEN-GURION UNIVERSITY, Beer Sheva (IL)
Filed on Jun. 30, 2022, as Appl. No. 17/854,917.
Application 17/854,917 is a continuation of application No. 16/840,538, filed on Apr. 6, 2020, granted, now 11,397,731.
Claims priority of provisional application 62/830,474, filed on Apr. 7, 2019.
Prior Publication US 2022/0358122 A1, Nov. 10, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/2453 (2019.01); G06F 16/953 (2019.01)
CPC G06F 16/2453 (2019.01) [G06F 16/953 (2019.01)] 8 Claims
 
1. An automated, interactive method for optimizing short keyword queries to improve information retrieval from opaque (black box) search engines, comprising:
a) collecting data, including labeled claims, from several fact-checking websites to create a dataset used for evaluation;
b) estimating the relevance of posts (query results) retrieved from a search engine to a given input document by calculating the mean relevance error (MRE), which is based on estimating the minimal distance between the words of the retrieved posts and the words of the input document;
c) labeling a subset of claims for evaluation by choosing a number of claims that obtained the maximal and the minimal mean relevance error (MRE); and
d) finding the most appropriate queries for retrieving the maximal number of relevant posts from an opaque search engine, by performing an interactive greedy search for the best word to add to the input query, namely the word that maximizes the relevant posts retrieved by the search engine for the extended query;
wherein calculating the mean relevance error (MRE) is performed by estimating the minimal distance between the vector representations of the words in a retrieved post and those of the words in the given input document, and comprises the following steps:
e) removing stop-words from the input document and the retrieved posts;
f) defining the mean relevance error (MRE) as a function that receives as input a document d and a collection P of posts retrieved from the search engine and outputs a number, where the lower the MRE, the more relevant the retrieved posts P are to the underlying document d;
g) calculating the distance between vector representations of two words as a measure of similarity between them, wherein vector representations of words are derived using a word embedding model;
h) defining the distance between a word w_i and a document d as the minimal distance between w_i and any word in W_d, the set of words in the input document d;
i) calculating the distance of a post p ∈ P from the document d by averaging the distances to d of all words w_i ∈ W_p, where W_p is the set of words in p; and
j) defining the mean relevance error (MRE) of the collection P to the document d as the average distance of all posts in P from document d and calculating said MRE.
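Steps h) through j) can be written compactly as below. Here W_d and W_p are the word sets of the document and of a post after stop-word removal (step e), and δ(u, v) stands for the distance between the word-embedding vectors of words u and v (step g). The claim does not fix the distance metric, so δ is left abstract; the symbol names are chosen for illustration.

```latex
\begin{aligned}
\operatorname{dist}(w_i, d) &= \min_{w_j \in W_d} \delta(w_i, w_j) \\
\operatorname{dist}(p, d) &= \frac{1}{|W_p|} \sum_{w_i \in W_p} \operatorname{dist}(w_i, d) \\
\operatorname{MRE}(d, P) &= \frac{1}{|P|} \sum_{p \in P} \operatorname{dist}(p, d)
\end{aligned}
```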
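The MRE computation of steps e) through j) can be sketched in Python as follows. This is an illustration under several assumptions not stated in the claim: a tiny hypothetical stop-word list, a simple regex tokenizer, cosine distance as the word-to-word metric, a toy two-dimensional embedding table `EMB` standing in for a real word-embedding model, and words missing from the embedding vocabulary being dropped. Posts left with no words are skipped, and `math.inf` is returned when nothing can be compared.

```python
import math
import re
from typing import Dict, Iterable, Sequence, Set

Vector = Sequence[float]

# Hypothetical stop-word list and toy embeddings, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}
EMB: Dict[str, Vector] = {
    "vaccine": (1.0, 0.0),
    "vaccines": (0.9, 0.1),
    "cause": (0.7, 0.7),
    "autism": (0.0, 1.0),
    "football": (-1.0, 0.0),
}


def content_words(text: str, emb: Dict[str, Vector]) -> Set[str]:
    """Step e): lowercase, tokenize, drop stop-words and out-of-vocabulary words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS and t in emb}


def distance(u: Vector, v: Vector) -> float:
    """Step g): cosine distance between two word vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def word_to_doc(word: str, doc_words: Set[str], emb: Dict[str, Vector]) -> float:
    """Step h): minimal distance from a word to any word in W_d."""
    return min(distance(emb[word], emb[w]) for w in doc_words)


def post_to_doc(words: Set[str], doc_words: Set[str], emb: Dict[str, Vector]) -> float:
    """Step i): mean word-to-document distance over the words W_p of a post."""
    return sum(word_to_doc(w, doc_words, emb) for w in words) / len(words)


def mre(document: str, posts: Iterable[str], emb: Dict[str, Vector]) -> float:
    """Step j): mean post-to-document distance; lower means more relevant."""
    doc_words = content_words(document, emb)
    post_words = [content_words(p, emb) for p in posts]
    post_words = [ws for ws in post_words if ws]  # assumption: skip empty posts
    if not doc_words or not post_words:
        return math.inf  # assumption: nothing comparable is treated as irrelevant
    return sum(post_to_doc(ws, doc_words, emb) for ws in post_words) / len(post_words)


if __name__ == "__main__":
    claim = "Vaccines cause autism"
    print(mre(claim, ["The vaccine is safe"], EMB))      # small value: relevant
    print(mre(claim, ["Football on the weekend"], EMB))  # 1.0: unrelated
```

With a real model, `EMB` would be replaced by the model's vector lookup; the structure of the three nested averages and minima stays the same.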
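Step c) amounts to ranking the claims by their MRE and keeping both ends of the ranking. A short sketch, assuming each claim's MRE has already been computed and stored in a dictionary keyed by a claim identifier; the function name and this input layout are hypothetical, and ties are resolved by Python's stable sort order.

```python
from typing import Dict, List, Tuple


def select_extreme_claims(mre_by_claim: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """Return (the k claims with the lowest MRE, the k claims with the highest MRE)."""
    if 2 * k > len(mre_by_claim):
        raise ValueError("k is too large: the two groups would overlap")
    ranked = sorted(mre_by_claim, key=mre_by_claim.__getitem__)  # ascending MRE
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    scores = {"claim-a": 0.2, "claim-b": 0.9, "claim-c": 0.5, "claim-d": 0.1}
    print(select_extreme_claims(scores, 1))  # (['claim-d'], ['claim-b'])
```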
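Step d) can be sketched as a greedy forward search over candidate words. This is an illustration rather than the patented procedure: the opaque search engine is modeled as a callable `search` that returns posts for a list of query words, and `score` rates a result list (higher is better). The claim leaves both open; in practice `score` could, for example, be the negated MRE of the results with respect to the input document. The AND-matching toy engine, the corpus, and the relevant-minus-irrelevant score below are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SearchFn = Callable[[List[str]], List[str]]  # query words -> retrieved posts
ScoreFn = Callable[[List[str]], float]       # retrieved posts -> quality, higher is better


def greedy_query(candidates: Sequence[str], search: SearchFn, score: ScoreFn,
                 max_words: int = 5) -> Tuple[List[str], float]:
    """Grow a keyword query one word at a time, keeping the word whose extended
    query yields the best-scoring results; stop when no word improves the score."""
    query: List[str] = []
    best_score = float("-inf")
    while len(query) < max_words:
        best_word: Optional[str] = None
        for word in candidates:
            if word in query:
                continue
            s = score(search(query + [word]))  # one live request per candidate
            if s > best_score:
                best_word, best_score = word, s
        if best_word is None:
            break  # no candidate improves on the current query
        query.append(best_word)
    return query, best_score


# Hypothetical toy engine: returns the posts that contain every query word.
POSTS = ["vaccines cause autism claim", "autism vaccines study",
         "vaccines schedule", "autism awareness month"]
RELEVANT = {"vaccines cause autism claim", "autism vaccines study"}


def toy_search(query: List[str]) -> List[str]:
    return [p for p in POSTS if all(w in p.split() for w in query)]


def toy_score(posts: List[str]) -> float:
    # Hypothetical score: relevant hits minus irrelevant hits.
    hits = sum(p in RELEVANT for p in posts)
    return hits - (len(posts) - hits)


if __name__ == "__main__":
    print(greedy_query(["vaccines", "autism", "cause"], toy_search, toy_score))
    # (['vaccines', 'autism'], 2)
```

Each iteration issues one request per remaining candidate word, so the number of requests grows with the candidate count times the final query length; since the engine is opaque, these results are the only feedback the search receives.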