US 11,055,370 B2
Method for automatically constructing inter-language queries for a search engine
Guillaume Wenzek, Paris (FR); Jocelyn Coulmance, Paris (FR); and Jean-Marc Marty, Paris (FR)
Assigned to PROXEM, Paris (FR)
Appl. No. 15/757,649
Filed by PROXEM, Paris (FR)
PCT Filed Sep. 6, 2016, PCT No. PCT/EP2016/070971
§ 371(c)(1), (2) Date Mar. 5, 2018,
PCT Pub. No. WO2017/042161, PCT Pub. Date Mar. 16, 2017.
Claims priority of application No. 1558249 (FR), filed on Sep. 7, 2015.
Prior Publication US 2019/0026371 A1, Jan. 24, 2019
Int. Cl. G06F 16/9535 (2019.01); G06F 16/33 (2019.01); G06F 16/332 (2019.01); G06F 16/338 (2019.01); G06F 40/30 (2020.01); G06F 40/45 (2020.01); G06F 40/58 (2020.01)
CPC G06F 16/9535 (2019.01) [G06F 16/338 (2019.01); G06F 16/3329 (2019.01); G06F 16/3334 (2019.01); G06F 16/3337 (2019.01); G06F 16/3347 (2019.01); G06F 40/30 (2020.01); G06F 40/45 (2020.01); G06F 40/58 (2020.01)]
10 Claims
 
1. A method for automatically constructing inter-language queries executed by a search engine from a text file containing a learning corpus (C) comprising all sentences correspondingly expressed in at least two languages, words of each of said two languages being each associated with a target vector (w), the method comprising steps of:
aligning target vectors (we, wf) of words of said learning corpus (C) in said at least two languages;
recovering N words from each of said at least two languages having closest target vectors (w) to a target vector associated with a query word;
constructing queries from the N words previously recovered from said at least two languages;
executing queries by the search engine;
displaying results returned by the search engine; and
filtering a meaning of said query word among several meanings by:
determining M closest target vectors (w) to the target vector associated with said query word;
selecting a closest target vector (w) corresponding to the meaning of said query word to be filtered;
subtracting the closest target vector selected from the target vector associated with said query word;
wherein each word of said learning corpus (C) is associated with a target vector (w) and a context vector; and
wherein the step of aligning the target vectors (we, wf) comprises steps of:
calculating intra-language cost functions (Je; Jf) to calculate the target vectors (w) and the context vectors in each of said two languages;
calculating inter-language cost functions (Ωe,f, Ωf,e) respectively to align target vectors (we) of words in a first language (e) with respect to the context vectors of words in a second language (f), and to align target vectors (wf) of the words in the second language (f) with respect to the context vectors of the words in the first language (e), and
minimizing a sum of at least four cost functions (Je; Jf; Ωe,f; Ωf,e) previously calculated.
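
The retrieval, meaning-filtering, and query-construction steps of claim 1 can be illustrated with a minimal sketch. It assumes the target vectors of both languages have already been aligned into one shared space. The toy vector values, the cosine similarity measure, the " OR " query syntax, and the names VECTORS, nearest_words, filter_meaning, and build_queries are all hypothetical choices made for illustration; none of them is specified by the claim.

```python
import math

# Toy, already-aligned target vectors (hypothetical values for illustration).
VECTORS = {
    "en": {"bank": [1.0, 1.0], "money": [1.0, 0.1], "river": [0.1, 1.0]},
    "fr": {
        "banque": [1.0, 0.2],
        "argent": [0.9, 0.0],
        "rive": [0.0, 0.9],
        "fleuve": [0.2, 1.0],
    },
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(query_vec, lang, n):
    """Return the n words of `lang` whose target vectors are closest to query_vec."""
    ranked = sorted(
        VECTORS[lang].items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [word for word, _ in ranked[:n]]


def filter_meaning(query_vec, unwanted_vec):
    """Subtract the vector of the meaning to be filtered from the query vector."""
    return [q - u for q, u in zip(query_vec, unwanted_vec)]


def build_queries(query_vec, n):
    """Build one query string per language from the n nearest words.

    Joining terms with " OR " is an assumed syntax, not taken from the claim.
    """
    return {
        lang: " OR ".join(nearest_words(query_vec, lang, n)) for lang in VECTORS
    }


if __name__ == "__main__":
    bank = VECTORS["en"]["bank"]
    # Among the closest vectors, pick the one carrying the unwanted meaning
    # (here "river") and subtract it from the query vector.
    filtered = filter_meaning(bank, VECTORS["en"]["river"])
    print(build_queries(filtered, 2))
```

With these toy values, subtracting the vector of "river" from that of "bank" moves the query toward the financial meaning, so the two nearest words become "money" and "bank" in English and "argent" and "banque" in French. Submitting the resulting strings to a search engine and displaying the results is left out of the sketch.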