US 12,346,340 B2
Search engine using self-supervised learning and predictive models for searches based on partial information
Amihai Savir, Newton, MA (US); Ofir Ezrielev, Be'er Sheva (IL); and Oshry Ben Harush, Cedar Park, TX (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Jul. 22, 2022, as Appl. No. 17/871,843.
Application 17/871,843 is a continuation in part of application No. 17/711,839, filed on Apr. 1, 2022, granted, now 12,141,158.
Application 17/871,843 is a continuation in part of application No. 15/084,324, filed on Mar. 29, 2016, abandoned.
Prior Publication US 2022/0374446 A1, Nov. 24, 2022
Int. Cl. G06F 17/00 (2019.01); G06F 16/25 (2019.01); G06F 16/901 (2019.01); G06F 16/9535 (2019.01)
CPC G06F 16/256 (2019.01) [G06F 16/9024 (2019.01); G06F 16/9535 (2019.01)] 19 Claims
 
1. A server computer-implemented method of processing queries input to a data retrieval system storing data assets for users in an enterprise, comprising:
storing, in a federation business data lake (FBDL) storage maintained for a large-scale data processing system, data assets retrievable by a user;
providing a search engine for entry of queries by users looking for data in the FBDL;
monitoring and recording, by a monitoring component of the server, all interactions of a plurality of known users, including a first user and a target user, each interaction comprising an activity that triggers a read/write cycle to the FBDL storage;
first deriving a similarity of each of the plurality of known users to the target user based on respective past and current data retrieval patterns of each of the known users for data queried in the search engine;
identifying an unknown user for whom there are no known interactions with the plurality of known users or the data assets to constitute missing features;
generating a graph for the unknown user representing data asset interactions for the unknown user;
training a generative model that uses reconstructive self-supervised learning (SSL) techniques for the graph to generate possible values for the missing features;
second deriving a similarity of the unknown user to the target user based on the trained model; and
returning a result to a query input to the search engine by the target user based on the similarity of the known users to the target user and the similarity of the unknown user to the target user.
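
For illustration only, the data flow recited in claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. Recorded interactions are assumed to be per-user counts of read/write cycles per data asset, and user similarity is assumed to be cosine similarity over those counts. The claimed generative model trained with reconstructive SSL is replaced here by a plain mean-imputation placeholder. All function names, user names, and asset names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse asset -> count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def impute_unknown(known):
    """Placeholder for the claimed generative SSL model (assumption):
    fill the unknown user's missing features with the mean of known users."""
    assets = set().union(*known.values())
    n = len(known)
    return {a: sum(u.get(a, 0) for u in known.values()) / n for a in assets}

def rank_assets(target, known, unknown_vec):
    """Score each asset by similarity-weighted interaction counts,
    combining the known users and the imputed unknown user."""
    sims = {name: cosine(target, vec) for name, vec in known.items()}
    sims["unknown"] = cosine(target, unknown_vec)
    vectors = dict(known, unknown=unknown_vec)
    scores = {}
    for name, vec in vectors.items():
        for asset, count in vec.items():
            scores[asset] = scores.get(asset, 0.0) + sims[name] * count
    # Highest score first; ties broken by asset name for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a)), sims

# Hypothetical recorded interactions (asset -> number of read/write cycles).
known = {
    "first_user": {"sales_q1": 3, "sales_q2": 1},
    "other_user": {"hr_roster": 4},
}
target = {"sales_q1": 2, "sales_q2": 2}

ranking, sims = rank_assets(target, known, impute_unknown(known))
print(ranking, sims)
```

In this sketch the imputed vector only stands in for the "possible values for the missing features"; a trained graph model would replace `impute_unknown` without changing how the two similarities are combined in `rank_assets`.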