US 12,222,950 B2
Search in a data marketplace
Orestis Kostakis, Redmond, WA (US); and Timur Misirpashaev, West Windsor, NJ (US)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Dec. 20, 2022, as Appl. No. 18/085,452.
Prior Publication US 2024/0202203 A1, Jun. 20, 2024
Int. Cl. G06F 16/24 (2019.01); G06F 16/2453 (2019.01); G06F 16/2455 (2019.01); G06F 16/2457 (2019.01)
CPC G06F 16/24578 (2019.01) [G06F 16/24542 (2019.01); G06F 16/24564 (2019.01)]
21 Claims
 
1. A method comprising:
generating a data dictionary for each of a set of data listings in a data exchange of a cloud computing platform, the data dictionary for each of the set of data listings comprising:
first metadata describing data shared by a data listing; and
second metadata describing individual objects included in the data shared by the data listing including tables, schemas, views, and functions;
receiving a query comprising a set of search terms;
retrieving the set of data listings in the data exchange of the cloud computing platform based on the set of search terms of the query, wherein the data exchange enables data providers to publish and control access to the set of data listings via share objects;
generating, by a processing device and for each data listing of the set of data listings, a set of listing-specific signals and a set of external signals, wherein each external signal of the set of external signals corresponds to a measure of activity of each data listing of the set of data listings in the data exchange of the cloud computing platform, wherein the set of listing-specific signals comprises a distance of search terms from one another within the data listing;
ranking, by the processing device, the set of data listings based on:
the data dictionary;
a lexical diversity of the set of listing-specific signals, wherein the lexical diversity indicates a number of unique words in fields of the set of data listings divided by a total number of words in the fields, wherein the set of data listings are weighted according to the total number of words in the fields, a range of values, and a number of distinct values;
the set of external signals for each data listing of the set of data listings, comprising a number of views of the data listing;
a professional role of a user issuing the query; and
industry attributes of a company associated with the user; and
presenting, based on the ranking, the set of data listings to a data consumer along with a description of each listing of the set of data listings.
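
As an illustration only, the following Python sketch shows one way the signals named in claim 1 could be computed and combined. The lexical diversity function follows the claim's definition (unique words divided by total words in the fields). Everything else is an assumption made for this example and is not taken from the patent: the `Listing` fields, the tokenizer, the use of the smallest token window as the "distance of search terms from one another", the logarithm applied to the view count, and the equal weights in `score`. The data dictionary, the user's professional role, the company's industry attributes, and the weighting by range of values and number of distinct values are omitted for brevity.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_diversity(fields: List[str]) -> float:
    """Number of unique words in the fields divided by the total number of words."""
    words = [w for field in fields for w in tokenize(field)]
    return len(set(words)) / len(words) if words else 0.0


def term_span(text: str, terms: List[str]) -> Optional[int]:
    """Smallest token distance between the first and last word of any window
    containing every search term; None if a term is missing.
    Assumes each search term is a single word."""
    wanted = {t.lower() for t in terms}
    if not wanted:
        return None
    last_seen = {}  # most recent position of each search term
    best = None
    for i, tok in enumerate(tokenize(text)):
        if tok in wanted:
            last_seen[tok] = i
            if len(last_seen) == len(wanted):
                span = i - min(last_seen.values())
                if best is None or span < best:
                    best = span
    return best


@dataclass
class Listing:
    """Hypothetical listing record; field names are illustrative."""
    title: str
    description: str
    views: int  # external signal: number of views of the listing


def score(listing: Listing, terms: List[str]) -> float:
    """Combine the signals with equal weights (hypothetical weighting)."""
    fields = [listing.title, listing.description]
    diversity = lexical_diversity(fields)
    span = term_span(" ".join(fields), terms)
    proximity = 0.0 if span is None else 1.0 / (1 + span)
    popularity = math.log1p(listing.views)  # dampen very large view counts
    return diversity + proximity + popularity


def rank(listings: List[Listing], terms: List[str]) -> List[Listing]:
    """Return listings ordered from highest to lowest score."""
    return sorted(listings, key=lambda l: score(l, terms), reverse=True)
```

For example, `rank(listings, ["weather", "forecast"])` would place a frequently viewed listing whose text contains "weather forecast" as adjacent words ahead of a rarely viewed listing in which the two terms are several words apart. A production ranker would more likely learn the weights than fix them by hand; the equal weighting here is only a placeholder.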