US 12,450,271 B2
Computerized systems and methods for semantic searching
Joseph Cainey, London (GB); Jyoti Mishra, London (GB); Jakob Nielsen, Copenhagen (DK); Bjorn Arnholtz, Soborg (DK); and Gary Engelbert, London (GB)
Assigned to WORKDAY, INC., Pleasanton, CA (US)
Filed by WORKDAY, INC., Pleasanton, CA (US)
Filed on Mar. 29, 2022, as Appl. No. 17/707,335.
Prior Publication US 2023/0315768 A1, Oct. 5, 2023
Int. Cl. G06F 16/334 (2025.01); G06F 16/36 (2019.01)

CPC G06F 16/334 (2019.01) [G06F 16/36 (2019.01)]

17 Claims

1. A method comprising:

receiving, by a device, a search request comprising a character string;

converting, by the device, the search request into a word embedding as an n-dimensional search vector using a machine learning model;

comparing, by the device, the word embedding against a plurality of word embeddings;

determining, by the device, a similarity measure between the word embedding and each of the plurality of word embeddings stored in a database, wherein the plurality of word embeddings correspond to survey data comprising user responses to survey questions and user-provided comments relating to the survey questions, wherein the survey data is collected from respondents and includes structured feedback ratings and unstructured textual comments;

determining a similarity threshold by generating a set of sub-bands, each sub-band associated with different minimum and maximum similarity measures, selecting random word embeddings from each sub-band, receiving a user selection of at least one of the random word embeddings to use as a search result, and setting the similarity threshold based on a similarity measure of the user-selected random word embedding;

comparing, by the device, each determined similarity measure to the similarity threshold;

identifying, by the device, a set of word embeddings, wherein each word embedding in the set of word embeddings has a similarity measure at least satisfying the similarity threshold;

identifying, by the device, a set of terms comprising survey data terms that corresponds to the set of word embeddings by reverse transforming the word embeddings to retrieve the original survey comments and responses;

outputting, by the device, for display within a graphical user interface having interactive filtering tools, the set of terms, the survey data terms being presented in ranked order with corresponding relevance scores, wherein the graphical user interface enables further filtering of results;

storing the word embedding and a search result relevance label in a database; and

re-training the machine learning model using the stored word embedding and the search result relevance label.
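
The similarity scoring, sub-band sampling, threshold selection, and ranked filtering recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: cosine similarity as the similarity measure, equal-width sub-bands, the minimum score among the user-selected samples as the threshold, a plain dictionary standing in for the database (which also serves as the lookup from embedding back to the original survey text), and all function names. The embedding model and the re-training step are omitted.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def score_all(search_vector, stored):
    """Score every stored embedding against the search vector.

    `stored` maps original survey text -> embedding (hypothetical layout).
    """
    return {text: cosine_similarity(search_vector, vec)
            for text, vec in stored.items()}


def make_sub_bands(low, high, count):
    """Split [low, high] into `count` equal-width (min, max) sub-bands."""
    width = (high - low) / count
    return [(low + i * width, low + (i + 1) * width) for i in range(count)]


def sample_from_sub_bands(scores, bands, rng):
    """Pick one random entry from each non-empty sub-band."""
    samples = []
    for index, (band_min, band_max) in enumerate(bands):
        is_last = index == len(bands) - 1
        members = sorted(
            text for text, score in scores.items()
            if band_min <= score < band_max or (is_last and score == band_max)
        )
        if members:
            samples.append(rng.choice(members))
    return samples


def threshold_from_selection(scores, selected_texts):
    """Assumed rule: the lowest score among the samples the user accepted."""
    return min(scores[text] for text in selected_texts)


def search(scores, threshold):
    """Return (text, score) pairs meeting the threshold, best first."""
    hits = [(text, score) for text, score in scores.items()
            if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

In use, a caller would score the stored embeddings once, show the user the output of `sample_from_sub_bands`, pass the accepted samples to `threshold_from_selection`, and then call `search` with the resulting threshold. Taking the minimum accepted score is only one way to read "based on a similarity measure of the user-selected random word embedding"; the claim does not fix a specific rule.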