CPC G06F 40/30 (2020.01) [G06F 16/90332 (2019.01); G06N 3/04 (2013.01); G06V 30/153 (2022.01)] | 12 Claims |
1. A computer program product in a non-transitory computer-readable medium for use in a data processing system for information retrieval, the computer program product holding computer program instructions that, when executed by the data processing system, are configured to:
receive a corpus of documents associated with a user, wherein the documents are structured in two or more distinct formats;
for each document in the corpus, process the document to identify a set of information strings and, for each information string, encode at least a portion of the information string into an n-dimensional semantic vector;
store the n-dimensional semantic vectors for each document;
upon receipt of a query, process the query into an n-dimensional semantic query vector;
compare the n-dimensional semantic query vector against the stored n-dimensional vectors for each document and, in response, identify a set of candidate n-dimensional vectors that represent a possible answer to the query, wherein identifying the set of candidate n-dimensional vectors applies a neural filter that has been trained against a dataset of question-answer data structured as groupings of candidate sentences for an example query, wherein for a given training example the neural filter is trained to identify a particular candidate sentence that includes an answer to the example query while remaining candidate sentences that do not include the answer are characterized by the neural filter as contrasting;
rank the candidate n-dimensional vectors; and
return as an answer to the query a data string represented by a given highest ranked candidate n-dimensional vector.
|
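The steps recited in claim 1 (encode information strings, store the vectors, encode the query, compare, rank, return the top string) can be illustrated with a minimal sketch. This is not the claimed implementation: the hashing bag-of-words `encode` function is a hypothetical stand-in for a learned semantic encoder, plain cosine similarity stands in for the trained neural filter, and the names `DIM`, `build_index`, and `answer`, as well as the sample corpus, are assumptions introduced only for illustration.

```python
import hashlib
import math

DIM = 512  # hypothetical value of n; the claim does not fix a dimension


def encode(text, dim=DIM):
    """Stand-in encoder: hash each token into one of `dim` buckets, then L2-normalize.

    A real system would use a learned semantic encoder here (assumption).
    """
    vec = [0.0] * dim
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if not token:
            continue
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Dot product of two already-normalized vectors, i.e. cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def build_index(corpus):
    """Split each document into information strings (here: sentences) and store one vector per string."""
    index = []
    for doc_id, text in corpus.items():
        for sentence in text.split("."):
            sentence = sentence.strip()
            if sentence:
                index.append((doc_id, sentence, encode(sentence)))
    return index


def answer(query, index, top_k=3):
    """Encode the query, score every stored vector, keep the top_k candidates, return the best string."""
    q_vec = encode(query)
    scored = [(cosine(q_vec, vec), doc_id, s) for doc_id, s, vec in index]
    # Plain similarity ranking; the claim applies a trained neural filter at this step instead.
    candidates = sorted(scored, key=lambda t: t[0], reverse=True)[:top_k]
    best = candidates[0][2] if candidates else None
    return best, candidates


# Hypothetical two-document corpus in different formats (plain text and Markdown).
corpus = {
    "email.txt": "The invoice is due on Friday. Lunch was pleasant.",
    "notes.md": "Server backups run nightly.",
}
index = build_index(corpus)
best, ranked = answer("When is the invoice due?", index)
print(best)
```

In the claim, the candidate-selection step is performed by a neural filter trained on groupings of candidate sentences, where one sentence contains the answer and the rest are treated as contrasting; the similarity sort above only marks where that component would sit in the pipeline.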