CPC G06F 16/2425 (2019.01) [G06N 3/045 (2023.01); G06N 3/088 (2013.01)] | 21 Claims |
1. A method of training a machine learning system for use with a digital assistant, the method comprising:
obtaining training data comprising query data samples;
obtaining vector representations of the query data samples;
clustering the vector representations;
determining canonical queries and corresponding query groups based on the clustered vector representations, wherein corresponding query groups correspond to determined canonical queries;
performing named entity recognition on the query data samples and canonical queries;
replacing text data for tagged named entities with a named entity type tag;
generating paired data samples based on determined canonical queries and selections from the corresponding query groups; and
training an encoder-decoder neural network architecture using the paired data samples, wherein the selections from the corresponding query groups are supplied as input sequence data and the determined canonical queries are supplied as output sequence data,
wherein the digital assistant is configured to map data representing an initial query to data representing a revised query associated with one of the canonical queries, via the encoder-decoder neural network architecture, the data representing the revised query being further processed to provide a response to the initial query,
wherein generating paired data samples comprises filtering generated paired data samples, and
wherein filtering comprises:
removing paired data samples with a canonical query whose named entity tags do not match the named entity tags in the corresponding selection from the query group.
|
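The entity-tag replacement, pair generation, and tag-match filtering steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The lexicon `ENTITY_LEXICON`, the tag format `<CITY>`/`<DATE>`, the helper names, and the example queries are all hypothetical, and a dictionary lookup stands in for a real named entity recognizer. The sketch also assumes that "match" means the input and the canonical query carry the same multiset of entity type tags, which is only one possible reading. Clustering and encoder-decoder training are left out; the query groups are taken as given.

```python
import re
from collections import Counter

# Hypothetical lexicon standing in for a real named entity recognizer.
ENTITY_LEXICON = {
    "london": "<CITY>",
    "paris": "<CITY>",
    "today": "<DATE>",
    "tomorrow": "<DATE>",
}

# Matches the hypothetical tag format used above, e.g. "<CITY>".
TAG_PATTERN = re.compile(r"<[A-Z_]+>")

# Example data (invented): canonical query -> members of its query group.
EXAMPLE_GROUPS = {
    "what is the weather in London tomorrow": [
        "weather forecast for Paris today",
        "will it rain in London",
    ],
}


def tag_entities(query):
    """Replace the text of each recognized entity with its entity type tag."""
    tokens = query.lower().split()
    return " ".join(ENTITY_LEXICON.get(token, token) for token in tokens)


def entity_tags(tagged_query):
    """Return the multiset of entity type tags present in a tagged query."""
    return Counter(TAG_PATTERN.findall(tagged_query))


def make_pairs(groups):
    """Pair each tagged group member (input) with its tagged canonical query (output)."""
    pairs = []
    for canonical, members in groups.items():
        target = tag_entities(canonical)
        for member in members:
            pairs.append((tag_entities(member), target))
    return pairs


def filter_pairs(pairs):
    """Drop pairs whose input and output do not carry the same entity tags.

    Assumption: 'match' is read as equality of tag multisets.
    """
    return [
        (source, target)
        for source, target in pairs
        if entity_tags(source) == entity_tags(target)
    ]


if __name__ == "__main__":
    for source, target in filter_pairs(make_pairs(EXAMPLE_GROUPS)):
        print(source, "->", target)
```

With the example data, the pair built from "will it rain in London" is removed because it has a city tag but no date tag, while the canonical query has both. The surviving pairs are what would be supplied to the encoder-decoder network as input and output sequences.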