US 12,067,006 B2
Machine learning system for digital assistants
Pranav Singh, Sunnyvale, CA (US); Yilun Zhang, Toronto (CA); Keyvan Mohajer, Los Gatos, CA (US); and Mohammadreza Fazeli, Toronto (CA)
Assigned to SoundHound AI IP, LLC., Santa Clara, CA (US)
Filed by SoundHound, Inc., Santa Clara, CA (US)
Filed on Jun. 17, 2021, as Appl. No. 17/350,294.
Claims priority of provisional application 62/705,360, filed on Jun. 23, 2020.
Prior Publication US 2021/0397610 A1, Dec. 23, 2021
Int. Cl. G06F 16/242 (2019.01); G06N 3/045 (2023.01); G06N 3/088 (2023.01)

CPC G06F 16/2425 (2019.01) [G06N 3/045 (2023.01); G06N 3/088 (2013.01)]

21 Claims

1. A method of training a machine learning system for use with a digital assistant, the method comprising:

obtaining training data comprising query data samples;

obtaining vector representations of the query data samples;

clustering the vector representations;

determining canonical queries and corresponding query groups based on the clustered vector representations, wherein corresponding query groups correspond to determined canonical queries;

performing named entity recognition on the query data samples and canonical queries;

replacing a text data for tagged named entities with a named entity type tag;

generating paired data samples based on determined canonical queries and selections from the corresponding query groups; and

training an encoder-decoder neural network architecture using the paired data samples, wherein the selections from the corresponding query groups are supplied as input sequence data and the determined canonical queries are supplied as output sequence data,

wherein the digital assistant is configured to map data representing an initial query to data representing a revised query associated with one of the canonical queries, via the encoder-decoder neural network architecture, the data representing the revised query being further processed to provide a response to the initial query,

wherein generating paired data samples comprises filtering generated paired data samples, and

wherein filtering comprises:

removing paired data samples with a canonical query whose named entity tags do not match the named entity tags in the corresponding selection from the query group.
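
The tagging, pairing, and filtering steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The gazetteer lookup stands in for a real named entity recognizer, the tag format (`<CITY>`, `<DATE>`) is hypothetical, and the names `tag_entities`, `entity_tags`, `make_pairs`, and `filter_pairs` are invented for illustration. The sketch also assumes that tags "match" when both queries contain the same multiset of tags, which is one possible reading of the claim. Clustering and encoder-decoder training are out of scope here; the query groups are given directly.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer standing in for a real named entity recognizer.
GAZETTEER = {
    "san francisco": "<CITY>",
    "toronto": "<CITY>",
    "tomorrow": "<DATE>",
}


def tag_entities(query):
    """Replace the text of each recognized entity with its type tag."""
    out = query.lower()
    for text, tag in GAZETTEER.items():
        out = re.sub(r"\b" + re.escape(text) + r"\b", tag, out)
    return out


def entity_tags(tagged_query):
    """Return the multiset of type tags present in a tagged query."""
    return Counter(re.findall(r"<[A-Z_]+>", tagged_query))


def make_pairs(groups):
    """Build (input, output) pairs from a canonical-query -> group mapping.

    Each group member is the input sequence; its canonical query is the
    output sequence. Both are entity-tagged first.
    """
    pairs = []
    for canonical, members in groups.items():
        target = tag_entities(canonical)
        for member in members:
            pairs.append((tag_entities(member), target))
    return pairs


def filter_pairs(pairs):
    """Drop pairs whose input and output carry different entity tags.

    Assumption: "match" means equal multisets of tags.
    """
    return [(src, tgt) for src, tgt in pairs
            if entity_tags(src) == entity_tags(tgt)]


# Invented example data: one canonical query and its query group.
EXAMPLE_GROUPS = {
    "what is the weather in toronto": [
        "weather toronto",
        "how is the weather",
        "toronto weather tomorrow",
    ],
}

if __name__ == "__main__":
    for src, tgt in filter_pairs(make_pairs(EXAMPLE_GROUPS)):
        print(src, "->", tgt)
```

In this example only the pair built from "weather toronto" survives. "how is the weather" carries no city tag, and "toronto weather tomorrow" carries an extra date tag, so both disagree with the canonical query and are removed before training.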