| CPC G06F 16/3347 (2019.01) [G06F 16/3334 (2019.01); G06F 16/3344 (2019.01); G06F 16/3346 (2019.01)] | 8 Claims |

|
1. A method comprising:
receiving, by a user device, an utterance as an audio file;
receiving, by an orchestration layer of a parameter generation platform, the utterance;
converting, by the orchestration layer, the audio file of the utterance into a text string;
tokenizing, by a tokenization engine of the parameter generation platform, the utterance into a plurality of tokens, each token of the plurality of tokens comprising a portion of the utterance separated by a space, wherein tokenizing includes identifying one or more tokens of the plurality of tokens to be operators and symbols within the utterance;
transforming, by a featurizer engine of the parameter generation platform, the plurality of tokens into a plurality of feature vectors, the featurizer engine comprising a sparse featurizer and a dense featurizer, the sparse featurizer providing a count of frequent individual words that are filtered based on individual words occurring in a plurality of received utterances, the dense featurizer providing a semantic meaning in context by converting one or more word strings of the utterance into a real valued feature vector, the plurality of feature vectors comprising the count of frequent individual words and the real valued feature vector;
determining, by a machine learning model of the parameter generation platform, an intent classification of the utterance based on the plurality of feature vectors;
classifying, by the machine learning model, the text string into query parameters;
determining, by the machine learning model, a database of a plurality of databases to query based on the plurality of feature vectors;
assigning, by the machine learning model, an entity label to each of the plurality of feature vectors;
resolving, by a disambiguation engine of the parameter generation platform, each feature vector of the plurality of feature vectors to a corresponding standardized value of a database query language, wherein resolving each feature vector of the plurality of feature vectors to the corresponding standardized value of the database query language includes processing each feature vector of the plurality of feature vectors with a string-based algorithm computing a Levenshtein distance;
scoring, by the disambiguation engine, each feature vector of the plurality of feature vectors with respect to a plurality of candidates from a reference data table;
mapping, by the disambiguation engine, each feature vector of the plurality of feature vectors to a key value, wherein the key value corresponds to a candidate of the plurality of candidates with a highest score, the highest score calculated based on the Levenshtein distance;
determining, by the disambiguation engine, a conflict between the corresponding standardized value and the entity label; and
overriding, by the disambiguation engine, the entity label based on the determination.
|
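The tokenizing, scoring, mapping, and overriding steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claim does not specify an operator list, a scoring formula, or a reference table layout, so the names `OPERATORS`, `REFERENCE_TABLE`, `tokenize`, `score`, `resolve`, and `reconcile_label` are hypothetical, and the normalization of the Levenshtein distance into a 0-to-1 score is an assumption. The sketch also works on the text associated with each labeled span rather than on feature vectors, since a string-based distance needs strings as input.

```python
# Hypothetical operator/symbol set; the claim does not enumerate one.
OPERATORS = {">", "<", ">=", "<=", "=", "!=", "and", "or", "not"}

# Hypothetical reference data table: key value -> (candidate text, entity label).
REFERENCE_TABLE = {
    "CA": ("California", "state"),
    "CO": ("Colorado", "state"),
    "LAX": ("Los Angeles", "city"),
}


def tokenize(utterance):
    """Split on spaces and flag tokens that are operators or symbols."""
    return [(tok, tok.lower() in OPERATORS) for tok in utterance.split(" ") if tok]


def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def score(value, candidate):
    """Assumed scoring: 1.0 for identical strings, lower as distance grows."""
    longest = max(len(value), len(candidate)) or 1
    return 1.0 - levenshtein(value.lower(), candidate.lower()) / longest


def resolve(value, table=REFERENCE_TABLE):
    """Map a raw value to the key of the highest-scoring candidate."""
    return max(table, key=lambda key: score(value, table[key][0]))


def reconcile_label(entity_label, key, table=REFERENCE_TABLE):
    """Override the model's entity label when it conflicts with the table."""
    expected = table[key][1]
    return expected if expected != entity_label else entity_label


if __name__ == "__main__":
    print(tokenize("sales > 100 in Calfornia"))
    key = resolve("Calfornia")            # misspelled on purpose
    print(key, reconcile_label("city", key))
```

In this sketch the misspelled value "Calfornia" is one edit away from "California", so it resolves to the key `CA`; because the table associates that key with the label `state`, a model-assigned label of `city` would be overridden. A production system would likely add a minimum-score threshold so that values far from every candidate are left unresolved, which the claim neither requires nor excludes.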