| CPC G06F 40/30 (2020.01) [G06F 40/211 (2020.01); G06F 40/284 (2020.01); G06F 40/289 (2020.01)] | 11 Claims |

1. A processor implemented method, comprising:
receiving, via one or more hardware processors, a plurality of input data pertaining to one or more applications, wherein the input data comprises text data and non-text data, and wherein the non-text data comprises one or more audios, one or more images, and one or more videos;
converting, via the one or more hardware processors, the non-text data into text data based on one or more conversion techniques when the input data includes the non-text data, resulting in formation of converted text data;
combining, via the one or more hardware processors, the received text data and the converted text data to form combined text data;
processing, via the one or more hardware processors, the combined text data to obtain processed text data with immutability regulation and punctuation memory enabled, wherein the processing of the combined text data comprises:
identifying a set of words from the combined text data;
tokenizing each of the combined text data such that the identified set of words is immutability regulated and punctuation consistency is maintained, wherein being immutability regulated refers to a provision to selectively maintain or keep phrases or words intact while expressing a given text in varied ways considering relevance to a domain, thereby minimizing effort for manual creation of training data;
determining a plurality of context-related synonyms in an inflected form for each of a plurality of tokenized text data; and
eliminating one or more words identified as duplicates from the plurality of tokenized text data to which the plurality of context-related synonyms in the inflected form have been added;
iteratively generating, via the one or more hardware processors, a plurality of multiple context-related utterances corresponding to each of the processed text data;
accumulating, via the one or more hardware processors, the plurality of multiple context-related utterances that are ranked based on an index of deviation;
selecting, via the one or more hardware processors, a set of high-ranked multiple context-related utterances from the plurality of multiple context-related utterances when a number of possible multiple context-related utterances is greater than a number of required multiple context-related utterances, the selecting being scalable to the non-text data with the help of converters or adapters, wherein a first dictionary, an auto-updating dictionary utilizing feedback, is dynamically updated when a coupling coefficient of n-grams of the plurality of text data exceeds a predefined threshold; and
training machine learning models with the set of high-ranked multiple context-related utterances without manual intervention and while maintaining quality of the generated context-related utterances.
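The processing step recited in claim 1 (identifying words, tokenizing with immutability regulation and punctuation consistency, adding inflected context-related synonyms, and eliminating duplicates) can be illustrated with a minimal sketch. The code below is a hypothetical illustration rather than the claimed implementation: the protected-phrase list, the `SYNONYMS` table, and the names `tokenize_with_immutability` and `expand_with_synonyms` are assumptions introduced only for this example.

```python
import re

# Hypothetical synonym table of inflected, context-related forms; in the
# claimed method these would come from a context-aware synonym source.
SYNONYMS = {
    "reset": ["reinitialise", "restore"],
    "password": ["passcode", "credentials"],
}

def tokenize_with_immutability(text, protected_phrases):
    """Tokenize `text`, keeping protected (immutability-regulated) phrases
    intact as single tokens and emitting punctuation as separate tokens."""
    phrases = sorted(protected_phrases, key=len, reverse=True)  # longest match first
    pattern = "|".join(re.escape(p) for p in phrases)
    tokens, pos = [], 0
    for match in (re.finditer(pattern, text) if pattern else []):
        tokens += re.findall(r"\w+|[^\w\s]", text[pos:match.start()])
        tokens.append(match.group(0))          # protected phrase kept intact
        pos = match.end()
    tokens += re.findall(r"\w+|[^\w\s]", text[pos:])
    return tokens

def expand_with_synonyms(tokens, protected_phrases):
    """Add context-related synonyms for mutable tokens, then drop duplicates."""
    expanded = []
    for tok in tokens:
        expanded.append(tok)
        if tok not in protected_phrases:       # regulated tokens stay untouched
            expanded += SYNONYMS.get(tok.lower(), [])
    seen, deduped = set(), []
    for tok in expanded:                       # eliminate duplicate words, keep order
        if tok.lower() not in seen:
            seen.add(tok.lower())
            deduped.append(tok)
    return deduped

if __name__ == "__main__":
    protected = ["account ID"]
    toks = tokenize_with_immutability(
        "Please reset my password, my account ID is locked.", protected)
    print(expand_with_synonyms(toks, protected))
```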
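The subsequent steps of claim 1 (ranking the generated utterances by an index of deviation, selecting the high-ranked ones when more candidates exist than required, and auto-updating the first dictionary when an n-gram coupling coefficient exceeds a predefined threshold) might be prototyped along the following lines. Claim 1 does not fix formulas for the index of deviation or the coupling coefficient; here they are approximated as, respectively, the fraction of candidate tokens absent from the source utterance and the relative frequency of an n-gram across the selected utterances, both of which are assumptions made only for this sketch.

```python
from collections import Counter
from itertools import islice

def index_of_deviation(source_tokens, candidate_tokens):
    """Assumed metric: fraction of candidate tokens not present in the source."""
    src = {t.lower() for t in source_tokens}
    novel = sum(1 for t in candidate_tokens if t.lower() not in src)
    return novel / max(len(candidate_tokens), 1)

def select_high_ranked(source, candidates, required):
    """Rank candidate utterances by index of deviation and keep the top
    `required` only when more candidates are available than required."""
    ranked = sorted(candidates,
                    key=lambda c: index_of_deviation(source.split(), c.split()),
                    reverse=True)
    return ranked[:required] if len(ranked) > required else ranked

def ngrams(tokens, n=2):
    """Yield n-grams of a token list."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

def update_dictionary(dictionary, utterances, threshold=0.2):
    """Assumed auto-update rule: add an n-gram to the first dictionary when its
    coupling coefficient (relative frequency here) exceeds the threshold."""
    counts = Counter()
    for utt in utterances:
        counts.update(ngrams(utt.lower().split()))
    total = sum(counts.values()) or 1
    for gram, count in counts.items():
        if count / total > threshold:          # coupling coefficient above threshold
            dictionary[" ".join(gram)] = count / total
    return dictionary

if __name__ == "__main__":
    src = "please reset my password"
    cands = ["kindly restore my passcode",
             "please reset my password now",
             "reinitialise my credentials please"]
    best = select_high_ranked(src, cands, required=2)
    print(best)
    print(update_dictionary({}, best, threshold=0.1))
```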