CPC G06F 40/284 (2020.01) [G06F 40/289 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
19 Claims
1. A non-transitory computer readable storage medium storing instructions that are executable by a text classification system that includes one or more processors to cause the text classification system to perform a method for extracting classification information, the method comprising:
obtaining input text;
identifying a plurality of tokens in the input text;
pre-training a machine learning model by:
accessing an opinion phrase from a set of labeled data;
generating a first set of opinion phrases using the opinion phrase, wherein the generating comprises replacing a span in the opinion phrase and storing the opinion phrase with the replaced span;
interpolating a second set of opinion phrases using the first set of opinion phrases;
storing the first and second set of opinion phrases; and
providing the first and second set of opinion phrases for training the machine learning model;
determining tagging information of the plurality of tokens using a first classification layer of the machine learning model;
pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer of the machine learning model;
determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model;
evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the machine learning model;
aggregating sentiments of the paired sequences associated with an attribute classifier of the one or more attribute classifiers; and
storing the aggregated sentiments of each attribute classifier and the one or more attribute classifiers.
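
For illustration only, the span-replacement, interpolation, and aggregation steps recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`replace_span`, `interpolate`, `aggregate_sentiments`), the representation of a phrase as a list of tokens, the use of a convex combination of numeric vectors for interpolation, and the use of a per-attribute mean for aggregation are all hypothetical choices that the claim does not specify.

```python
from collections import defaultdict


def replace_span(tokens, start, end, replacement):
    """Return a copy of `tokens` with tokens[start:end] swapped for `replacement`.

    Assumption: an opinion phrase is a list of string tokens and a span
    is a half-open index range [start, end).
    """
    if not 0 <= start <= end <= len(tokens):
        raise ValueError("span is outside the phrase")
    return tokens[:start] + list(replacement) + tokens[end:]


def interpolate(vec_a, vec_b, lam):
    """Return the convex combination lam * vec_a + (1 - lam) * vec_b.

    Assumption: interpolation operates on equal-length numeric
    representations of two phrases, with 0 <= lam <= 1.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]


def aggregate_sentiments(scored_pairs):
    """Average sentiment scores per attribute.

    Assumption: each item is (attribute, score), where the score is a
    number produced for one paired sequence.
    """
    by_attribute = defaultdict(list)
    for attribute, score in scored_pairs:
        by_attribute[attribute].append(score)
    return {attr: sum(vals) / len(vals) for attr, vals in by_attribute.items()}


# Example usage with made-up data.
phrase = ["very", "friendly", "staff"]
augmented = replace_span(phrase, 1, 2, ["helpful"])
mixed = interpolate([1.0, 0.0], [0.0, 1.0], 0.5)
summary = aggregate_sentiments(
    [("service", 1.0), ("service", 0.0), ("food", -1.0)]
)
```

In this sketch `augmented` stands in for a member of the first set of opinion phrases, `mixed` for a member of the second (interpolated) set, and `summary` for the aggregated sentiments stored per attribute classifier.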