US 12,093,646 B2
Systems and methods for semi-supervised extraction of text classification information
Zhengjie Miao, Mountain View, CA (US); Yuliang Li, Mountain View, CA (US); Xiaolan Wang, Mountain View, CA (US); and Wang-Chiew Tan, Mountain View, CA (US)
Assigned to RECRUIT CO., LTD., Tokyo (JP)
Filed by Recruit Co., Ltd., Tokyo (JP)
Filed on Jan. 15, 2021, as Appl. No. 17/151,088.
Prior Publication US 2022/0229984 A1, Jul. 21, 2022
Int. Cl. G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06F 40/284 (2020.01) [G06F 40/289 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
19 Claims
 
1. A non-transitory computer readable storage medium storing instructions that are executable by a text classification system that includes one or more processors to cause the text classification system to perform a method for extracting classification information, the method comprising:
obtaining input text;
identifying a plurality of tokens in the input text;
pre-training a machine learning model by:
accessing an opinion phrase from a set of labeled data;
generating a first set of opinion phrases using the opinion phrase, wherein the generating comprises replacing of a span in the opinion phrase and storing the opinion phrase with the replaced span;
interpolating a second set of opinion phrases using the first set of opinion phrases;
storing the first and second set of opinion phrases; and
providing the first and second set of opinion phrases for training the machine learning model;
determining tagging information of the plurality of tokens using a first classification layer of the machine learning model;
pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer of the machine learning model;
determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model;
evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the machine learning model;
aggregating sentiments of the paired sequences associated with an attribute classifier of the one or more attribute classifiers; and
storing the aggregated sentiments of each attribute classifier and the one or more attribute classifiers.
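
Illustrative sketch (not part of the claim): the pre-training and aggregation steps recited above can be pictured with a few lines of Python. Everything here is an assumption made for illustration: the function names `replace_span`, `interpolate`, and `aggregate_sentiments` are hypothetical, interpolation is shown as a convex combination of toy feature vectors standing in for phrase encodings, and aggregation is shown as a per-attribute mean. The claim itself does not specify any of these choices, and the four classification layers are not modeled.

```python
from collections import defaultdict
from statistics import mean


def replace_span(tokens, start, end, replacement):
    """Return a copy of tokens with tokens[start:end] swapped for replacement."""
    return tokens[:start] + list(replacement) + tokens[end:]


def interpolate(vec_a, vec_b, lam):
    """Convex combination lam * vec_a + (1 - lam) * vec_b (assumed scheme)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]


def aggregate_sentiments(pairs):
    """Average sentiment scores per attribute (assumed aggregation rule).

    pairs: iterable of (attribute, sentiment_score) tuples, one per
    paired sequence of tokens.
    """
    by_attribute = defaultdict(list)
    for attribute, score in pairs:
        by_attribute[attribute].append(score)
    return {attr: mean(scores) for attr, scores in by_attribute.items()}


# First set: span replacement applied to one labeled opinion phrase.
phrase = ["the", "staff", "was", "very", "friendly"]
first_set = [
    replace_span(phrase, 3, 5, ["extremely", "helpful"]),
    replace_span(phrase, 1, 2, ["front", "desk"]),
]

# Second set: interpolation between toy feature vectors, which are
# hypothetical stand-ins for encodings of an original and an augmented phrase.
original_vec = [1.0, 0.0, 2.0]
augmented_vec = [0.0, 1.0, 2.0]
mixed = interpolate(original_vec, augmented_vec, 0.75)

# Aggregation: one (attribute, sentiment) tuple per paired sequence.
aggregated = aggregate_sentiments(
    [("service", 1.0), ("service", 0.0), ("room", -1.0)]
)
```

In this sketch `first_set` holds the span-replaced phrases, `mixed` is a single interpolated example, and `aggregated` maps each attribute to its mean sentiment score. A real system would obtain the encodings and sentiment scores from the trained model rather than from hand-written values.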