US 12,475,322 B2
Training of a text classification model from long text and based on text semantics
Itay Margolin, Petach Tikva (IL); and Yair Horesh, Kfar-Saba (IL)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Jan. 30, 2024, as Appl. No. 18/427,566.
Prior Publication US 2025/0245436 A1, Jul. 31, 2025
Int. Cl. G06F 40/30 (2020.01); G06F 40/284 (2020.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 40/284 (2020.01); G06N 20/00 (2019.01)] 20 Claims
1. A computer-implemented method for training a text classification model, the method comprising:
obtaining an electronic text, wherein the electronic text is associated with a label indicating a desired classification of the electronic text;
segmenting the electronic text into a plurality of first segments;
generating, by a text embedding model, a plurality of first embeddings from the plurality of first segments;
obtaining a plurality of second embeddings generated by the text embedding model from a plurality of second segments of text associated with one or more known classifications;
generating a plurality of associations between the plurality of first embeddings and the plurality of second embeddings, wherein each association is based on a difference between a first embedding of the plurality of first embeddings and a second embedding of the plurality of second embeddings;
training a text classification model based on the plurality of associations using supervised learning, including:
providing the plurality of first segments and the plurality of associations as an input to the text classification model;
generating, by the text classification model, a plurality of classifications from the plurality of first segments based on the plurality of associations; and
adjusting the text classification model based on the label associated with the electronic text;
obtaining a second electronic text to be classified by the trained text classification model;
providing the second electronic text to the trained text classification model; and
generating, by the trained text classification model, a classification of the second electronic text.
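The steps of claim 1 can be sketched in Python under several stated assumptions, none of which come from the claim. The hashed bag-of-words `embed` function stands in for the unnamed text embedding model. Segmentation uses fixed 8-word windows. The "difference" between embeddings is cosine distance, and each association is summarized as the minimum distance to each known class's reference (second) embeddings. The classifier is a softmax regression that receives each segment's embedding concatenated with its associations and is trained with the whole text's label. All names (`segment`, `embed`, `associations`, `SegmentClassifier`, `REFERENCE`, and so on) and the toy data are hypothetical. This is an illustration of the claimed flow, not Intuit's implementation.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 256      # hypothetical embedding width
SEG_WORDS = 8  # hypothetical segment length, in words
Ref = Tuple[str, List[float]]  # (known classification, second embedding)

def segment(text: str, size: int = SEG_WORDS) -> List[str]:
    """Split the text into consecutive `size`-word windows (the first segments)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(seg: str) -> List[float]:
    """Hypothetical stand-in embedding model: L2-normalized hashed bag of words."""
    vec = [0.0] * DIM
    for word in seg.lower().split():
        # md5 keeps buckets stable across runs, unlike the salted built-in hash().
        vec[int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def associations(firsts: List[List[float]], refs: List[Ref], classes: List[str]) -> List[List[float]]:
    """Per first embedding: smallest cosine distance to each class's second embeddings."""
    return [[min((1.0 - dot(e, r) for label, r in refs if label == c), default=1.0)
             for c in classes] for e in firsts]

def features(text: str, refs: List[Ref], classes: List[str]) -> List[List[float]]:
    """Model input per segment: its embedding concatenated with its associations."""
    firsts = [embed(s) for s in segment(text)]
    if not firsts:
        raise ValueError("text has no words to segment")
    return [e + a for e, a in zip(firsts, associations(firsts, refs, classes))]

class SegmentClassifier:
    """Hypothetical softmax-regression classifier applied to each segment."""

    def __init__(self, n_features: int, classes: List[str], lr: float = 0.5):
        self.classes, self.lr = classes, lr
        self.w = [[0.0] * n_features for _ in classes]
        self.b = [0.0] * len(classes)

    def probs(self, x: List[float]) -> List[float]:
        logits = [dot(row, x) + b for row, b in zip(self.w, self.b)]
        m = max(logits)  # shift by the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [v / total for v in exps]

    def update(self, x: List[float], y: int) -> None:
        """One gradient step on cross-entropy loss toward class index y."""
        p = self.probs(x)
        for k, row in enumerate(self.w):
            g = p[k] - (1.0 if k == y else 0.0)
            self.b[k] -= self.lr * g
            for j, xj in enumerate(x):
                row[j] -= self.lr * g * xj

def train(docs: List[Tuple[str, str]], refs: List[Ref], classes: List[str],
          epochs: int = 50) -> SegmentClassifier:
    model = SegmentClassifier(DIM + len(classes), classes)
    for _ in range(epochs):
        for text, label in docs:
            # Each segment's classification is adjusted toward the whole text's label.
            for x in features(text, refs, classes):
                model.update(x, classes.index(label))
    return model

def classify(model: SegmentClassifier, refs: List[Ref], text: str) -> str:
    """Average the per-segment class probabilities and return the top class."""
    per_seg = [model.probs(x) for x in features(text, refs, model.classes)]
    avg = [sum(col) / len(per_seg) for col in zip(*per_seg)]
    return model.classes[max(range(len(avg)), key=avg.__getitem__)]

# Toy second segments with known classifications (illustrative data only).
REFERENCE = {
    "billing": ["invoice payment overdue charge", "refund credit card billed twice"],
    "technical": ["app crashes on login screen", "error message when uploading file"],
}
CLASSES = list(REFERENCE)
refs = [(label, embed(s)) for label, segs in REFERENCE.items() for s in segs]
docs = [
    ("I was billed twice for my invoice and need a refund on my credit card", "billing"),
    ("The payment is overdue and there is an extra charge on the invoice", "billing"),
    ("The app crashes every time I open the login screen after the update", "technical"),
    ("I see an error message when uploading a file and the app freezes", "technical"),
]
model = train(docs, refs, CLASSES)
print(classify(model, refs, "refund the overdue invoice charge"))
```

Propagating the text-level label to every segment is only one way to adjust the model "based on the label associated with the electronic text". A variant could pool the per-segment outputs before computing the loss, so that only the aggregate prediction is compared with the label.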