US 11,868,730 B2
Method and system for aspect-level sentiment classification by graph diffusion transformer
Xiaochen Hou, Mountain View, CA (US); Jing Huang, Mountain View, WA (US); Guangtao Wang, Cupertino, CA (US); Xiaodong He, Beijing (CN); and Bowen Zhou, Beijing (CN)
Assigned to JINGDONG DIGITS TECHNOLOGY HOLDING CO., LTD., Beijing (CN); and JD FINANCE AMERICA CORPORATION, Wilmington, DE (US)
Filed by JINGDONG DIGITS TECHNOLOGY HOLDING CO., LTD., Beijing (CN); and JD FINANCE AMERICA CORPORATION, Wilmington, DE (US)
Filed on May 24, 2021, as Appl. No. 17/327,830.
Claims priority of provisional application 63/082,105, filed on Sep. 23, 2020.
Prior Publication US 2022/0092267 A1, Mar. 24, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 40/279 (2020.01); G06F 40/205 (2020.01)
CPC G06F 40/30 (2020.01) [G06F 40/205 (2020.01); G06F 40/279 (2020.01)] 19 Claims
[OG exemplary drawing omitted]
 
1. A system comprising a computing device, the computing device comprising a processor and a storage device storing computer executable code, wherein the computer executable code comprises a plurality of graph diffusion transformer (GDT) layers, and wherein the computer executable code, when executed at the processor, is configured to:
receive a sentence having an aspect term and context, the aspect term having a classification label;
convert the sentence into a dependency tree graph;
calculate, by using an l-th GDT layer of the plurality of GDT layers, an attention matrix of the dependency tree graph based on one-hop attention between any two of a plurality of nodes in the dependency tree graph;
calculate graph attention diffusion from multi-hop attention between any two of the plurality of nodes in the dependency tree graph based on the attention matrix;
obtain an embedding of the dependency tree graph using the graph attention diffusion;
classify the aspect term based on the embedding of the dependency tree graph to obtain a predicted classification of the aspect term;
calculate a loss function based on the predicted classification of the aspect term and the classification label of the aspect term; and
adjust parameters of models in the computer executable code based on the loss function,
wherein the l-th GDT layer of the plurality of GDT layers is configured to calculate the attention matrix by:
calculating an attention score si,j(l)=σ2(v·σ1(Whhi(l)∥Wthj(l))) for node i and node j in the dependency tree graph, wherein Wh, Wt∈ℝd×d and v∈ℝ2d are learnable weights, hi(l) is a feature of node i at the l-th GDT layer, d is a hidden dimension of hi(l), ∥ is a concatenation operation, σ1 is a ReLU activation function, and σ2 is a LeakyReLU activation function;
obtaining attention score matrix S(l) by masking the pairwise scores with the structure of the dependency tree graph:

Si,j(l) = si,j(l) if node j is in the one-hop neighborhood of node i in the dependency tree graph (or i = j), and Si,j(l) = −∞ otherwise;
and
calculating the attention matrix A(l) by: A(l)=softmax(S(l)).
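The wherein clause above can be illustrated with a minimal NumPy sketch. This is not the patentee's implementation: the function name `gdt_attention`, the LeakyReLU slope of 0.01, and the diffusion coefficients are all assumptions, and the claim does not recite a specific formula for the multi-hop diffusion step, so a truncated personalized-PageRank-style series (a common choice for graph attention diffusion) is used as a stand-in with hypothetical parameters `alpha` and `num_hops`.

```python
import numpy as np

def gdt_attention(h, edges, W_h, W_t, v, num_hops=3, alpha=0.15):
    """Sketch of one GDT layer's attention matrix and its multi-hop diffusion.

    h:     (n, d) node features h_i^(l) of the dependency tree graph
    edges: list of (i, j) pairs, including self-loops (i, i)
    """
    n, _ = h.shape
    Hh = h @ W_h.T                      # W_h h_i for every node i
    Ht = h @ W_t.T                      # W_t h_j for every node j
    # s_ij = LeakyReLU(v . ReLU(W_h h_i || W_t h_j)); non-edges masked to -inf
    S = np.full((n, n), -np.inf)
    for i, j in edges:
        z = np.maximum(np.concatenate([Hh[i], Ht[j]]), 0.0)  # sigma_1 = ReLU
        s = v @ z
        S[i, j] = s if s > 0 else 0.01 * s                   # sigma_2 = LeakyReLU
    # A^(l) = softmax(S^(l)), row-wise; masked entries contribute zero weight
    S_max = np.where(np.isinf(S), -1e30, S).max(axis=1, keepdims=True)
    E = np.exp(S - S_max)
    E[np.isinf(S)] = 0.0
    A = E / np.clip(E.sum(axis=1, keepdims=True), 1e-12, None)
    # graph attention diffusion (assumed form): truncated series
    # sum_{k=0..K} alpha (1 - alpha)^k A^k, mixing attention over multiple hops
    A_diff = alpha * np.eye(n)
    P = np.eye(n)
    for k in range(1, num_hops + 1):
        P = P @ A
        A_diff += alpha * (1.0 - alpha) ** k * P
    return A, A_diff
```

Because the softmax is taken over a graph-masked score matrix, each row of A places weight only on a node's one-hop neighbors; powers of A in the diffusion series then propagate that attention to multi-hop neighbors, matching the claim's distinction between one-hop attention and graph attention diffusion.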