US 11,868,730 B2
Method and system for aspect-level sentiment classification by graph diffusion transformer
Xiaochen Hou, Mountain View, CA (US); Jing Huang, Mountain View, WA (US); Guangtao Wang, Cupertino, CA (US); Xiaodong He, Beijing (CN); and Bowen Zhou, Beijing (CN)
Assigned to JINGDONG DIGITS TECHNOLOGY HOLDING CO., LTD., Beijing (CN); and JD FINANCE AMERICA CORPORATION, Wilmington, DE (US)
Filed by JINGDONG DIGITS TECHNOLOGY HOLDING CO., LTD., Beijing (CN); and JD FINANCE AMERICA CORPORATION, Wilmington, DE (US)
Filed on May 24, 2021, as Appl. No. 17/327,830.
Claims priority of provisional application 63/082,105, filed on Sep. 23, 2020.
Prior Publication US 2022/0092267 A1, Mar. 24, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 40/279 (2020.01); G06F 40/205 (2020.01)
CPC G06F 40/30 (2020.01) [G06F 40/205 (2020.01); G06F 40/279 (2020.01)] 19 Claims
[OG exemplary drawing omitted]
 
1. A system comprising a computing device, the computing device comprising a processor and a storage device storing computer executable code, wherein the computer executable code comprises a plurality of graph diffusion transformer (GDT) layers, and wherein the computer executable code, when executed at the processor, is configured to:
receive a sentence having an aspect term and context, the aspect term having a classification label;
convert the sentence into a dependency tree graph;
calculate, by using an l-th GDT layer of the plurality of GDT layers, an attention matrix of the dependency tree graph based on one-hop attention between any two of a plurality of nodes in the dependency tree graph;
calculate graph attention diffusion from multi-hop attention between any two of the plurality of nodes in the dependency tree graph based on the attention matrix;
obtain an embedding of the dependency tree graph using the graph attention diffusion;
classify the aspect term based on the embedding of the dependency tree graph to obtain a predicted classification of the aspect term;
calculate a loss function based on the predicted classification of the aspect term and the classification label of the aspect term; and
adjust parameters of models in the computer executable code based on the loss function,
wherein the l-th GDT layer of the plurality of GDT layers is configured to calculate the attention matrix by:
calculating an attention score si,j(l)=σ2(v·σ1(Whhi(l)∥Wthj(l))) for node i and node j in the dependency tree graph, wherein Wh, Wt∈ℝd×d and v∈ℝ2d are learnable weights, hi(l) is a feature of node i at the l-th GDT layer, d is a hidden dimension of hi(l), ∥ is a concatenation operation, σ1 is a ReLU activation function, and σ2 is a LeakyReLU activation function;
obtaining attention score matrix S(l) by masking the pairwise scores with the structure of the dependency tree graph:

Si,j(l) = si,j(l) if node j is in the one-hop neighborhood of node i in the dependency tree graph (or i = j), and Si,j(l) = −∞ otherwise;
and
calculating the attention matrix A(l) by: A(l)=softmax(S(l)).
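The wherein clause above can be illustrated with a minimal NumPy sketch. This is not the patentee's implementation: the function name `gdt_attention`, the LeakyReLU slope of 0.01, and the diffusion coefficients are all assumptions, and the claim does not recite a specific formula for the multi-hop diffusion step, so a truncated personalized-PageRank-style series (a common choice for graph attention diffusion) is used as a stand-in with hypothetical parameters `alpha` and `num_hops`.

```python
import numpy as np

def gdt_attention(h, edges, W_h, W_t, v, num_hops=3, alpha=0.15):
    """Sketch of one GDT layer's attention matrix and its multi-hop diffusion.

    h:     (n, d) node features h_i^(l) of the dependency tree graph
    edges: list of (i, j) pairs, including self-loops (i, i)
    """
    n, _ = h.shape
    Hh = h @ W_h.T                      # W_h h_i for every node i
    Ht = h @ W_t.T                      # W_t h_j for every node j
    # s_ij = LeakyReLU(v . ReLU(W_h h_i || W_t h_j)); non-edges masked to -inf
    S = np.full((n, n), -np.inf)
    for i, j in edges:
        z = np.maximum(np.concatenate([Hh[i], Ht[j]]), 0.0)  # sigma_1 = ReLU
        s = v @ z
        S[i, j] = s if s > 0 else 0.01 * s                   # sigma_2 = LeakyReLU
    # A^(l) = softmax(S^(l)), row-wise; masked entries contribute zero weight
    S_max = np.where(np.isinf(S), -1e30, S).max(axis=1, keepdims=True)
    E = np.exp(S - S_max)
    E[np.isinf(S)] = 0.0
    A = E / np.clip(E.sum(axis=1, keepdims=True), 1e-12, None)
    # graph attention diffusion (assumed form): truncated series
    # sum_{k=0..K} alpha (1 - alpha)^k A^k, mixing attention over multiple hops
    A_diff = alpha * np.eye(n)
    P = np.eye(n)
    for k in range(1, num_hops + 1):
        P = P @ A
        A_diff += alpha * (1.0 - alpha) ** k * P
    return A, A_diff
```

Because the softmax is taken over a graph-masked score matrix, each row of A places weight only on a node's one-hop neighbors; powers of A in the diffusion series then propagate that attention to multi-hop neighbors, matching the claim's distinction between one-hop attention and graph attention diffusion.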