US 12,292,913 B2
Automatic industry classification method and system
Kai Cao, Beijing (CN); Weining Li, Beijing (CN); and Minyue Zhang, Beijing (CN)
Assigned to BEIJING BENYING TECHNOLOGIES CO., LTD., Beijing (CN)
Appl. No. 17/788,303
Filed by BEIJING BENYING TECHNOLOGIES CO., LTD., Beijing (CN)
PCT Filed Jan. 19, 2020, PCT No. PCT/CN2020/073042
§ 371(c)(1), (2) Date Jun. 23, 2022,
PCT Pub. No. WO2021/128521, PCT Pub. Date Jul. 1, 2021.
Claims priority of application No. 201911358987.3 (CN), filed on Dec. 25, 2019.
Prior Publication US 2022/0374462 A1, Nov. 24, 2022
Int. Cl. G06F 16/353 (2025.01); G06F 18/243 (2023.01)
CPC G06F 16/353 (2019.01) [G06F 18/24323 (2023.01)] 11 Claims
 
1. An automatic industry classification method, comprising determining a scope of target patents, wherein the automatic industry classification method further comprises the following steps:
step 1: defining a target industry tree;
step 2: generating marks on the target industry tree;
step 3: performing a rough classification for the target patents by using the marks; and
step 4: performing a fine classification for the target patents according to a result of the rough classification,
wherein step 1 further comprises:
defining an industry tree $I = \{i_1, \ldots, i_j, \ldots, i_n\}$, wherein $i_j \in I$ and is a first level industry, j is a serial number of the first level industry, $1 \le j \le n$, and n is a number of all leaf nodes of I; and
setting $i_{jkl\ldots} = \{i_{jkl}, \ldots, i_{jkl\ldots t}\}$ as any non-leaf node of I, wherein degree of other nodes except the leaf nodes is greater than or equal to 2, k is a serial number of a second level industry, l is a serial number of a third level industry, and t is a serial number of a penultimate level industry;
wherein the determining of the scope of the target patents is to manually determine the scope of the target patents to be classified,
wherein step 3 comprises determining nodes above the leaf nodes; and wherein step 3 further comprises the following sub-steps:
step 31: generating a node set V of a graph;
step 32: arranging the marks;
step 33: generating an edge set E of the graph;
step 34: generating an adjacency matrix; and
step 35: performing node division,
wherein step 35 further comprises the following sub-steps:
step 351: generating a degree matrix $D = \mathrm{diag}(d_1, d_2, \ldots, d_{l+u})$, having a diagonal element $d_i = \sum_{j=1}^{l+u} W_{ij}$, wherein u is a number of unmarked nodes, and $W_{ij}$ is an adjacency matrix;
step 352: generating a marked matrix, and a nonnegative $(l+u) \times |\gamma|$ marked matrix $F = (F_1^T, F_2^T, \ldots, F_{l+u}^T)^T$, wherein an element of an i-th row $F_i = (F_{i1}, F_{i2}, \ldots, F_{i|\gamma|})$ is a marked vector of an International Patent Classification (IPC) in the node set, a classification rule is $\gamma_i = \arg\max_{1 \le j \le |\gamma|} F_{ij}$, wherein $\gamma$ is a set of industries, and T represents a transposition of a matrix;
step 353: initializing the nonnegative marked matrix F, for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, |\gamma|$:
[formula image not reproduced in this text]
step 354: constructing a propagation matrix
[formula image not reproduced in this text]
wherein,
[formula image not reproduced in this text]
d represents diagonal elements of the degree matrix D;
step 355: generating an iterative calculation formula $F(t+1) = \alpha B F(t) + (1-\alpha) Y$, wherein $\alpha \in (0,1)$ is a parameter, $F(t)$ is a result of a t-th iteration, and Y is an initial matrix;
step 356: iterating the iterative calculation formula to convergence to obtain a state $F^* = \lim_{t \to \infty} F(t) = (1-\alpha)(M - \alpha B)^{-1} Y$ under convergence, wherein M is a unit matrix; and
step 357: performing a prediction of the unmarked nodes $\gamma_i = \arg\max_{1 \le j \le |\gamma|} F^*_{ij}$, wherein $l+1 \le i \le l+u$.
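
The node-division sub-steps 351 to 357 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim text here does not reproduce the formulas for the initial matrix Y or the propagation matrix B, so the sketch assumes the forms commonly used in label propagation: Y holds a one-hot row for each marked node and a zero row for each unmarked node, and B = D^(-1/2) W D^(-1/2). It also assumes that l is the number of marked nodes. The function name `propagate_labels`, its parameters, and the convention of marking unmarked nodes with -1 are hypothetical choices made for illustration.

```python
import numpy as np


def propagate_labels(W, labels, n_classes, alpha=0.5, tol=1e-9, max_iter=10_000):
    """Illustrative label propagation over a graph (sketch of steps 351-357).

    W         : symmetric, nonnegative (l+u) x (l+u) adjacency matrix.
    labels    : length l+u integer array; class index for marked nodes,
                -1 for unmarked nodes (hypothetical convention).
    n_classes : number of industries, |gamma|.
    alpha     : propagation parameter in (0, 1).
    Returns (F_iter, F_closed, predictions).
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = W.shape[0]

    # Step 351: diagonal of the degree matrix, d_i = sum_j W_ij.
    d = W.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every node needs at least one weighted edge")

    # Steps 352-353 (ASSUMED initialization): one-hot rows for marked nodes,
    # all-zero rows for unmarked nodes.
    Y = np.zeros((n, n_classes))
    marked_idx = np.flatnonzero(labels >= 0)
    Y[marked_idx, labels[marked_idx]] = 1.0

    # Step 354 (ASSUMED form): B = D^(-1/2) W D^(-1/2).
    inv_sqrt_d = 1.0 / np.sqrt(d)
    B = W * inv_sqrt_d[:, None] * inv_sqrt_d[None, :]

    # Steps 355-356: iterate F(t+1) = alpha * B * F(t) + (1 - alpha) * Y
    # until the largest entry-wise change falls below tol.
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (B @ F) + (1.0 - alpha) * Y
        converged = np.max(np.abs(F_next - F)) < tol
        F = F_next
        if converged:
            break

    # Closed form of the limit from step 356: (1 - alpha) (M - alpha B)^(-1) Y,
    # with M the identity (unit) matrix.
    F_closed = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * B, Y)

    # Step 357: predict each unmarked node by the arg max over its row.
    predictions = labels.copy()
    unmarked_idx = np.flatnonzero(labels < 0)
    predictions[unmarked_idx] = np.argmax(F[unmarked_idx], axis=1)
    return F, F_closed, predictions


# Toy chain graph 0 - 1 - 2 - 3: node 0 is marked as industry 0,
# node 3 as industry 1, and nodes 1 and 2 are unmarked.
W_demo = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
labels_demo = np.array([0, -1, -1, 1])
F_iter, F_closed, pred = propagate_labels(W_demo, labels_demo, n_classes=2)
print(pred.tolist())  # [0, 0, 1, 1]
```

In the toy graph each unmarked node takes the industry of its nearer marked neighbor, and the iterated result agrees with the closed-form limit to within the tolerance. Solving the linear system directly avoids forming an explicit inverse; for large graphs the iteration with a sparse B would usually be the more practical route.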