US 12,333,251 B2
Extracting triplets from text with relationship prediction matrix, entity prediction matrix, and alignment matrix
Jiandong Sun, Beijing (CN); Yabing Shi, Beijing (CN); Ye Jiang, Beijing (CN); and Chunguang Chai, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed on Sep. 28, 2022, as Appl. No. 17/954,900.
Claims priority of application No. 202111300797.3 (CN), filed on Nov. 4, 2021.
Prior Publication US 2023/0133717 A1, May 4, 2023
Int. Cl. G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06F 40/30 (2020.01)
CPC G06F 40/289 (2020.01) [G06F 40/30 (2020.01)]
18 Claims
1. An information extraction method, comprising:
acquiring to-be-processed text to obtain a semantic vector of each token in the to-be-processed text;
generating a relationship prediction matrix, an entity prediction matrix and an alignment matrix according to each token in the to-be-processed text and the semantic vector of each token; and
extracting a target triplet in the to-be-processed text using the relationship prediction matrix, the entity prediction matrix and the alignment matrix, and taking the target triplet as an information extraction result of the to-be-processed text,
wherein the extracting the target triplet in the to-be-processed text using the relationship prediction matrix, the entity prediction matrix and the alignment matrix comprises:
determining a subject start token and an object start token corresponding to a same relationship type according to the relationship prediction matrix;
determining an entity start token and an entity end token corresponding to a same entity type according to the entity prediction matrix;
determining an entity and an object corresponding to the same relationship type in the to-be-processed text according to the subject start token and the object start token corresponding to the same relationship type, as well as the entity start token and the entity end token corresponding to the same entity type;
combining each relationship type and the entity and the object corresponding to the relationship type to obtain at least one candidate triplet; and
selecting a triplet meeting a preset requirement from the at least one candidate triplet as the target triplet according to the alignment matrix.
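
The decoding steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim does not fix the matrix layouts, so the following are assumptions made only for illustration: `rel[r][i][j] == 1` marks token `i` as a subject start and token `j` as an object start for relationship type `r`; `ent[e][i][j] == 1` marks an entity of type `e` spanning tokens `i` through `j`; and `align[i][j] == 1` marks token `i` as a subject end aligned with token `j` as an object end, which stands in for the "preset requirement". The function name `extract_triplets`, the helper `zeros`, and the sample sentence are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
Triplet = Tuple[str, str, str]  # (subject text, relationship type, object text)


def zeros(n: int) -> Matrix:
    """Return an n x n matrix of zeros."""
    return [[0] * n for _ in range(n)]


def extract_triplets(tokens: List[str],
                     rel: Dict[str, Matrix],
                     ent: Dict[str, Matrix],
                     align: Matrix) -> List[Triplet]:
    # Entity prediction matrix: collect the end tokens predicted for each
    # entity start token (assumed layout: row = start, column = end).
    ends_by_start: Dict[int, List[int]] = {}
    for matrix in ent.values():
        for i, row in enumerate(matrix):
            for j, flag in enumerate(row):
                if flag and j >= i:
                    ends_by_start.setdefault(i, []).append(j)

    def text(start: int, end: int) -> str:
        # Joining with spaces is an assumption that suits English tokens.
        return " ".join(tokens[start:end + 1])

    results: List[Triplet] = []
    for rel_type, matrix in rel.items():
        # Relationship prediction matrix: subject start / object start pairs
        # that share the same relationship type.
        for s_start, row in enumerate(matrix):
            for o_start, flag in enumerate(row):
                if not flag:
                    continue
                # Combine the start pair with the entity spans to form
                # candidate triplets.
                for s_end in ends_by_start.get(s_start, []):
                    for o_end in ends_by_start.get(o_start, []):
                        # Alignment matrix: keep only candidates whose end
                        # tokens are aligned (assumed preset requirement).
                        if align[s_end][o_end]:
                            results.append((text(s_start, s_end),
                                            rel_type,
                                            text(o_start, o_end)))
    return results


# Hypothetical example data, hand-filled rather than produced by a model.
tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
n = len(tokens)

rel = {"born_in": zeros(n)}
rel["born_in"][0][5] = 1      # subject starts at "Marie", object at "Warsaw"

ent = {"PER": zeros(n), "LOC": zeros(n)}
ent["PER"][0][1] = 1          # "Marie Curie"
ent["PER"][0][0] = 1          # spurious span "Marie", to exercise the filter
ent["LOC"][5][5] = 1          # "Warsaw"

align = zeros(n)
align[1][5] = 1               # subject end "Curie" aligned with "Warsaw"

triplets = extract_triplets(tokens, rel, ent, align)
print(triplets)
```

In this example the spurious span "Marie" yields a second candidate triplet, which is discarded because its end tokens are not marked in the assumed alignment matrix.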