US 12,277,385 B2
Text keyword extraction method, electronic device, and computer readable storage medium
Xiaoyuan Zhang, Shanghai (CN); Xiao Chen, Shanghai (CN); and Xiangxiang Ma, Shanghai (CN)
Assigned to AISHU TECHNOLOGY CORP., Shanghai (CN)
Appl. No. 17/906,535
Filed by Aishu Technology Corp., Shanghai (CN)
PCT Filed Aug. 18, 2020, PCT No. PCT/CN2020/109670
§ 371(c)(1), (2) Date Sep. 16, 2022,
PCT Pub. No. WO2021/184674, PCT Pub. Date Sep. 23, 2021.
Claims priority of application No. 202010188511.6 (CN), filed on Mar. 17, 2020.
Prior Publication US 2023/0136368 A1, May 4, 2023
Int. Cl. G06F 40/216 (2020.01); G06F 16/31 (2019.01); G06F 16/35 (2025.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01)
CPC G06F 40/216 (2020.01) [G06F 16/313 (2019.01); G06F 16/35 (2019.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01)] 16 Claims
 
1. A text keyword extraction method, comprising:
acquiring a text and extracting a body of the text;
extracting a set number of keywords in the body by a term frequency-inverse document frequency (TFIDF) algorithm and extracting a set number of keywords in the body by a TextRank algorithm, respectively;
acquiring a text name and a text title of the text and segmenting the text name and the text title into words of the text name and words of the text title;
constructing text feature vectors of the text and inputting the text feature vectors into a trained keyword extraction model; and
extracting a keyword set from the keywords extracted by the TextRank algorithm by using the keyword extraction model to achieve text keyword extraction;
wherein constructing the text feature vectors of the text and inputting the text feature vectors into the trained keyword extraction model comprises:
determining whether the keywords extracted by the TextRank algorithm appear in the words of the text name; based on a determination result that the keywords extracted by the TextRank algorithm appear in the words of the text name, using a vector [1, 0] to indicate the result that the keywords extracted by the TextRank algorithm appear in the words of the text name; and based on a determination result that the keywords extracted by the TextRank algorithm do not appear in the words of the text name, using a vector [0, 1] to indicate the result that the keywords extracted by the TextRank algorithm do not appear in the words of the text name;
determining whether the keywords extracted by the TextRank algorithm appear in the words of the text title; based on a determination result that the keywords extracted by the TextRank algorithm appear in the words of the text title, using a vector [1, 0] to indicate the result that the keywords extracted by the TextRank algorithm appear in the words of the text title; and based on a determination result that the keywords extracted by the TextRank algorithm do not appear in the words of the text title, using a vector [0, 1] to indicate the result that the keywords extracted by the TextRank algorithm do not appear in the words of the text title;
for each of the keywords extracted by the TextRank algorithm, respectively constructing a text feature vector of each of the keywords extracted by the TextRank algorithm by using a ranking of each of the keywords extracted by the TextRank algorithm among the keywords extracted by the TextRank algorithm, a ranking of the each of the keywords extracted by the TextRank algorithm among the keywords extracted by the TFIDF algorithm, a relationship between the each of the keywords extracted by the TextRank algorithm and the text name, and a relationship between the each of the keywords extracted by the TextRank algorithm and the text title; and
inputting all the text feature vectors into the trained keyword extraction model.
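
The feature construction recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: the function names `presence_vector` and `build_feature_vectors` are hypothetical, rankings are encoded as 1-based integer positions, a TextRank keyword missing from the TFIDF list is given the rank one past the end of that list, and the four components are concatenated in the order recited. The TFIDF and TextRank extraction steps, the word segmentation, and the trained keyword extraction model are outside the sketch and are represented only by pre-computed lists.

```python
from typing import Dict, List, Sequence


def presence_vector(keyword: str, words: Sequence[str]) -> List[int]:
    """Return [1, 0] if the keyword appears among the words, else [0, 1]."""
    return [1, 0] if keyword in words else [0, 1]


def build_feature_vectors(
    textrank_keywords: Sequence[str],
    tfidf_keywords: Sequence[str],
    name_words: Sequence[str],
    title_words: Sequence[str],
) -> Dict[str, List[int]]:
    """Build one feature vector per TextRank keyword.

    Layout (an assumption): [TextRank rank, TFIDF rank,
    name indicator (2 values), title indicator (2 values)].
    """
    # Assumption: both keyword lists are ordered best-first, so the
    # 1-based list position serves as the ranking.
    tfidf_rank = {kw: i for i, kw in enumerate(tfidf_keywords, start=1)}
    # Assumption: a keyword absent from the TFIDF list ranks just past its end.
    missing_rank = len(tfidf_keywords) + 1

    features: Dict[str, List[int]] = {}
    for rank, kw in enumerate(textrank_keywords, start=1):
        features[kw] = (
            [rank, tfidf_rank.get(kw, missing_rank)]
            + presence_vector(kw, name_words)
            + presence_vector(kw, title_words)
        )
    return features


if __name__ == "__main__":
    # Hypothetical inputs standing in for the outputs of the earlier steps.
    vectors = build_feature_vectors(
        textrank_keywords=["keyword", "extraction", "model"],
        tfidf_keywords=["extraction", "keyword", "text"],
        name_words=["keyword", "report"],
        title_words=["text", "extraction"],
    )
    for kw, vec in vectors.items():
        print(kw, vec)
```

In this sketch every TextRank keyword yields a six-element vector, and all such vectors would then be passed to the trained keyword extraction model, which selects the final keyword set from the TextRank candidates.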