| CPC G06F 40/216 (2020.01) [G06F 16/313 (2019.01); G06F 16/35 (2019.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01)] | 16 Claims |

|
1. A text keyword extraction method, comprising:
acquiring a text and extracting a body of the text;
extracting a set number of keywords in the body by a term frequency-inverse document frequency (TFIDF) algorithm and extracting a set number of keywords in the body by a TextRank algorithm, respectively;
acquiring a text name and a text title of the text and segmenting the text name and the text title into words of the text name and words of the text title;
constructing text feature vectors of the text and inputting the text feature vectors into a trained keyword extraction model; and
extracting a keyword set from the keywords extracted by the TextRank algorithm by using the keyword extraction model to achieve text keyword extraction;
wherein constructing the text feature vectors of the text and inputting the text feature vectors into the trained keyword extraction model comprises:
determining whether the keywords extracted by the TextRank algorithm appear in the words of the text name; based on a determination result that the keywords extracted by the TextRank algorithm appear in the words of the text name, using a vector [1, 0] to indicate the result that the keywords extracted by the TextRank algorithm appear in the words of the text name; and based on a determination result that the keywords extracted by the TextRank algorithm do not appear in the words of the text name, using a vector [0, 1] to indicate the result that the keywords extracted by the TextRank algorithm do not appear in the words of the text name;
determining whether the keywords extracted by the TextRank algorithm appear in the words of the text title; based on a determination result that the keywords extracted by the TextRank algorithm appear in the words of the text title, using a vector [1, 0] to indicate the result that the keywords extracted by the TextRank algorithm appear in the words of the text title; and based on a determination result that the keywords extracted by the TextRank algorithm do not appear in the words of the text title, using a vector [0, 1] to indicate the result that the keywords extracted by the TextRank algorithm do not appear in the words of the text title;
for each of the keywords extracted by the TextRank algorithm, respectively constructing a text feature vector of each of the keywords extracted by the TextRank algorithm by using a ranking of each of the keywords extracted by the TextRank algorithm among the keywords extracted by the TextRank algorithm, a ranking of each of the keywords extracted by the TextRank algorithm among the keywords extracted by the TFIDF algorithm, a relationship between each of the keywords extracted by the TextRank algorithm and the text name, and a relationship between each of the keywords extracted by the TextRank algorithm and the text title; and
inputting all the text feature vectors into the trained keyword extraction model.
|
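
The feature-vector construction recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it makes several assumptions the claim does not state: the function names (`presence_vector`, `rank_of`, `build_feature_vectors`) are hypothetical, rankings are encoded as 1-based integers, a TextRank keyword missing from the TFIDF list is given the rank one past the end of that list, and the four features are simply concatenated in the order the claim lists them. The keyword lists are taken as already extracted, and the text name and text title as already segmented into words.

```python
from typing import List


def presence_vector(keyword: str, words: List[str]) -> List[int]:
    """Return [1, 0] if the keyword appears among the words, otherwise [0, 1]."""
    return [1, 0] if keyword in words else [0, 1]


def rank_of(keyword: str, ranked_keywords: List[str]) -> int:
    """Return the 1-based rank of the keyword in a ranked list.

    Assumption: a keyword absent from the list gets rank len(list) + 1.
    The claim does not say how this case is handled.
    """
    if keyword in ranked_keywords:
        return ranked_keywords.index(keyword) + 1
    return len(ranked_keywords) + 1


def build_feature_vectors(
    textrank_keywords: List[str],
    tfidf_keywords: List[str],
    name_words: List[str],
    title_words: List[str],
) -> List[List[int]]:
    """Build one feature vector per TextRank keyword.

    Assumed layout: [TextRank rank, TFIDF rank, name indicator (2 values),
    title indicator (2 values)].
    """
    vectors = []
    for keyword in textrank_keywords:
        vector = [
            rank_of(keyword, textrank_keywords),
            rank_of(keyword, tfidf_keywords),
        ]
        vector += presence_vector(keyword, name_words)
        vector += presence_vector(keyword, title_words)
        vectors.append(vector)
    return vectors


# Hypothetical inputs, for illustration only.
textrank_keywords = ["keyword", "extraction", "graph"]
tfidf_keywords = ["extraction", "keyword", "text"]
name_words = ["keyword", "report"]
title_words = ["text", "extraction"]

feature_vectors = build_feature_vectors(
    textrank_keywords, tfidf_keywords, name_words, title_words
)
for keyword, vector in zip(textrank_keywords, feature_vectors):
    print(keyword, vector)
```

In this sketch every TextRank keyword yields a six-element vector, and all of the vectors together would form the input to the trained keyword extraction model. The model itself, and any scaling or normalization of the rank features, is outside what the sketch covers.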