CPC G06F 16/313 (2019.01) [G06F 16/9538 (2019.01); G06F 40/279 (2020.01)] | 22 Claims |
1. An information processing system comprising:
extraction means for extracting at least one of
a group of first words of interest in accordance with a first evaluation score for each of morphemes that are contained in character data posted on each of web pages included in a result of a search that has been conducted by a search engine by using a search query,
a group of second words of interest in accordance with a second evaluation score for each of the morphemes, and
a group of third words of interest in accordance with a third evaluation score for each of the morphemes; and
output means for outputting data for displaying at least the group of the words of interest that has been extracted, wherein
the first evaluation score is a score that has been set to extract a morpheme for which the number of appearances in higher-rank web pages is larger and the number of appearances in lower-rank web pages is smaller, the higher-rank web pages appearing at higher ranks in the result of the search and the lower-rank web pages appearing at lower ranks in the result of the search, and the first evaluation score of the i-th morpheme “mi” includes S1-1(mi) calculated based on the following equation (1),
S1-1(mi)=n({cij|cij=0,M≤j<N+M})−n({cij|cij=0,1≤j≤N}) (1)
wherein “cij” is the number of occurrences of the morpheme “mi” contained in the web page that ranks j-th,
{x|C(x)} is a set of elements x that satisfy condition C(x), and
n(A) indicates a number of elements of set A,
the second evaluation score is a score that has been set to extract a morpheme used less frequently on the higher-rank web pages, but having a higher peculiarity relating to the search query, and the second evaluation score of the i-th morpheme “mi” includes whether the peculiarity of the morpheme “mi” exceeds a threshold and S2-1(mi) calculated based on the following equation (2),
S2-1(mi)=n({cij|cij=0,1≤j≤N}) (2)
wherein {x|C(x)} is a set of elements x that satisfy condition C(x), and
n(A) indicates a number of elements of set A, and
the third evaluation score is a score that has been set to extract a morpheme appearing less frequently on web pages having themes relating to the search query, and appearing more frequently on web pages having themes other than the themes, and the third evaluation score of the i-th morpheme “mi” includes S3(mi) calculated based on the following equation (3),
S3(mi)=(gi−Cg)/(si−Cs) (3)
wherein “gi” is a degree of generality, “si” is the peculiarity, and “Cg” and “Cs” are constants,
wherein the group of third words of interest corresponds to the third evaluation score and includes at least one morpheme appearing less frequently on web pages having themes relating to the search query, and appearing more frequently on web pages having themes other than the themes.
|
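For illustration only, equations (1) to (3) of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`count_zero_pages`, `s1_1`, `s2_1`, `s3`) are hypothetical, the per-page counts cij are assumed to be supplied as a list ordered by search rank (index 0 holding rank j = 1), and the values of N, M, gi, si, Cg and Cs are assumed to be provided by the caller. The claim does not state how the peculiarity threshold is combined with S2-1(mi), so that check is left out here.

```python
from typing import Sequence


def count_zero_pages(counts: Sequence[int]) -> int:
    """n({cij | cij = 0}) over the given pages: pages that lack the morpheme."""
    return sum(1 for c in counts if c == 0)


def s1_1(counts: Sequence[int], n: int, m: int) -> int:
    """Equation (1). counts[j - 1] holds cij for the page ranked j (1-based)."""
    if n < 1 or m < 1 or len(counts) < n + m - 1:
        raise ValueError("counts must cover ranks 1 through N+M-1")
    higher = counts[0:n]             # ranks 1 <= j <= N
    lower = counts[m - 1:m - 1 + n]  # ranks M <= j < N+M
    return count_zero_pages(lower) - count_zero_pages(higher)


def s2_1(counts: Sequence[int], n: int) -> int:
    """Equation (2): number of the top N pages in which the morpheme is absent."""
    if n < 1 or len(counts) < n:
        raise ValueError("counts must cover ranks 1 through N")
    return count_zero_pages(counts[0:n])


def s3(g: float, s: float, cg: float, cs: float) -> float:
    """Equation (3): (gi - Cg) / (si - Cs).

    Assumption: the caller guarantees si != Cs; the claim does not say how
    a zero denominator is handled, so Python's ZeroDivisionError propagates.
    """
    return (g - cg) / (s - cs)


# Hypothetical counts of one morpheme on the pages ranked 1 through 8.
example_counts = [3, 2, 0, 1, 0, 0, 0, 1]
score_1 = s1_1(example_counts, n=3, m=5)  # ranks 5-7 versus ranks 1-3
score_2 = s2_1(example_counts, n=3)       # ranks 1-3 only
score_3 = s3(g=5.0, s=3.0, cg=1.0, cs=1.0)
```

With these hypothetical inputs, the morpheme is absent from all three lower-rank pages (ranks 5 to 7) and from one of the three higher-rank pages, so S1-1 evaluates to 3 − 1 = 2 and S2-1 evaluates to 1. A larger S1-1 therefore points to a morpheme concentrated on the higher-rank pages, which matches the stated purpose of the first evaluation score.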