| CPC G06F 16/355 (2019.01) [G06F 16/322 (2019.01)] | 6 Claims |

|
1. A method for managing language data for determining similarity, the method comprising:
in a state in which the language data in a tree structure includes at least one node, and the at least one node includes at least one word,
(a) generating, by a management server, a plurality of word vectors including a first word vector and a second word vector based on the number of words included in each of a plurality of pieces of language data;
(b) using, by the management server, a dot product function of the plurality of word vectors including the first word vector and the second word vector to measure a score of similarity among the plurality of pieces of language data, including first language data corresponding to the first word vector and second language data corresponding to the second word vector;
wherein, in a state in which scores of similarity between the plurality of pieces of language data have been measured, a reference value includes a first reference value, a second reference value, and a third reference value, and the magnitudes of the first reference value, the second reference value, and the third reference value sequentially increase, and the method further comprises:
(c) grouping, by the management server, word vectors of a pair of pieces of language data having a score of similarity higher than the first reference value together and then generating a plurality of first clusters on a graph;
(d) grouping, by the management server, word vectors of a pair of pieces of language data having a score of similarity higher than the second reference value together and then generating a plurality of second clusters on the graph;
(e) grouping, by the management server, word vectors of a pair of pieces of language data having a score of similarity higher than the third reference value together and then generating a plurality of third clusters on the graph; and
(f) acquiring the second reference value satisfying a condition that the number of the plurality of second clusters is greater than the number of the plurality of first clusters or the number of the plurality of third clusters.
|
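Steps (a) through (f) of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: each piece of language data is flattened to a whitespace-separated string (the tree structure is ignored), a word vector is a word-count vector over a shared vocabulary, clusters are connected components of the thresholded similarity graph, and only groups with at least two members count as clusters. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_vectors(docs):
    """Step (a): count-based word vectors over a shared, sorted vocabulary (assumed form)."""
    counts = [Counter(d.split()) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    return [[c[w] for w in vocab] for c in counts]


def similarity_scores(vectors):
    """Step (b): dot product of every pair of word vectors."""
    return {
        (i, j): sum(a * b for a, b in zip(vectors[i], vectors[j]))
        for i, j in combinations(range(len(vectors)), 2)
    }


def count_clusters(n, scores, reference_value):
    """Steps (c)-(e): join pairs scoring above the reference value, then count
    the resulting groups. Assumption: a cluster needs at least two members."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in scores.items():
        if score > reference_value:
            parent[find(i)] = find(j)

    sizes = Counter(find(i) for i in range(n))
    return sum(1 for size in sizes.values() if size >= 2)


def satisfies_condition(n, scores, r1, r2, r3):
    """Step (f): test the claimed condition literally, i.e. the second cluster
    count exceeds the first count or the third count."""
    if not r1 < r2 < r3:
        raise ValueError("reference values must increase: r1 < r2 < r3")
    c1, c2, c3 = (count_clusters(n, scores, r) for r in (r1, r2, r3))
    return c2 > c1 or c2 > c3


# Illustrative data only; the documents and reference values are made up.
docs = ["cat cat dog", "cat dog dog", "fish fish bird", "fish bird bird", "cat fish"]
vectors = word_vectors(docs)
scores = similarity_scores(vectors)
cluster_counts = [count_clusters(len(docs), scores, r) for r in (0, 3, 5)]
second_value_ok = satisfies_condition(len(docs), scores, 0, 3, 5)
```

With these five sample strings, the reference value 0 merges everything into one cluster, 3 yields two clusters, and 5 yields none, so the middle value satisfies the condition in step (f). Under the alternative assumption that single-member groups also count as clusters, the count can only grow as the reference value rises, which is why the sketch excludes them.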