US 12,326,896 B2
Method for managing language data and server using the same
Myung Won Jung, Seoul (KR); Eun Hye Kwon, Busan (KR); Hoo Ram Nam, Busan (KR); and Hye Seong Park, Sejong (KR)
Assigned to AKA AI CO., LTD., Seoul (KR)
Appl. No. 18/039,384
Filed by AKA AI CO., LTD., Seoul (KR)
PCT Filed Aug. 23, 2022, PCT No. PCT/KR2022/012565
§ 371(c)(1), (2) Date May 30, 2023.
PCT Pub. No. WO2024/043355, PCT Pub. Date Feb. 29, 2024.
Prior Publication US 2024/0362263 A1, Oct. 31, 2024
Int. Cl. G06F 16/355 (2025.01); G06F 16/31 (2019.01)
CPC G06F 16/355 (2019.01) [G06F 16/322 (2019.01)]
6 Claims
 
1. A method for managing language data for determining similarity, the method comprising:
in a state in which the language data in a tree structure includes at least one node, and the at least one node includes at least one word,
(a) generating, by a management server, a plurality of word vectors including a first word vector and a second word vector based on the number of words included in each of a plurality of pieces of language data;
(b) using, by the management server, a dot product function of the plurality of word vectors including the first word vector and the second word vector to measure a score of similarity among a plurality of pieces of language data including first language data corresponding to the first word vector and second language data corresponding to the second word vector;
wherein, in a state in which scores of similarity between the plurality of pieces of language data have been measured, a reference value includes a first reference value, a second reference value, and a third reference value, and the magnitudes of the first reference value, the second reference value, and the third reference value sequentially increase, and the method further comprises:
(c) grouping, by the management server, word vectors of a pair of pieces of language data having a score of similarity higher than a first reference value together and then generating a plurality of first clusters on a graph;
(d) grouping, by the management server, word vectors of a pair of pieces of language data having a score of similarity higher than a second reference value together and then generating a plurality of second clusters on the graph;
(e) grouping, by the management server, word vectors of a pair of pieces of language data having a score of similarity higher than a third reference value together and then generating a plurality of third clusters on the graph; and
(f) acquiring the second reference value satisfying a condition that the number of the plurality of second clusters is greater than the number of the plurality of first clusters or the number of the plurality of third clusters.
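Illustrative sketch (not part of the claim): steps (a) through (f) can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the patentee's implementation. It assumes that a "word vector" is a vector of word counts over a shared vocabulary, that two pieces of language data are linked on the graph when their dot product exceeds the reference value, and that a "cluster" is a connected group of at least two linked pieces (isolated pieces are not counted). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_vectors(texts):
    """Step (a), assumed form: word-count vectors over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vectors


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity_scores(vectors):
    """Step (b): dot-product similarity score for every pair of pieces."""
    return {
        (i, j): dot(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


def count_clusters(n, scores, reference_value):
    """Steps (c)-(e): link pairs scoring above reference_value, count groups.

    Assumption: a cluster is a connected group of two or more pieces.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in scores.items():
        if score > reference_value:
            parent[find(i)] = find(j)

    sizes = Counter(find(i) for i in range(n))
    return sum(1 for size in sizes.values() if size >= 2)


def pick_second_reference(n, scores, first, second, third):
    """Step (f): return `second` if its cluster count exceeds that of
    `first` or of `third` (the literal "or" of the claim), else None."""
    c1, c2, c3 = (count_clusters(n, scores, r) for r in (first, second, third))
    return second if (c2 > c1 or c2 > c3) else None
```

For example, with the five texts "red apple red apple", "red apple red", "blue sky blue sky", "blue sky blue", and "red sky", reference values 0, 3, and 6 give 1, 2, and 0 clusters respectively under these assumptions, so the second reference value 3 satisfies the condition in step (f).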