US 12,153,887 B2
Deep learning-based method for filtering out similar text, and apparatus using same
Se Yeob Kim, Seoul (KR)
Assigned to SELECT STAR, INC., Daejeon (KR)
Appl. No. 17/771,221
Filed by SELECT STAR, INC., Daejeon (KR)
PCT Filed Oct. 20, 2020, PCT No. PCT/KR2020/014337 § 371(c)(1), (2) Date Apr. 22, 2022, PCT Pub. No. WO2021/118040, PCT Pub. Date Jun. 17, 2021.
Claims priority of application No. 10-2019-0164009 (KR), filed on Dec. 10, 2019.
Prior Publication US 2022/0374601 A1, Nov. 24, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/289 (2020.01); G06F 16/35 (2019.01); G06F 40/216 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)

CPC G06F 40/289 (2020.01) [G06F 16/355 (2019.01); G06F 40/216 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)]

7 Claims

1. A method for collecting filtered text data, the method comprising:

(a) acquiring, by a computing apparatus, first text data, and recording the first text data in a text data pool;

(b) acquiring, by the computing apparatus, second text data;

(c) performing, by the computing apparatus, an operation in a deep learning model by using the first text data and the second text data as input values, and calculating a first feature vector corresponding to the first text data and a second feature vector corresponding to the second text data; and

(d) comparing, by the computing apparatus, a degree of similarity between the first feature vector and the second feature vector, and recording the second text data in the text data pool when the degree of similarity is less than a predetermined value,

wherein, when a plurality of pieces of the first text data are recorded in the text data pool and the pieces of the first text data include first-first text data and first-second text data, the computing apparatus is configured to:

calculate a first-first feature vector corresponding to the first-first text data and a first-second feature vector corresponding to the first-second text data through an operation in the deep learning model;

calculate a first degree of similarity between the first-first feature vector and the second feature vector, and a second degree of similarity between the first-second feature vector and the second feature vector;

sort the first-first text data and the first-second text data based on the plurality of degrees of similarity, including the first degree of similarity and the second degree of similarity;

transmit, to a user terminal, the second text data together with specific text data among the sorted text data, the specific text data having a degree of similarity that is greater than or equal to the predetermined value, so that the specific text data can be compared with the second text data; and

receive, from the user terminal, an indication of whether to record the second text data in the text data pool.
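The filtering flow of claim 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not specify. First, the deep learning model is replaced by a hypothetical `feature_vector` function that counts character trigrams. This is not a neural network and is used only so the example runs without a trained model. Second, the degree of similarity is taken to be cosine similarity. Third, the predetermined value is an arbitrary threshold of 0.8. Fourth, the user terminal is modeled as a hypothetical `ask_user` callback. That callback receives the second text data together with the sorted pool entries whose similarity meets the threshold, and it returns whether to record the second text data.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

# Sorted (pool text, similarity) pairs sent to the user terminal for review.
Similar = List[Tuple[str, float]]


def feature_vector(text: str) -> Counter:
    """HYPOTHETICAL stand-in for the deep learning model: a bag of character
    trigrams. A real system would run a trained text encoder here."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """ASSUMED similarity measure; the claim only says 'degree of similarity'."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TextDataPool:
    def __init__(self, threshold: float, ask_user: Callable[[str, Similar], bool]):
        self.threshold = threshold  # the "predetermined value" (value is arbitrary)
        self.ask_user = ask_user    # HYPOTHETICAL model of the user terminal
        self.texts: List[str] = []  # the text data pool

    def submit(self, candidate: str) -> bool:
        """Steps (b)-(d): return True if `candidate` is recorded in the pool."""
        cand_vec = feature_vector(candidate)
        # One similarity per pool entry (first-first, first-second, ...).
        scored = [(text, cosine_similarity(feature_vector(text), cand_vec))
                  for text in self.texts]
        # Sort pool entries from most to least similar.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        similar = [(text, s) for text, s in scored if s >= self.threshold]
        if not similar:
            # Every degree of similarity is below the threshold: record it.
            self.texts.append(candidate)
            return True
        # Otherwise send the near-duplicates and the candidate for review,
        # and record the candidate only if the user approves.
        if self.ask_user(candidate, similar):
            self.texts.append(candidate)
            return True
        return False


if __name__ == "__main__":
    def reject_all(candidate: str, similar: Similar) -> bool:
        print(f"review {candidate!r} against {[text for text, _ in similar]}")
        return False

    pool = TextDataPool(threshold=0.8, ask_user=reject_all)
    print(pool.submit("Turn on the living room lights"))   # empty pool, step (a): True
    print(pool.submit("turn on the living room lights!"))  # near-duplicate, rejected: False
    print(pool.submit("What's the weather tomorrow?"))     # dissimilar: True
```

With an empty pool the first text is recorded directly, which corresponds to step (a). Near-duplicates are never silently dropped. They are routed to the reviewer callback, which decides whether they are recorded. Recomputing every pool vector on each submission keeps the sketch short; caching the vectors would avoid the repeated work.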