US 11,727,703 B2
Apparatus for detecting contextually-anomalous sentence in document, method therefor, and computer-readable recording medium having program for performing same method recorded thereon
Hyeong Jin Byeon, Seoul (KR); Min Gwan Seo, Seoul (KR); and Hae Bin Shin, Seoul (KR)
Assigned to ESTSOFT CORP., Seoul (KR)
Appl. No. 17/413,825
Filed by ESTSOFT CORP., Seoul (KR)
PCT Filed Nov. 14, 2019, PCT No. PCT/KR2019/015553
§ 371(c)(1), (2) Date Jun. 14, 2021,
PCT Pub. No. WO2020/122440, PCT Pub. Date Jun. 18, 2020.
Claims priority of application No. 10-2018-0162214 (KR), filed on Dec. 14, 2018.
Prior Publication US 2022/0027608 A1, Jan. 27, 2022
Int. Cl. G06V 30/40 (2022.01); G06F 40/289 (2020.01); G06F 40/12 (2020.01); G06V 10/70 (2022.01); G06N 3/08 (2023.01); G06F 18/22 (2023.01)
CPC G06V 30/40 (2022.01) [G06F 18/22 (2023.01); G06F 40/12 (2020.01); G06F 40/289 (2020.01); G06N 3/08 (2013.01); G06V 10/768 (2022.01)] 9 Claims
1. An apparatus for detecting a contextually-anomalous sentence in a document, the apparatus comprising a processor and one or more memory devices communicatively coupled to the processor, and the one or more memory devices store instructions operable when executed by the processor to perform:
when document data is input, encoding each of a plurality of sentences included in the document data to generate an encoding vector sequence including a plurality of encoding vectors, wherein each sentence has a context;
when the document data is learning data, performing learning by a context anomaly detector neural network by:
generating context learning data including pairs of encoding vectors corresponding to two or more sentences selected from the encoding vector sequence and a reference value indicating whether the contexts between the two or more sentences match with each other;
causing a context embedder neural network to convert the generated encoding vector sequence into a first plurality of context embedding vectors corresponding to each of the plurality of encoding vectors included in the generated encoding vector sequence to generate a first embedding vector sequence based on the context learning data;
causing a distance learning neural network to, when the pairs of encoding vectors included in the context learning data are converted into pairs of embedding vectors corresponding thereto by the context embedder neural network, calculate a distance value between the pairs of embedding vectors, wherein the distance learning neural network receives each pair of embedding vectors as an input and calculates the distance value;
causing the context embedder neural network to perform learning so that a difference between the distance value between the pairs of embedding vectors calculated by the distance learning neural network and the reference value included in the context learning data falls within a first predetermined range;
causing the context anomaly detector neural network to calculate a learning result value indicating whether a learning contextually-anomalous sentence exists in the document data from the first embedding vector sequence; and
causing the context anomaly detector neural network to perform learning so that a difference between the learning result value calculated by the context anomaly detector neural network and an expected value indicating whether the learning contextually-anomalous sentence exists in the learning data falls within a second predetermined range; and
when the document data is suspected data:
causing the context embedder neural network to convert the generated encoding vector sequence into a second plurality of context embedding vectors corresponding to each of the plurality of encoding vectors included in the encoding vector sequence to generate a second embedding vector sequence;
causing the context anomaly detector neural network to calculate a result value indicating whether a contextually-anomalous sentence exists in the document data from the second embedding vector sequence; and
determining whether the contextually-anomalous sentence exists in the suspected data based on the result value calculated by the context anomaly detector neural network;
wherein the processor further:
causes the distance learning neural network to perform repeated learning, together with the context embedder neural network, by calculating a weight until the difference between the distance value and the reference value falls within the first predetermined range; and
causes the context embedder neural network, after the repeated learning, to convert the encoding vectors into the first plurality of context embedding vectors reflecting the distance value in a context embedding vector space.
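Claim 1 does not say how sentences are encoded or how sentence pairs and their reference values are chosen. The Python below is a rough sketch of the encoding step and the context-learning-data step under stated assumptions. It uses a hashed bag-of-words encoder, which is a hypothetical stand-in and not the patented encoder. It also assumes a reference value of 0.0 for two sentences from the same document (contexts match) and 1.0 for a sentence paired with one from an unrelated document (contexts do not match). The names `encode_sentence`, `encode_document`, and `make_context_learning_data` are illustrative only.

```python
import hashlib
import random
from typing import List, Tuple

Vector = List[float]

def encode_sentence(sentence: str, dim: int = 16) -> Vector:
    """Hypothetical sentence encoder: a hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        # md5 gives a bucket that is stable across runs (built-in hash() of str is salted).
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec

def encode_document(sentences: List[str], dim: int = 16) -> List[Vector]:
    """Encoding vector sequence: one encoding vector per sentence, in document order."""
    return [encode_sentence(s, dim) for s in sentences]

def make_context_learning_data(
    doc: List[Vector], other_doc: List[Vector], n_pairs: int, seed: int = 0
) -> List[Tuple[Vector, Vector, float]]:
    """Build (encoding_a, encoding_b, reference) triples.

    Assumption (not from the claim): two distinct sentences of `doc` share a
    context (reference 0.0); a sentence of `doc` paired with a sentence of the
    unrelated `other_doc` does not (reference 1.0). Requires len(doc) >= 2.
    """
    rng = random.Random(seed)
    data = []
    for k in range(n_pairs):
        if k % 2 == 0:
            i, j = rng.sample(range(len(doc)), 2)  # two distinct sentences of doc
            data.append((doc[i], doc[j], 0.0))
        else:
            data.append((rng.choice(doc), rng.choice(other_doc), 1.0))
    return data
```

A learned sentence encoder could replace `encode_sentence` without changing the pair-building logic; the hashed vectors only keep the sketch dependency-free.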
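The joint learning of the context embedder neural network and the distance learning neural network can be sketched with deliberately tiny models. That learning is repeated until the distance value lies within the first predetermined range of the reference value. The following are assumptions, not taken from the claim:

- the context embedder is a single linear map `W`;
- the distance learning network computes `a * ||e1 - e2||^2 + b` with learnable `a` and `b`;
- "calculating a weight" is read as updating these parameters by plain gradient descent on the squared difference between distance value and reference value;
- `tol` stands in for the first predetermined range.

All function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, x: Vector) -> Vector:
    """Context embedder (hypothetical): a single linear map e = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def distance_net(e1: Vector, e2: Vector, a: float, b: float) -> float:
    """Distance learning network (hypothetical): d = a * ||e1 - e2||^2 + b."""
    return a * sum((p - q) ** 2 for p, q in zip(e1, e2)) + b

def train_jointly(pairs: List[Tuple[Vector, Vector, float]], dim_in: int, dim_out: int,
                  lr: float = 0.05, tol: float = 0.05, max_epochs: int = 2000):
    """Full-batch gradient descent on the mean of (d - r)^2 over W, a and b.

    Stops once every |d - r| <= tol or after max_epochs updates.
    Returns (W, a, b, worst_error, epochs_used).
    """
    W = [[1.0 if i == j else 0.0 for j in range(dim_in)] for i in range(dim_out)]
    a, b = 1.0, 0.0
    n = len(pairs)
    for epoch in range(max_epochs + 1):
        gW = [[0.0] * dim_in for _ in range(dim_out)]
        ga = gb = 0.0
        worst = 0.0
        for x, y, r in pairs:
            delta = [p - q for p, q in zip(x, y)]
            u = matvec(W, delta)          # equals W x - W y, by linearity
            s = sum(v * v for v in u)     # squared distance in embedding space
            err = a * s + b - r           # distance value minus reference value
            worst = max(worst, abs(err))
            ga += 2 * err * s / n
            gb += 2 * err / n
            for i in range(dim_out):
                for j in range(dim_in):
                    # d(err^2)/dW_ij = 2 * err * a * 2 * u_i * delta_j
                    gW[i][j] += 2 * err * a * 2 * u[i] * delta[j] / n
        if worst <= tol or epoch == max_epochs:
            return W, a, b, worst, epoch
        a -= lr * ga
        b -= lr * gb
        for i in range(dim_out):
            for j in range(dim_in):
                W[i][j] -= lr * gW[i][j]
```

After training, `matvec(W, x)` plays the role of the context embedder producing embedding vectors whose distances reflect the learned distance values. The gradients are written out by hand only because both toy models have closed-form derivatives; real networks would normally rely on automatic differentiation.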
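The learning and inference steps of the context anomaly detector neural network can likewise be illustrated under heavy simplification. In this hypothetical sketch the detector is a single logistic unit applied to one hand-picked feature of the embedding sequence: the largest mean squared distance from any one embedding to the rest. It is trained by gradient descent on binary cross-entropy until every learning result value is within `tol` of its expected value, with `tol` standing in for the second predetermined range. The 0.5 decision threshold in `detect` is likewise an assumption. None of these choices, nor the function names, come from the claim.

```python
import math
from typing import List, Tuple

Vector = List[float]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def outlier_score(seq: List[Vector]) -> float:
    """Hypothetical feature: the largest mean squared distance from one context
    embedding vector to all the others in the same document (needs len(seq) >= 2)."""
    n = len(seq)
    def sq(p: Vector, q: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(p, q))
    return max(sum(sq(seq[i], seq[j]) for j in range(n) if j != i) / (n - 1)
               for i in range(n))

def train_detector(docs: List[Tuple[List[Vector], float]],
                   lr: float = 0.5, tol: float = 0.2, max_epochs: int = 5000):
    """docs: (embedding_sequence, expected) pairs, expected = 1.0 if the learning
    data contains an anomalous sentence, else 0.0. Returns (w, c, worst_error)."""
    feats = [(outlier_score(seq), y) for seq, y in docs]
    w, c = 0.0, 0.0
    for epoch in range(max_epochs + 1):
        preds = [sigmoid(w * f + c) for f, _ in feats]
        worst = max(abs(p - y) for p, (_, y) in zip(preds, feats))
        if worst <= tol or epoch == max_epochs:
            return w, c, worst
        # For sigmoid plus cross-entropy, dLoss/dz = p - y.
        gw = sum((p - y) * f for p, (f, y) in zip(preds, feats)) / len(feats)
        gc = sum(p - y for p, (_, y) in zip(preds, feats)) / len(feats)
        w -= lr * gw
        c -= lr * gc

def detect(seq: List[Vector], w: float, c: float, threshold: float = 0.5):
    """Inference on suspected data: the result value and a yes/no decision."""
    result = sigmoid(w * outlier_score(seq) + c)
    return result, result >= threshold
```

The feature encodes the intuition that, once distance learning has shaped the context embedding space, a contextually-anomalous sentence is expected to sit far from the other sentences. A practical detector would read the whole embedding sequence rather than a single summary number.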