US 12,386,876 B2
Text-based document classification method and document classification device
Sooah Cho, Seoul (KR); Youngjune Gwon, Seoul (KR); and Seongho Joe, Seoul (KR)
Assigned to SAMSUNG SDS CO., LTD., Seoul (KR)
Filed by SAMSUNG SDS CO., LTD., Seoul (KR)
Filed on Oct. 27, 2022, as Appl. No. 17/975,155.
Claims priority of application No. 10-2021-0147324 (KR), filed on Oct. 29, 2021.
Prior Publication US 2023/0134169 A1, May 4, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 16/3332 (2025.01); G06F 16/35 (2019.01); G06V 30/416 (2022.01)
CPC G06F 16/35 (2019.01) [G06F 16/3334 (2019.01); G06V 30/416 (2022.01)]
14 Claims
 
1. A text-based document classification method performed by a processor inside a computing device, the text-based document classification method comprising:
extracting, from a document image that has been input, words included in the document image by using OCR;
generating, based on a degree of similarity between the words, a word set comprising a configured number of words, wherein the degree of similarity is calculated by embedding the words using word embedding and determining the degree of similarity between the embedded words;
generating a word set image by individually turning the word set into an image;
extracting an important keyword used for document classification among words included in the word set image by inputting the word set image into an image captioning model; and
classifying a type of the document image from the important keyword using a document classification model,
wherein, in the extracting of the important keyword, a heatmap indicating an area focused on by the image captioning model in the word set image is extracted and the important keyword is configured by using the heatmap, and
wherein the image captioning model is learned such that, when a word set image generated from a learning image is input thereto, a text describing the word set image generated from the learning image is generated, the text and a correct answer sheet regarding a document type of the learning image are compared, thereby generating an error, and the error is minimized.
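
The word-set generation step of claim 1 (embed the OCR-extracted words, compute a degree of similarity between the embeddings, and group a configured number of words) can be illustrated with a minimal sketch. The claim does not specify a grouping procedure or a similarity measure, so the greedy nearest-neighbour grouping and the cosine similarity used here are assumptions, and the names `cosine`, `build_word_sets`, and `TOY_EMBEDDINGS` are hypothetical. The two-dimensional vectors are made-up stand-ins for the output of a real word embedding model.

```python
import math

# Hypothetical toy embeddings standing in for a real word embedding model.
TOY_EMBEDDINGS = {
    "invoice": (0.90, 0.10),
    "total": (0.80, 0.20),
    "amount": (0.85, 0.15),
    "patient": (0.10, 0.90),
    "diagnosis": (0.20, 0.80),
    "clinic": (0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_word_sets(words, embeddings, set_size):
    """Greedily group words into sets of at most `set_size` members.

    Each set starts from the first ungrouped word (the seed) and is filled
    with the ungrouped words most similar to that seed.
    """
    remaining = list(dict.fromkeys(words))  # drop duplicates, keep order
    word_sets = []
    while remaining:
        seed = remaining.pop(0)
        ranked = sorted(
            remaining,
            key=lambda w: cosine(embeddings[seed], embeddings[w]),
            reverse=True,
        )
        members = [seed] + ranked[: set_size - 1]
        for word in members[1:]:
            remaining.remove(word)
        word_sets.append(members)
    return word_sets


if __name__ == "__main__":
    ocr_words = ["invoice", "patient", "total", "diagnosis", "amount", "clinic"]
    for word_set in build_word_sets(ocr_words, TOY_EMBEDDINGS, set_size=3):
        print(word_set)
```

With these toy vectors, the billing-related words and the medical words end up in separate sets of three. Each such set would then be rendered as its own word set image, as the claim recites.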
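
The heatmap limitation of claim 1 (the important keyword is configured by using a heatmap of the area the image captioning model focuses on) can likewise be sketched. The claim does not state how the heatmap is mapped back to a word, so scoring each word by the mean heatmap value inside its bounding box is an assumption. The names `mean_attention`, `pick_keyword`, `DEMO_HEATMAP`, and `DEMO_BOXES`, the box format `(top, left, bottom, right)` with exclusive lower-right corner, and all numeric values are hypothetical illustrations rather than output of any real captioning model.

```python
# Hypothetical 2 x 6 heatmap over a word set image (rows x columns).
DEMO_HEATMAP = [
    [0.1, 0.1, 0.7, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.8, 0.8, 0.1, 0.1],
]

# Hypothetical bounding boxes of the words rendered in the word set image,
# given as (top, left, bottom, right) with bottom and right exclusive.
DEMO_BOXES = {
    "invoice": (0, 0, 2, 2),
    "total": (0, 2, 2, 4),
    "amount": (0, 4, 2, 6),
}


def mean_attention(heatmap, box):
    """Average heatmap value inside one word's bounding box."""
    top, left, bottom, right = box
    values = [
        heatmap[row][col]
        for row in range(top, bottom)
        for col in range(left, right)
    ]
    return sum(values) / len(values) if values else 0.0


def pick_keyword(heatmap, word_boxes):
    """Return the word with the highest mean attention, plus all scores."""
    scores = {
        word: mean_attention(heatmap, box)
        for word, box in word_boxes.items()
    }
    keyword = max(scores, key=scores.get)
    return keyword, scores


if __name__ == "__main__":
    keyword, scores = pick_keyword(DEMO_HEATMAP, DEMO_BOXES)
    print(keyword, scores)
```

In this toy example the heatmap is concentrated over the middle box, so `total` is selected. The selected keyword would then be passed to the document classification model recited in the claim.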