US 12,217,003 B2
Apparatus and method for processing natural language
Bong-Kyu Hwang, Seoul (KR); Ju-Dong Kim, Seoul (KR); Jae-Woong Yun, Seoul (KR); Hyun-Jae Lee, Seoul (KR); Hyun-Jin Choi, Seoul (KR); Seong-Ho Joe, Seoul (KR); and Young-June Gwon, Seoul (KR)
Assigned to SAMSUNG SDS CO., LTD., Seoul (KR)
Filed by SAMSUNG SDS CO., LTD., Seoul (KR)
Filed on Jul. 13, 2022, as Appl. No. 17/863,742.
Claims priority of application No. 10-2021-0119688 (KR), filed on Sep. 8, 2021.
Prior Publication US 2023/0081042 A1, Mar. 16, 2023
Int. Cl. G06F 40/205 (2020.01); G06F 16/35 (2019.01); G06F 40/117 (2020.01); G06F 40/221 (2020.01); G06F 40/284 (2020.01); G06F 40/279 (2020.01); G06F 40/295 (2020.01); G10L 15/18 (2013.01); G10L 15/183 (2013.01)

CPC G06F 40/284 (2020.01) [G06F 16/35 (2019.01); G06F 40/117 (2020.01); G06F 40/205 (2020.01); G06F 40/221 (2020.01); G06F 40/279 (2020.01); G06F 40/295 (2020.01); G10L 15/18 (2013.01); G10L 15/1815 (2013.01); G10L 15/183 (2013.01)]

12 Claims

7. A method for processing natural language performed by a computing device that includes one or more processors and a memory for storing one or more programs executed by the one or more processors, the method comprising:

collecting documents having tags;

extracting text from the collected documents and extracting tag-related information on a tag surrounding each extracted text;

generating tokens of a preset unit by tokenizing each extracted text;

generating token position information for each token in full text of a document; and

setting the token and the token position information as training data in matching with the tag-related information,

wherein the tag-related information includes structural position information of a tag in which each text is positioned,

wherein the structural position information of the tag includes depth information and relative position information of a corresponding tag,

wherein the depth information is information indicating a depth level of the corresponding tag,

wherein the relative position information includes a relative position information value that is assigned to tags having the same depth level to sequentially increase or decrease according to a relative position or order between the tags having the same depth level, and

wherein relative position information values of two adjacent tags having different depth levels do not represent a relative position or order between the two adjacent tags.
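
The steps recited in claim 7 can be illustrated with a minimal Python sketch. It is not part of the claim and is not the patented implementation. It assumes well-formed HTML input with no void elements, whitespace splitting as the "preset unit" of tokenization, a token's running index in the document as its position information, and one relative position counter per depth level that increases across the whole document. The names `TagAwareExtractor` and `build_training_data` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagAwareExtractor(HTMLParser):
    """Collects one record per token, matched with tag-related information."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # open tags as (name, depth, relative_position)
        self.counters = defaultdict(int)   # assumption: one running counter per depth level
        self.records = []
        self.next_position = 0             # token index within the full document text

    def handle_starttag(self, tag, attrs):
        depth = len(self.stack) + 1        # depth level of the tag being opened
        self.counters[depth] += 1          # increases only among tags of the same depth
        self.stack.append((tag, depth, self.counters[depth]))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return                         # skip text that no tag surrounds
        name, depth, relative_position = self.stack[-1]
        for token in data.split():         # assumption: whitespace tokenization
            self.records.append({
                "token": token,
                "position": self.next_position,
                "tag": name,
                "depth": depth,
                "relative_position": relative_position,
            })
            self.next_position += 1


def build_training_data(document):
    """Returns token records for one tagged document."""
    parser = TagAwareExtractor()
    parser.feed(document)
    parser.close()
    return parser.records


if __name__ == "__main__":
    doc = "<html><body><h1>Hello world</h1><p>Tagged text</p></body></html>"
    for record in build_training_data(doc):
        print(record)
```

In this sketch the `h1` and `p` tags share depth level 3 and receive relative position values 1 and 2, reflecting their order. The `body` tag (depth 2) and the `h1` tag (depth 3) both carry the value 1, which says nothing about their order relative to each other, consistent with the final wherein clause.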