| CPC G06F 40/284 (2020.01) [G06F 16/35 (2019.01); G06F 40/117 (2020.01); G06F 40/205 (2020.01); G06F 40/221 (2020.01); G06F 40/279 (2020.01); G06F 40/295 (2020.01); G10L 15/18 (2013.01); G10L 15/1815 (2013.01); G10L 15/183 (2013.01)] | 12 Claims | 

| 
               7. A method for processing natural language performed by a computing device that includes one or more processors and a memory for storing one or more programs executed by the one or more processors, the method comprising: 
            collecting documents having tags; 
                extracting text from the collected documents and extracting tag-related information on a tag surrounding each extracted text; 
                generating tokens of a preset unit by tokenizing each extracted text; 
                generating token position information for each token in full text of a document; and 
                setting the token and the token position information as training data in matching with the tag-related information, 
                wherein the tag-related information includes structural position information of a tag in which each text is positioned, 
                wherein the structural position information of the tag includes depth information and relative position information of a corresponding tag, 
                wherein the depth information is information indicating a depth level of the corresponding tag, 
                wherein the relative position information includes a relative position information value that is assigned to tags having the same depth level to sequentially increase or decrease according to a relative position or order between the tags having the same depth level, and 
                wherein relative position information values of two adjacent tags having different depth levels do not represent a relative position or order between the two adjacent tags. 
               |
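
The steps recited in claim 7 can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes HTML input with explicit closing tags, whitespace splitting as the "preset unit" of tokenization, a running token index as the token position in the full text, a root depth of 1, and an independent counter per depth level as the relative position value. The names `TagTextExtractor` and `build_training_data` are hypothetical and do not come from the claim.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: pairs each token with its tag's structural position."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open tags: (name, depth, rel_pos)
        self.counters = defaultdict(int)  # depth level -> next relative position value
        self.records = []                 # one training record per token
        self.token_index = 0              # token position in the full text

    def handle_starttag(self, tag, attrs):
        depth = len(self.stack) + 1       # assumption: the root tag has depth 1
        rel_pos = self.counters[depth]    # increases only among tags of this depth
        self.counters[depth] += 1
        self.stack.append((tag, depth, rel_pos))

    def handle_endtag(self, tag):
        # Close the nearest open tag with this name, dropping anything nested in it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return                        # ignore text outside any tag
        name, depth, rel_pos = self.stack[-1]
        for token in data.split():        # assumption: whitespace tokenization
            self.records.append({
                "token": token,
                "position": self.token_index,
                "tag": name,
                "depth": depth,
                "rel_pos": rel_pos,
            })
            self.token_index += 1


def build_training_data(html):
    """Return token records matched with tag-related information."""
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    sample = ("<html><body><h1>Hello world</h1><p>Foo</p>"
              "<div><p>Bar baz</p></div></body></html>")
    for record in build_training_data(sample):
        print(record)
```

Because each depth level draws its relative position value from its own counter, the values of two adjacent tags at different depths (for example the `div` and the `p` nested inside it) are not comparable, which is one way to read the final "wherein" clause.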