| CPC G06F 40/166 (2020.01) [G06F 40/103 (2020.01); G06F 40/126 (2020.01); G06N 20/00 (2019.01); G06V 30/19127 (2022.01); G06V 30/416 (2022.01)] | 12 Claims |

|
1. An information processing device comprising:
a memory storing instructions; and
one or more processors configured to execute the instructions to:
acquire a structured document including a heading and text;
generate, for the heading included in the structured document, a matrix of frequency of occurrence of words appearing in documents of subordinate elements of the heading;
extract, for the heading, feature words by reducing dimensions of the words appearing in the documents using principal component analysis (PCA);
generate a new heading based on the extracted feature words; and
generate a corrected structured document by replacing the heading with the new heading.
|
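The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how feature words are selected from the PCA result or how the new heading is composed, so everything here is an assumption made for illustration: the function names, the dictionary layout with `heading` and `children` keys, the choice to rank words by absolute loading on the first principal component, and the use of power iteration to approximate that component.

```python
import math
import re
from collections import Counter


def term_frequency_matrix(docs):
    """Return (vocab, matrix): rows are subordinate documents, columns are words.

    Assumes at least one non-empty document.
    """
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]


def first_principal_component(matrix, iterations=200):
    """Approximate the leading PCA axis by power iteration on X^T X."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(m)) for r in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # No variance across documents: no direction to report.
            return [0.0] * m
        v = [x / norm for x in w]
    return v


def extract_feature_words(docs, top_k=1):
    """Pick the words with the largest absolute loading (assumed criterion)."""
    vocab, matrix = term_frequency_matrix(docs)
    loadings = first_principal_component(matrix)
    ranked = sorted(zip(vocab, loadings), key=lambda p: -abs(p[1]))
    return [word for word, _ in ranked[:top_k]]


def correct_heading(document, top_k=1):
    """Return a copy of the document with the heading replaced."""
    words = extract_feature_words(document["children"], top_k)
    new_heading = " ".join(w.capitalize() for w in words)
    return {**document, "heading": new_heading}


if __name__ == "__main__":
    # Hypothetical structured document: one heading, three subordinate texts.
    doc = {
        "heading": "Misc",
        "children": ["battery battery battery charge", "charge", "screen"],
    }
    print(correct_heading(doc)["heading"])
```

Ranking by the first component alone favors words whose counts vary most across the subordinate elements. A fuller implementation might combine several components or weight them by explained variance, which the claim leaves open.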