US 12,216,987 B2
Generating heading based on extracted feature words
Ayako Hoshino, Tokyo (JP)
Assigned to NEC CORPORATION, Tokyo (JP)
Appl. No. 18/037,889
Filed by NEC Corporation, Tokyo (JP)
PCT Filed Nov. 25, 2020, PCT No. PCT/JP2020/043812
§ 371(c)(1), (2) Date May 19, 2023,
PCT Pub. No. WO2022/113202, PCT Pub. Date Jun. 2, 2022.
Prior Publication US 2023/0409808 A1, Dec. 21, 2023
Int. Cl. G06N 20/00 (2019.01); G06F 40/103 (2020.01); G06F 40/126 (2020.01); G06F 40/166 (2020.01); G06V 30/19 (2022.01); G06V 30/416 (2022.01)
CPC G06F 40/166 (2020.01) [G06F 40/103 (2020.01); G06F 40/126 (2020.01); G06N 20/00 (2019.01); G06V 30/19127 (2022.01); G06V 30/416 (2022.01)]
12 Claims
 
1. An information processing device comprising:
a memory storing instructions; and
one or more processors configured to execute the instructions to:
acquire a structured document including a heading and text;
generate, for the heading included in the structured document, a matrix of frequency of occurrence of words appearing in documents of subordinate elements of the heading;
extract, for the heading, feature words by reducing dimensions of the words appearing in the documents using principal component analysis (PCA);
generate a new heading based on the extracted feature words; and
generate a corrected structured document by replacing the heading with the new heading.
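
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; everything beyond the claim wording is an assumption made for illustration: the dictionary layout with "heading" and "children" keys, the regex tokenizer, the use of only the first principal component (computed here by power iteration instead of a library PCA routine), ranking words by absolute loading, and the rule that joins the top words into a new heading. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def frequency_matrix(docs):
    # Rows are subordinate documents, columns are vocabulary words.
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for w in vocab] for c in counts]


def first_component(matrix, iterations=200):
    # First PCA loading vector via power iteration on X^T X,
    # where X is the column-centered frequency matrix.
    n_docs, n_words = len(matrix), len(matrix[0])
    means = [sum(r[j] for r in matrix) / n_docs for j in range(n_words)]
    x = [[r[j] - means[j] for j in range(n_words)] for r in matrix]
    # Uniform start vector; assumed not orthogonal to the top component.
    v = [1.0 / math.sqrt(n_words)] * n_words
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        nxt = [sum(x[i][j] * scores[i] for i in range(n_docs))
               for j in range(n_words)]
        norm = math.sqrt(sum(a * a for a in nxt))
        if norm == 0.0:  # no variance in any word column
            break
        v = [a / norm for a in nxt]
    return v


def extract_feature_words(docs, top_k=2):
    # Assumption: feature words are those with the largest absolute loading.
    vocab, matrix = frequency_matrix(docs)
    loadings = first_component(matrix)
    ranked = sorted(zip(vocab, loadings), key=lambda p: -abs(p[1]))
    return [word for word, _ in ranked[:top_k]]


def correct_heading(document, top_k=2):
    # Hypothetical heading rule: capitalize and join the feature words.
    words = extract_feature_words(document["children"], top_k)
    new_heading = " ".join(w.capitalize() for w in words)
    return {**document, "heading": new_heading}


if __name__ == "__main__":
    doc = {
        "heading": "Section 3",
        "children": [
            "battery battery battery charging the device",
            "the device manual",
            "battery charging charging the device",
            "the device warranty",
        ],
    }
    print(correct_heading(doc)["heading"])
```

In this toy input the words "the" and "device" occur once in every subordinate document, so their centered columns are zero and they receive no loading. This is one reason a variance-based reduction can surface more distinctive words than raw counts would.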