US 12,380,717 B2
Selecting files for intensive text extraction
Tohru Hasegawa, Tokyo (JP); Takuya Goto, Kodaira (JP); and Shunsuke Ishikawa, Shinjuku-ku (JP)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Mar. 14, 2022, as Appl. No. 17/694,003.
Prior Publication US 2023/0290168 A1, Sep. 14, 2023
Int. Cl. G06V 30/19 (2022.01); G06F 16/3331 (2025.01); G06V 30/418 (2022.01)

CPC G06V 30/19093 (2022.01) [G06F 16/3331 (2019.01); G06V 30/418 (2022.01)]

20 Claims

1. A method comprising:

identifying a feature of a subject file;

comparing the feature to a historical feature of a historical file;

calculating, based on the comparing, a similarity between the subject file and the historical file;

identifying a historical success metric for the historical file, wherein the historical success metric is an average historical success metric, and wherein the identifying comprises:

identifying a first historical success metric for the historical file;

identifying a second historical success metric for a second historical file; and

calculating an average of the first historical success metric and the second historical success metric, resulting in the average historical success metric;

calculating, based on the similarity and the historical success metric, an intensive text extraction success value for the subject file; and

determining, based on the intensive extraction success value, that more than one intensive text extraction method should be performed on the subject file.
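
The steps of claim 1 can be sketched in Python as follows. This is an illustrative reading only, not the patented implementation. The claim does not say how features are represented, how similarity is computed, how similarity and the average historical success metric are combined, or what rule drives the final determination. The sketch therefore assumes set-valued features, Jaccard similarity, a simple product, and a hypothetical threshold where a value at or above it means more than one method should be performed. All names (`HistoricalFile`, `jaccard`, `intensive_extraction_success_value`, `should_run_multiple_methods`, `threshold`) are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HistoricalFile:
    """Hypothetical record of a previously processed file."""
    features: frozenset[str]
    success_metric: float  # assumed to lie in [0, 1]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Assumed similarity measure: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intensive_extraction_success_value(
    subject_features: frozenset[str],
    historical: HistoricalFile,
    second_historical: HistoricalFile,
) -> float:
    # Compare the subject file's features to the historical file's features
    # and calculate a similarity from that comparison.
    similarity = jaccard(subject_features, historical.features)
    # Average the first and second historical success metrics.
    average_metric = mean(
        [historical.success_metric, second_historical.success_metric]
    )
    # Assumption: combine the two by multiplication; the claim only says
    # the value is "based on" both.
    return similarity * average_metric


def should_run_multiple_methods(value: float, threshold: float = 0.3) -> bool:
    # Assumption: a value at or above a hypothetical threshold means more
    # than one intensive text extraction method should be performed.
    return value >= threshold


subject = frozenset({"pdf", "scanned", "table"})
first = HistoricalFile(frozenset({"pdf", "scanned", "form"}), 1.0)
second = HistoricalFile(frozenset({"pdf", "photo"}), 0.5)

value = intensive_extraction_success_value(subject, first, second)
decision = should_run_multiple_methods(value)
```

In this example the subject file shares two of four distinct features with the first historical file, so the similarity is 0.5. The two metrics average to 0.75, and the resulting value is 0.375, which meets the assumed threshold of 0.3.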