US 12,118,294 B2
Machine learning systems and methods for automatically tagging documents to enable accessibility to impaired individuals
David Comeau, Ontario (CA); Jeffrey Williams, Markham (CA); Evgeny Kolesnikov, Stouffville (CA); Michael Itkin, North York (CA); June Qiang, Markham (CA); James Relunia, Concord (CA); and Brian Sue, Stouffville (CA)
Assigned to OPEN TEXT CORPORATION, Waterloo (CA)
Filed by Open Text Corporation, Waterloo (CA)
Filed on May 1, 2023, as Appl. No. 18/309,857.
Application 18/309,857 is a continuation of application No. 17/174,686, filed on Feb. 12, 2021, granted, now 11,675,970.
Claims priority of provisional application 62/976,808, filed on Feb. 14, 2020.
Prior Publication US 2023/0315974 A1, Oct. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/16 (2020.01); G06F 40/154 (2020.01); G06N 20/00 (2019.01); G06V 30/413 (2022.01)
CPC G06F 40/16 (2020.01) [G06F 40/154 (2020.01); G06N 20/00 (2019.01); G06V 30/413 (2022.01)] 20 Claims
 
1. A method comprising:
obtaining a set of tagged PDF documents, each having a corresponding document object model (DOM) structure;
determining, for each tagged PDF document, relationships between graphical objects in the tagged PDF document and corresponding elements of the DOM structure of the tagged PDF document;
generating, for each tagged PDF document, a corresponding training record identifying the determined relationships;
training, using the training records, a machine learning model to determine DOM structure elements that are associated with graphical objects;
obtaining a set of untagged PDF documents which do not contain corresponding DOM structures; and
for each untagged PDF document, automatically generating, using the trained machine learning model, a corresponding tagged PDF document having one or more DOM structure elements corresponding to one or more graphical objects contained in the untagged PDF document.
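The claimed pipeline can be illustrated with a minimal sketch. The claim does not say which features describe a graphical object, which kind of machine learning model is trained, or how the DOM structure is represented. Everything below is therefore a hypothetical stand-in: the `GraphicalObject` fields, the tag names (`H1`, `P`, `Figure`), and the helpers `features`, `build_training_records`, `NearestCentroidTagger` and `auto_tag` are illustrative names only. A simple nearest-centroid classifier takes the place of the unspecified model. Real PDF parsing (reading the structure tree and linking content to structure elements) is not shown.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

KINDS = ("text", "image", "path")

@dataclass
class GraphicalObject:
    # Hypothetical features of one PDF page object.
    kind: str          # "text", "image" or "path"
    font_size: float   # 0.0 for non-text objects
    is_bold: bool
    width: float       # bounding-box width in points
    height: float      # bounding-box height in points

@dataclass
class TaggedDocument:
    objects: list      # graphical objects on the page(s)
    dom_tags: list     # DOM element tag associated with each object

@dataclass
class UntaggedDocument:
    objects: list      # graphical objects only, no DOM structure

def features(obj):
    """Encode a graphical object as a numeric feature vector (assumed scaling)."""
    one_hot = [1.0 if obj.kind == k else 0.0 for k in KINDS]
    return one_hot + [obj.font_size / 10.0, 1.0 if obj.is_bold else 0.0,
                      obj.width / 100.0, obj.height / 100.0]

def build_training_records(docs):
    """One training record per tagged document: (feature vector, DOM tag) pairs."""
    return [[(features(o), t) for o, t in zip(d.objects, d.dom_tags)] for d in docs]

class NearestCentroidTagger:
    """Stand-in model: predicts the tag whose mean feature vector is closest."""

    def fit(self, records):
        sums, counts = {}, defaultdict(int)
        for record in records:
            for vec, tag in record:
                prev = sums.get(tag, [0.0] * len(vec))
                sums[tag] = [s + v for s, v in zip(prev, vec)]
                counts[tag] += 1
        self.centroids = {t: [s / counts[t] for s in sums[t]] for t in sums}
        return self

    def predict(self, vec):
        return min(self.centroids, key=lambda t: math.dist(vec, self.centroids[t]))

def auto_tag(model, doc):
    """Build a minimal DOM: a Document root with one tagged child per object."""
    children = [{"tag": model.predict(features(o)), "object_index": i}
                for i, o in enumerate(doc.objects)]
    return {"tag": "Document", "children": children}

# Usage: train on one tagged document, then tag an untagged one.
heading = GraphicalObject("text", 24.0, True, 300.0, 30.0)
body = GraphicalObject("text", 10.0, False, 450.0, 120.0)
photo = GraphicalObject("image", 0.0, False, 200.0, 150.0)
train = [TaggedDocument([heading, body, photo], ["H1", "P", "Figure"])]
model = NearestCentroidTagger().fit(build_training_records(train))

new_doc = UntaggedDocument([GraphicalObject("text", 22.0, True, 280.0, 28.0),
                            GraphicalObject("image", 0.0, False, 180.0, 140.0)])
tree = auto_tag(model, new_doc)
print([c["tag"] for c in tree["children"]])  # ['H1', 'Figure']
```

A nearest-centroid model was chosen only because it keeps the sketch dependency-free; any classifier that maps object features to DOM element types would fill the same role in the claimed steps.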