US 11,947,892 B1
Plain-text analysis of complex document formats
Isaac T. Slutsky, Bloomfield Township, MI (US)
Assigned to CLAIMABLY LLC, Bloomfield Township, MI (US)
Filed by CLAIMABLY LLC, Bloomfield Township, MI (US)
Filed on Mar. 15, 2022, as Appl. No. 17/695,298.
Claims priority of provisional application 63/161,182, filed on Mar. 15, 2021.
Int. Cl. G06F 40/103 (2020.01); G06F 9/48 (2006.01); G06F 40/143 (2020.01); G06F 40/197 (2020.01); G06F 40/279 (2020.01)
CPC G06F 40/103 (2020.01) [G06F 9/4881 (2013.01); G06F 40/143 (2020.01); G06F 40/197 (2020.01); G06F 40/279 (2020.01)] 19 Claims
1. A system for document analysis, comprising:
a memory configured to store a first document in a first format, the first document having an ordered sequence of data elements, the data elements in the first format including characters and embedded objects, each data element being accessible by index location into the ordered sequence; and
a processor programmed to:
convert the first document in the first format into a second document in a second format, the second document having a second ordered sequence including a subset of the data elements of the first document, the subset including the characters but not the embedded objects, such that the index location into the ordered sequence of the characters differs between the first document and the second document,
analyze the second document in the second format as plain text,
identify a string of the data elements of interest in the second document,
represent a document location of the string of the data elements of interest as a relative location into the second document, the relative location being specified by occurrence number of the string from an offset location in the second document, such that the relative location is mappable to the index location of the string in both the first document and the second document, and
map the relative location into the second document into the index location of the first document by finding the occurrence number of the string from the offset location in the first document.
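The mapping recited in claim 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the patent. A document is modeled as a list whose elements are either single-character strings or instances of a hypothetical `EmbeddedObject` class. The offset location is the start of the document, index 0 in both formats. The function names `to_plain_text`, `occurrence_number` and `find_nth_in_elements` are illustrative only. The sketch shows one way the steps could work, not the patentee's implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class EmbeddedObject:
    """Hypothetical stand-in for a non-character data element (image, chart, ...)."""
    kind: str


# Hypothetical model: each data element is a one-character str or an EmbeddedObject.
DataElement = Union[str, EmbeddedObject]


def to_plain_text(elements: Sequence[DataElement]) -> str:
    """Convert to the second format: keep the characters, drop embedded objects."""
    return "".join(e for e in elements if isinstance(e, str))


def occurrence_number(text: str, needle: str, index: int, offset: int = 0) -> int:
    """Relative location: how many occurrences of `needle` (counting overlaps)
    start at or after `offset` and before `index` in the plain text."""
    if not needle:
        raise ValueError("needle must be non-empty")
    n, pos = 0, text.find(needle, offset)
    while pos != -1 and pos < index:
        n += 1
        pos = text.find(needle, pos + 1)
    if pos != index:
        raise ValueError("no occurrence of needle starts at index")
    return n


def find_nth_in_elements(elements: Sequence[DataElement], needle: str,
                         n: int, offset: int = 0) -> int:
    """Map back: index into `elements` where the n-th (0-based) occurrence of
    `needle` at or after `offset` starts. Embedded objects inside a match are
    skipped, so occurrences line up one-to-one with those in the plain text."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    for start in range(offset, len(elements)):
        if elements[start] != needle[0]:
            continue  # an occurrence must start on a matching character
        j, k = start, 0
        while j < len(elements) and k < len(needle):
            e = elements[j]
            if isinstance(e, str):
                if e != needle[k]:
                    break
                k += 1
            j += 1  # embedded objects are stepped over without consuming needle
        if k == len(needle):
            if count == n:
                return start
            count += 1
    raise ValueError("fewer than n + 1 occurrences in the first document")


# Usage: an image sits between "Pay " and "the fee." in the first document.
first = list("Pay ") + [EmbeddedObject("image")] + list("the fee. Pay the fee.")
second = to_plain_text(first)
idx = second.find("fee", second.find("fee") + 1)   # second "fee" in plain text
n = occurrence_number(second, "fee", idx)          # relative location from offset 0
print(idx, n, find_nth_in_elements(first, "fee", n))  # prints: 21 1 22
```

The second "fee" sits at index 21 in the plain text and at index 22 in the first document, because the removed image shifts the index by one. The pair (string, occurrence 1 from offset 0) still identifies it in both documents. Matching in the first document skips embedded objects. As a result, an occurrence that an object splits is still counted, and the counts agree with the plain text. A strictly contiguous search would miss such an occurrence and could map to the wrong one. A non-zero offset would have to be expressed in each document's own indexing.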