US 11,720,541 B2
Document content extraction and regression testing
Anshuman Behera, Harrison, NJ (US); and Raka Rajanigandha, Jersey City, NJ (US)
Assigned to Morgan Stanley Services Group Inc.
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Jan. 5, 2021, as Appl. No. 17/142,229.
Prior Publication US 2022/0215012 A1, Jul. 7, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/23 (2019.01); G06F 16/958 (2019.01)
CPC G06F 16/2365 (2019.01) [G06F 16/986 (2019.01)]
18 Claims
 
1. A system for confirming file integrity of automatically generated documents, comprising:
one or more databases for document storage;
one or more processors; and
non-transitory memory storing instructions that, when executed by the one or more processors, cause the one or more processors to:
receive a document template specifying one or more sections, each section comprising a set of labels for attributes;
receive, from the one or more databases, two or more automatically generated documents in the Portable Document Format (.PDF), each document of the two or more automatically generated documents known to have been intentionally generated to include each section, to include each label from the set of labels, and to include a predetermined value for each attribute labeled, but with the labels and predetermined values for the attributes labeled being enumerated in a different order from each other document's order for labels and predetermined values;
extract the set of labels for attributes and values of each of those attributes from each of the two or more automatically generated documents;
generate a tabular report comparing the values of each attribute in the two or more automatically generated documents to visually indicate which labels, if any, have differing values despite the intentional generation of the two or more documents to have a same predetermined value for the attribute being labeled; and
generate an alert for a human user if the value for any attribute in a first document of the two or more automatically generated documents is different from the value for that attribute in a second document of the two or more automatically generated documents, indicating that the intentional generation to include a predetermined value for each attribute labeled resulted in an error because at least one predetermined value is not present for its attribute.
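
The extract, compare, report, and alert steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the text of each PDF has already been extracted by some PDF library and that each attribute appears on its own line as "Label: value"; the function names, the template layout (section name mapped to a list of labels), and the sample data are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row type: (label, value per document, whether the values differ)
Row = Tuple[str, List[Optional[str]], bool]


def extract_attributes(text: str, labels: List[str]) -> Dict[str, str]:
    """Collect 'Label: value' pairs from document text, regardless of line order."""
    found: Dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in labels:
            found[label] = value.strip()
    return found


def compare_documents(template: Dict[str, List[str]], texts: Dict[str, str]) -> List[Row]:
    """Compare every templated attribute across all documents."""
    labels = [label for section in template.values() for label in section]
    extracted = {name: extract_attributes(text, labels) for name, text in texts.items()}
    rows: List[Row] = []
    for label in labels:
        # A label missing from a document yields None, which also counts as a difference.
        values = [extracted[name].get(label) for name in texts]
        rows.append((label, values, len(set(values)) > 1))
    return rows


def render_report(rows: List[Row], names: List[str]) -> str:
    """Render a plain-text table that flags labels whose values differ."""
    lines = [" | ".join(["Label"] + names + ["Status"])]
    for label, values, differs in rows:
        cells = [label] + ["<missing>" if v is None else v for v in values]
        cells.append("DIFF" if differs else "ok")
        lines.append(" | ".join(cells))
    return "\n".join(lines)


def build_alerts(rows: List[Row]) -> List[str]:
    """Produce one alert message per attribute whose values are not identical."""
    return [f"ALERT: values differ for attribute '{label}'"
            for label, _, differs in rows if differs]


# Hypothetical sample data: the same attributes enumerated in a different order.
template = {"Account": ["Name", "Balance"], "Contact": ["Email"]}
texts = {
    "doc_a.pdf": "Name: Jane Doe\nBalance: 100.00\nEmail: jane@example.com",
    "doc_b.pdf": "Email: jane@example.com\nBalance: 105.00\nName: Jane Doe",
}

rows = compare_documents(template, texts)
print(render_report(rows, list(texts)))
for message in build_alerts(rows):
    print(message)
```

Because attributes are matched by label rather than by position, the differing enumeration order in the two documents does not affect the comparison; only the "Balance" attribute is flagged in this sample.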