US 11,681,863 B2
Regulatory document analysis with natural language processing
Tyler Moser, Boyertown, PA (US); Dan Muir, Downingtown, PA (US); Manleen Sabharwal, Malvern, PA (US); Michael Graver, Eagleville, PA (US); Chandra Shaker Varma Pathapati, Wayne, PA (US); Jaqulin Maria Sebastian, Malvern, PA (US); and Seetharaman Venkiteswaran, Malvern, PA (US)
Assigned to CERNER INNOVATION, INC., Kansas City, KS (US)
Filed by CERNER INNOVATION, INC., Kansas City, KS (US)
Filed on Dec. 23, 2020, as Appl. No. 17/132,432.
Prior Publication US 2022/0198128 A1, Jun. 23, 2022
Int. Cl. G06F 40/143 (2020.01); G06F 40/284 (2020.01); G06F 16/958 (2019.01)
CPC G06F 40/143 (2020.01) [G06F 16/986 (2019.01); G06F 40/284 (2020.01)]
19 Claims
 
1. A computerized method comprising:
receiving, from a web server, a first document in Hypertext Markup Language (HTML) format and a second document in the HTML format, the first document being a revised version and the second document being an original version;
converting the first document from the HTML format into a first tree data structure and the second document from the HTML format into a second tree data structure;
cleaning the first and second documents by removing one or more webpage-specific elements from the first and second tree data structures;
subsequent to cleaning, converting the first tree data structure back into the HTML format for the first document and the second tree data structure back into the HTML format for the second document;
autonomously highlighting, in the first document by a document analysis engine subsequent to cleaning, content that was added to the first document as compared to the second document;
autonomously highlighting, in the second document by the document analysis engine subsequent to cleaning, content that was removed from the second document as compared to the first document; and
generating and causing display of a graphical user interface that concurrently displays the first document in the HTML format with the added content highlighted and the second document in the HTML format with removed content highlighted.
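
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes well-formed (XHTML-like) input so that the standard-library `xml.etree.ElementTree` parser can build the tree, it uses a hypothetical `WEBPAGE_TAGS` set because the claim does not enumerate the "webpage-specific elements", and it substitutes a word-level `difflib.SequenceMatcher` comparison for the unspecified document analysis engine. For brevity the highlighted output is a flat text fragment rather than the full cleaned markup, and all function names are illustrative.

```python
import difflib
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical list of webpage-specific elements; the claim does not name them.
WEBPAGE_TAGS = {"script", "style", "nav", "header", "footer"}


def _remove_keep_tail(parent, child):
    """Detach child from parent without losing the text that follows it."""
    index = list(parent).index(child)
    if child.tail:
        if index == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + child.tail
    parent.remove(child)


def clean(html_text):
    """Convert HTML to a tree and remove webpage-specific elements."""
    root = ET.fromstring(html_text)  # assumption: input is well-formed
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.lower() in WEBPAGE_TAGS:
                _remove_keep_tail(parent, child)
    return root


def to_html(root):
    """Convert the cleaned tree back into an HTML string."""
    return ET.tostring(root, encoding="unicode", method="html")


def words(root):
    """Flatten the tree's text content into a list of words."""
    return " ".join(root.itertext()).split()


def highlight(original_words, revised_words):
    """Return (original, revised) fragments with removed/added words marked."""
    matcher = difflib.SequenceMatcher(
        a=original_words, b=revised_words, autojunk=False
    )
    original_out, revised_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = escape(" ".join(original_words[i1:i2]))
        new = escape(" ".join(revised_words[j1:j2]))
        if tag == "equal":
            original_out.append(old)
            revised_out.append(new)
            continue
        if old:  # present only in the original: removed content
            original_out.append(f'<mark class="removed">{old}</mark>')
        if new:  # present only in the revision: added content
            revised_out.append(f'<mark class="added">{new}</mark>')
    return " ".join(original_out), " ".join(revised_out)


def side_by_side(revised_fragment, original_fragment):
    """Build one page that shows both highlighted documents concurrently."""
    return (
        '<html><body><div style="display:flex;gap:2em">'
        f"<div><h2>Revised</h2><p>{revised_fragment}</p></div>"
        f"<div><h2>Original</h2><p>{original_fragment}</p></div>"
        "</div></body></html>"
    )


# Illustrative inputs only; not taken from the patent.
ORIGINAL = (
    "<html><body><nav>Home</nav>"
    "<p>Providers must report annually.</p>"
    "<script>x()</script></body></html>"
)
REVISED = (
    "<html><body><nav>Home</nav>"
    "<p>Providers must report quarterly to the agency.</p></body></html>"
)

original_tree, revised_tree = clean(ORIGINAL), clean(REVISED)
removed_view, added_view = highlight(words(original_tree), words(revised_tree))
page = side_by_side(added_view, removed_view)
```

Comparing after cleaning, as the claim orders the steps, keeps navigation links and scripts from showing up as spurious differences. A production system would more likely write the highlights back into the cleaned tree so that the displayed documents keep their original structure.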