| CPC G06F 40/117 (2020.01) [G06F 40/143 (2020.01)] | 17 Claims |

|
1. A method of transforming legacy documents to extensible markup language (XML) formats, the method comprising:
extracting, by a document processing device, a plurality of attributes and a nature of at least one content from an intermediate structure file of a legacy document;
transforming, by the document processing device, the intermediate structure file into a custom object model (COM) structure, wherein transforming comprises:
classifying the intermediate structure file into one or more objects, wherein the classifying comprises:
tagging each of the at least one content in the intermediate structure file as one or more objects, based on the respective attribute and the respective nature of each of the at least one content,
wherein the nature provides information on precautionary information and wherein the nature is identified using artificial intelligence;
logically grouping each of the one or more objects; and
creating a hierarchical object tree from each of the one or more objects logically grouped in accordance with the COM structure;
wherein the one or more objects comprise: a section within the at least one content of the intermediate structure file, list items within the section of the at least one content, paragraphs within each of the list items, notes, warning or caution contents, and graphic contents within the at least one content; and
converting, by the document processing device, the COM structure of the legacy document to an XML format in compliance with one or more industry standards using an XML serialization technique.
|
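
The steps recited in claim 1 (tagging content as objects, grouping them, building a hierarchical object tree, and serializing to XML) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the claim: the `Content` and `Node` types, the `style` attribute key, the element names (`document`, `section`, `title`, `listItem`, `para`, `graphic`), and the grouping rule. The `nature` field stands in for the output of the artificial-intelligence step, which is not modeled, and the output does not follow any particular industry-standard schema.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Content:
    """One content item from a hypothetical intermediate structure file."""
    text: str
    attributes: dict
    nature: str  # assumed to be precomputed, e.g. "plain", "note", "warning", "caution"


@dataclass
class Node:
    """One object in the hierarchical object tree."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def classify(item: Content) -> str:
    """Tag a content item as an object type from its attributes and nature."""
    if item.nature in ("note", "warning", "caution"):
        return item.nature
    style = item.attributes.get("style")  # hypothetical attribute key
    if style == "heading":
        return "section"
    if style == "list":
        return "listItem"
    if style == "image":
        return "graphic"
    return "para"


def build_tree(items: list) -> Node:
    """Group tagged objects under the most recent section, in document order."""
    root = Node("document")
    current = root  # content before any heading attaches to the root
    for item in items:
        tag = classify(item)
        if tag == "section":
            current = Node("section", children=[Node("title", item.text)])
            root.children.append(current)
        elif tag == "listItem":
            # Each list item wraps its text in a paragraph.
            current.children.append(Node("listItem", children=[Node("para", item.text)]))
        else:
            current.children.append(Node(tag, item.text))
    return root


def to_element(node: Node) -> ET.Element:
    """Recursively serialize the object tree to an ElementTree element."""
    elem = ET.Element(node.tag)
    if node.text:
        elem.text = node.text
    for child in node.children:
        elem.append(to_element(child))
    return elem


def convert(items: list) -> str:
    """Run classification, grouping, tree building, and XML serialization."""
    return ET.tostring(to_element(build_tree(items)), encoding="unicode")


if __name__ == "__main__":
    sample = [
        Content("Engine removal", {"style": "heading"}, "plain"),
        Content("Disconnect the battery.", {"style": "list"}, "plain"),
        Content("Hot surfaces.", {"style": "body"}, "warning"),
    ]
    print(convert(sample))
```

In this sketch the nature takes precedence over the attributes, so a warning formatted as a list entry is still tagged as a warning; that ordering is a design choice of the example, not something the claim specifies.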