CPC G06F 40/30 (2020.01) [G06F 18/2431 (2023.01); G06F 40/205 (2020.01); G06N 20/00 (2019.01); G06V 30/418 (2022.01)] | 20 Claims |
1. A computer implemented method of determining differences between documents, the method comprising:
parsing a first document and a second document into respective distinct instances of content;
classifying the distinct instances of content into different semantic categories including text, images, and tables;
applying category specific matching algorithms to content within each of the respective instances of content to determine a similarity score for each of the respective instances of content to match the respective instances, wherein the category specific matching algorithms comprise machine learning models trained on labeled respective category training data;
analyzing semantic differences between the content within matching respective instances of the first document and the second document as a function of the similarity scores; and
generating a characterization of the semantic differences.
|
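The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim requires machine learning models trained on labeled category data; this sketch swaps in simple stand-in scorers (`difflib.SequenceMatcher` for text, Jaccard overlap of cells for tables, exact comparison for image descriptors) so that it runs with the standard library alone. The names `Instance`, `MATCHERS`, `match_instances`, `characterize`, and the `threshold` parameter are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Instance:
    """A parsed, already classified instance of content (hypothetical model)."""
    category: str  # "text", "image", or "table"
    content: str   # raw text, an image descriptor, or "|"-separated table cells


def text_similarity(a: str, b: str) -> float:
    # Stand-in for a trained text model: character-level sequence ratio.
    return SequenceMatcher(None, a, b).ratio()


def table_similarity(a: str, b: str) -> float:
    # Stand-in for a trained table model: Jaccard overlap of cell values.
    cells_a, cells_b = set(a.split("|")), set(b.split("|"))
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


def image_similarity(a: str, b: str) -> float:
    # Stand-in for a trained image model: exact match on the descriptor.
    return 1.0 if a == b else 0.0


# One matcher per semantic category (assumption: three categories, as in the claim).
MATCHERS = {
    "text": text_similarity,
    "table": table_similarity,
    "image": image_similarity,
}


def match_instances(first, second, threshold=0.5):
    """Pair each instance of the first document with its best same-category
    counterpart in the second document, keeping the similarity score."""
    matches = []
    for a in first:
        scorer = MATCHERS[a.category]
        candidates = [
            (scorer(a.content, b.content), b)
            for b in second
            if b.category == a.category
        ]
        if not candidates:
            continue
        score, best = max(candidates, key=lambda pair: pair[0])
        if score >= threshold:
            matches.append((a, best, score))
    return matches


def characterize(matches):
    """Describe each matched pair whose similarity score is below 1.0."""
    report = []
    for a, b, score in matches:
        if score < 1.0:
            report.append(
                f"{a.category} changed (similarity {score:.2f}): "
                f"{a.content!r} -> {b.content!r}"
            )
    return report
```

Calling `characterize(match_instances(doc1, doc2))` on two lists of `Instance` objects returns one line per matched pair that differs. The greedy best-match pairing and the fixed threshold are simplifications chosen for brevity; the claim does not specify how scores are turned into matches.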