US 12,086,551 B2
Semantic difference characterization for documents
Robin Abraham, Redmond, WA (US); J Brandon Smock, Seattle, WA (US); Owen Stephenson Whiting, Seattle, WA (US); and Henry Hun-Li Reid Pan, Sammamish, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 23, 2021, as Appl. No. 17/356,037.
Prior Publication US 2022/0414336 A1, Dec. 29, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 18/2431 (2023.01); G06F 40/205 (2020.01); G06N 20/00 (2019.01); G06V 30/418 (2022.01)
CPC G06F 40/30 (2020.01) [G06F 18/2431 (2023.01); G06F 40/205 (2020.01); G06N 20/00 (2019.01); G06V 30/418 (2022.01)]
20 Claims
 
1. A computer implemented method of determining differences between documents, the method comprising:
parsing a first document and a second document into respective distinct instances of content;
classifying the distinct instances of content into different semantic categories including text, images, and tables;
applying category specific matching algorithms to content within each of the respective instances of content to determine a similarity score for each of the respective instances of content to match the respective instances, wherein the category specific matching algorithms comprise machine learning models trained on labeled respective category training data;
analyzing semantic differences between the content within matching respective instances of the first document and the second document as a function of the similarity scores; and
generating a characterization of the semantic differences.
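
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Instance`, `classify`, `SCORERS`, `match`, `characterize`, `compare`) is hypothetical, parsing is assumed to have already produced a list of raw content items per document, and simple heuristics (string similarity, byte equality, cell overlap) stand in for the trained machine learning models that the claim requires.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Instance:
    category: str    # "text", "image", or "table"
    content: object


def classify(raw):
    # Toy stand-in for a semantic classifier (assumption):
    # str -> text, bytes -> image, anything else (list of rows) -> table.
    if isinstance(raw, str):
        return Instance("text", raw)
    if isinstance(raw, bytes):
        return Instance("image", raw)
    return Instance("table", raw)


def text_score(a, b):
    # Character-level similarity in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def image_score(a, b):
    # Placeholder: identical bytes score 1.0, anything else 0.0.
    return 1.0 if a == b else 0.0


def table_score(a, b):
    # Jaccard overlap of the sets of cell values.
    cells_a = {cell for row in a for cell in row}
    cells_b = {cell for row in b for cell in row}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


# Heuristic scorers standing in for per-category trained models (assumption).
SCORERS = {"text": text_score, "image": image_score, "table": table_score}


def match(first, second, threshold=0.5):
    # Greedy matching: each instance of the first document takes the
    # best-scoring unused instance of the same category in the second.
    pairs, used = [], set()
    for a in first:
        best_j, best = None, threshold
        for j, b in enumerate(second):
            if j in used or b.category != a.category:
                continue
            score = SCORERS[a.category](a.content, b.content)
            if score >= best:
                best_j, best = j, score
        if best_j is None:
            pairs.append((a, None, 0.0))
        else:
            used.add(best_j)
            pairs.append((a, second[best_j], best))
    pairs += [(None, b, 0.0) for j, b in enumerate(second) if j not in used]
    return pairs


def characterize(pairs):
    # Turn matched pairs and their similarity scores into short descriptions.
    report = []
    for a, b, score in pairs:
        if a is None:
            report.append(f"added {b.category}")
        elif b is None:
            report.append(f"removed {a.category}")
        elif score == 1.0:
            report.append(f"unchanged {a.category}")
        else:
            report.append(f"modified {a.category} (similarity {score:.2f})")
    return report


def compare(doc1, doc2):
    first = [classify(raw) for raw in doc1]
    second = [classify(raw) for raw in doc2]
    return characterize(match(first, second))


doc1 = ["Revenue grew 10% in 2020.", b"logo-bytes", [["Q1", "10"], ["Q2", "12"]]]
doc2 = ["Revenue grew 12% in 2020.", b"logo-bytes", [["Q1", "10"], ["Q2", "15"]],
        "New appendix."]
report = compare(doc1, doc2)
```

In this toy input, the edited sentence and the edited table are reported as modified, the identical image as unchanged, and the extra paragraph in the second document as added. The 0.5 matching threshold and the greedy pairing are arbitrary choices made for the sketch; the claim itself only requires that matching and difference analysis depend on the similarity scores.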