US 12,469,321 B2
Method implemented in computer system for analyzing document versions to identify shared document elements using machine learning, and non-transitory computer-readable storage medium
Ying Li, Shanghai (CN); Liu Yao He, Beijing (CN); Di Hu, Shanghai (CN); and Xiao Feng Ji, Shanghai (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 16, 2023, as Appl. No. 18/170,053.
Prior Publication US 2024/0282136 A1, Aug. 22, 2024
Int. Cl. G06V 30/416 (2022.01); G06F 16/31 (2019.01); G06T 3/4046 (2024.01)

CPC G06V 30/416 (2022.01) [G06F 16/31 (2019.01); G06T 3/4046 (2013.01); G06T 2207/20212 (2013.01); G06T 2207/30144 (2013.01); H04N 2201/3226 (2013.01)]

20 Claims

1. A computer-implemented method of analyzing documents comprising:

receiving a first document comprising a plurality of sentences that each include one or more words;

populating a matrix with the plurality of sentences, wherein each of the one or more words of each sentence in the matrix is encoded as a numerical value;

processing the matrix using a machine learning model to generate a first feature map;

comparing the first feature map to a second feature map of a second document to identify a shared document element between the first document and the second document based on a common feature in the first feature map and the second feature map, wherein comparing the first feature map to the second feature map comprises comparing a first vector of the first feature map and a second vector of the second feature map to a codebook comprising an embedding space of codebook vectors to determine that the first vector and the second vector are both closest to a same codebook vector; and

indicating the shared document element via a user interface.
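
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every function name, the toy vocabulary encoding, the hashed bag-of-words stand-in for the machine learning model, and the hand-written codebook are assumptions introduced here for illustration only. The final step, in which two vectors count as a match when both are closest to the same codebook vector, resembles nearest-neighbour lookup in vector quantization.

```python
import math

def build_vocab(sentences):
    """Assign each distinct lowercased word a positive integer id (0 is reserved for padding)."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def populate_matrix(sentences, vocab):
    """One row per sentence, one numerical value per word, right-padded with 0."""
    rows = [[vocab.get(w, 0) for w in s.lower().split()] for s in sentences]
    width = max(len(r) for r in rows)
    return [r + [0] * (width - len(r)) for r in rows]

def feature_map(matrix, dim=4):
    """Hypothetical stand-in for a trained model: a hashed bag-of-words
    vector per row, L2-normalized. A real system would use a learned encoder."""
    fmap = []
    for row in matrix:
        vec = [0.0] * dim
        for word_id in row:
            if word_id:  # skip padding
                vec[word_id % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        fmap.append([v / norm for v in vec])
    return fmap

def nearest_code(vec, codebook):
    """Index of the codebook vector with the smallest Euclidean distance to vec."""
    return min(range(len(codebook)), key=lambda k: math.dist(vec, codebook[k]))

def shared_elements(fmap_a, fmap_b, codebook):
    """Pairs (i, j, code) where row i of A and row j of B map to the same codebook vector."""
    codes_a = [nearest_code(v, codebook) for v in fmap_a]
    codes_b = [nearest_code(v, codebook) for v in fmap_b]
    return [(i, j, ca)
            for i, ca in enumerate(codes_a)
            for j, cb in enumerate(codes_b)
            if ca == cb]

# Illustrative data (assumed, not taken from the patent).
doc_a = ["the cat sat", "dogs bark loudly"]
doc_b = ["birds sing", "the cat sat"]
vocab = build_vocab(doc_a + doc_b)  # one vocabulary shared by both documents
matrix_a = populate_matrix(doc_a, vocab)
matrix_b = populate_matrix(doc_b, vocab)
# Toy codebook: the four unit basis vectors of the 4-dimensional feature space.
codebook = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
shared = shared_elements(feature_map(matrix_a), feature_map(matrix_b), codebook)
print(shared)  # stands in for "indicating ... via a user interface"
```

Because the sentence "the cat sat" appears in both documents, its two feature vectors are identical and therefore map to the same codebook vector, so the pair (sentence 0 of the first document, sentence 1 of the second) is reported. With such a coarse codebook, unrelated sentences can also land on the same codebook vector; a learned codebook with many more vectors would be expected to discriminate better.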