| CPC G06V 30/416 (2022.01) [G06F 16/31 (2019.01); G06T 3/4046 (2013.01); G06T 2207/20212 (2013.01); G06T 2207/30144 (2013.01); H04N 2201/3226 (2013.01)] | 20 Claims |

|
1. A computer-implemented method of analyzing documents comprising:
receiving a first document comprising a plurality of sentences that each include one or more words;
populating a matrix with the plurality of sentences, wherein each of the one or more words of each sentence in the matrix is encoded as a numerical value;
processing the matrix using a machine learning model to generate a first feature map;
comparing the first feature map to a second feature map of a second document to identify a shared document element between the first document and the second document based on a common feature in the first feature map and the second feature map, wherein comparing the first feature map to the second feature map comprises comparing a first vector of the first feature map and a second vector of the second feature map to a codebook comprising an embedding space of codebook vectors to determine that the first vector and the second vector are both closest to a same codebook vector; and
indicating the shared document element via a user interface.
|
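The comparison step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The function names (`encode_sentences`, `nearest_code`, `shared_elements`), the integer word encoding with zero padding, and the use of Euclidean distance are all assumptions made for illustration. The claim does not specify the machine learning model, so the feature maps are taken here as given lists of vectors.

```python
# Illustrative sketch only: every name, encoding choice, and distance metric
# below is a hypothetical stand-in, not taken from the patent.
from math import dist


def encode_sentences(sentences, vocab=None):
    """Populate a matrix with one row per sentence.

    Each word is encoded as an integer id (assigned in order of first
    appearance, starting at 1); shorter rows are padded with 0.
    """
    vocab = {} if vocab is None else vocab
    rows = []
    for sentence in sentences:
        rows.append([vocab.setdefault(w, len(vocab) + 1)
                     for w in sentence.lower().split()])
    width = max(len(r) for r in rows)
    return [r + [0] * (width - len(r)) for r in rows], vocab


def nearest_code(vector, codebook):
    """Return the index of the codebook vector closest to `vector`."""
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))


def shared_elements(feature_map_a, feature_map_b, codebook):
    """Find vectors in two feature maps that quantize to the same code.

    Returns (i, j, code) triples where vector i of the first map and
    vector j of the second map are both closest to codebook[code].
    """
    codes_b = [nearest_code(v, codebook) for v in feature_map_b]
    matches = []
    for i, v in enumerate(feature_map_a):
        code = nearest_code(v, codebook)
        matches.extend((i, j, code)
                       for j, code_b in enumerate(codes_b) if code_b == code)
    return matches
```

In this sketch, two feature-map vectors count as a common feature when they quantize to the same codebook entry, even if the vectors themselves are not identical. The resulting index pairs could then be mapped back to sentences and shown in a user interface.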