US 12,412,411 B2
Training of machine learning models using content masking techniques
Sohyeong Kim, Berlin (DE)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on Apr. 5, 2023, as Appl. No. 18/130,955.
Prior Publication US 2024/0338957 A1, Oct. 10, 2024
Int. Cl. G06V 30/19 (2022.01); G06N 3/045 (2023.01); G06N 3/09 (2023.01); G06V 10/82 (2022.01); G06V 30/14 (2022.01)

CPC G06V 30/19147 (2022.01) [G06N 3/045 (2023.01); G06N 3/09 (2023.01); G06V 10/82 (2022.01); G06V 30/1448 (2022.01); G06V 30/19007 (2022.01)]

20 Claims

1. A computer-implemented method for training a machine learning model comprising a first neural network and a second neural network, the computer-implemented method comprising:

extracting, from at least one document, texts and locations of the texts in the at least one document;

generating, using the first neural network, embeddings for the at least one document, a first set of the embeddings characterizing a first subset of the texts and locations of the first subset of the texts and a second set of the embeddings characterizing a second subset of the texts that are masked and additional locations of the second subset of the texts that are masked;

generating, using the second neural network, additional embeddings characterizing contents of the second subset of the texts;

comparing each of the additional embeddings with each of the second set of embeddings;

generating relevance values based on the comparison, the relevance values representing relevances of the additional embeddings to each of the second set of embeddings characterizing the second subset of the texts that are masked;

identifying, for each of the additional locations that are masked, a respective content of the second subset of the texts having a relevance value that is higher than the remaining relevance values; and

outputting each of the respective content of the second subset of the texts having the value that is higher than the remaining relevance values.
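
The comparing, relevance-generating, identifying, and outputting steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the embeddings have already been produced by the two neural networks, assumes a dot product followed by a softmax as the comparison (the claim does not name a similarity measure), and uses hypothetical names (`match_masked_contents`, `slot_embeddings`, `content_embeddings`) and toy values chosen only for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Turn raw scores into relevance values that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_masked_contents(slot_embeddings, content_embeddings, contents):
    """For each masked location, return the content with the highest relevance.

    slot_embeddings: one vector per masked location (assumed to come from the
        first neural network).
    content_embeddings: one vector per masked text content (assumed to come
        from the second neural network).
    contents: the masked text strings, aligned with content_embeddings.
    """
    results = []
    for slot in slot_embeddings:
        # Compare every content embedding with this masked-location embedding.
        scores = [dot(slot, c) for c in content_embeddings]
        relevance = softmax(scores)
        # Pick the content whose relevance exceeds all remaining values.
        best = max(range(len(contents)), key=relevance.__getitem__)
        results.append(contents[best])
    return results


# Toy data (assumed values, two masked locations and two candidate contents).
slot_embeddings = [[1.0, 0.0], [0.0, 1.0]]
content_embeddings = [[0.1, 0.9], [0.8, 0.2]]
contents = ["Invoice total", "ACME GmbH"]

predicted = match_masked_contents(slot_embeddings, content_embeddings, contents)
print(predicted)
```

In a training setting, the relevance values would typically feed a loss that rewards assigning each masked location its true content; that loss is outside the scope of this sketch.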