US 12,430,938 B2
Predicting missing entity identities in image-type documents
Vikram Majjiga Reddy, Cupertino, CA (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Jun. 15, 2023, as Appl. No. 18/335,845.
Prior Publication US 2024/0420497 A1, Dec. 19, 2024
Int. Cl. G06V 30/413 (2022.01); G06V 30/146 (2022.01); G06V 30/19 (2022.01); G06V 30/414 (2022.01)

CPC G06V 30/413 (2022.01) [G06V 30/147 (2022.01); G06V 30/19113 (2022.01); G06V 30/19147 (2022.01); G06V 30/414 (2022.01); G06V 2201/09 (2022.01)]

22 Claims

1. A non-transitory computer readable medium comprising instructions which, when executed by one or more hardware processors, causes performance of operations comprising:

training a first machine learning model to predict suppliers associated with image-type documents at least by:

obtaining a plurality of training data sets, a training data set of the plurality of training data sets comprising:

a feature vector representing an image-type document specifying at least one of goods and services and being associated with a supplier; and

a label identifying the supplier associated with the image-type document;

training the first machine learning model based on the plurality of training data sets to generate a first trained machine learning model;

receiving, by a content extraction platform, a target image-type document;

extracting a set of feature values from the target image-type document;

based on the set of feature values extracted from the target image-type document: generating a target feature vector representing the target image-type document;

applying the first trained machine learning model to the target feature vector to predict a particular supplier associated with the target image-type document;

based on the first trained machine learning model predicting the particular supplier:

identifying attributes, from a plurality of attributes, that correspond to the particular supplier; and

extracting, by the content extraction platform from the target image-type document, a first set of attribute values associated with the attributes that correspond to the particular supplier; and

storing, by the content extraction platform, the first set of attribute values in association with the attributes.
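
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the image-type document has already been converted to text (for example by OCR), substitutes a simple nearest-centroid classifier for the unspecified first machine learning model, and uses word counts as the feature values. The names `VOCAB`, `SUPPLIER_ATTRIBUTES`, `TRAINING_TEXTS`, and every function below are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical vocabulary: each word is one position in the feature vector.
VOCAB = ["acme", "globex", "invoice", "po", "total", "amount", "due"]

# Hypothetical per-supplier attributes, each with a regex that locates its value.
SUPPLIER_ATTRIBUTES = {
    "Acme": {
        "invoice_number": r"Invoice\s+No\.?\s*(\w+)",
        "total": r"Total:\s*\$?([\d.]+)",
    },
    "Globex": {
        "po_number": r"PO\s*#\s*(\w+)",
        "amount_due": r"Amount\s+Due:\s*\$?([\d.]+)",
    },
}

# Hypothetical training documents (already OCR'd), each labeled with its supplier.
TRAINING_TEXTS = [
    ("Acme Corp Invoice No A100 Total: $10.00", "Acme"),
    ("ACME invoice no A101 total: $25.50", "Acme"),
    ("Globex PO # 7 Amount Due: $99.00", "Globex"),
    ("Globex Inc PO #8 amount due: $5.00", "Globex"),
]


def feature_vector(text):
    """Extract feature values (word counts) and arrange them as a vector."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in VOCAB]


def train(training_sets):
    """Train a nearest-centroid model from (feature vector, supplier label) pairs."""
    sums, n = {}, Counter()
    for vec, label in training_sets:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        n[label] += 1
    return {label: [s / n[label] for s in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Return the supplier whose centroid is closest to the target feature vector."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def extract(text, supplier):
    """Extract values only for the attributes that correspond to the supplier."""
    values = {}
    for attribute, pattern in SUPPLIER_ATTRIBUTES[supplier].items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            values[attribute] = match.group(1)
    return values


def process(model, text, store):
    """Predict the supplier, extract its attribute values, and store them."""
    supplier = predict(model, feature_vector(text))
    store.append({"supplier": supplier, "attributes": extract(text, supplier)})
    return supplier


model = train([(feature_vector(text), label) for text, label in TRAINING_TEXTS])
store = []
# The target document never names its supplier; the model infers it.
predicted = process(model, "Invoice No B777 Total: $42.00", store)
# predicted == "Acme"
# store[0]["attributes"] == {"invoice_number": "B777", "total": "42.00"}
```

In this sketch the supplier prediction decides which attribute set is applied, so a document laid out like an Acme invoice is searched only for Acme's attributes even though the supplier name does not appear in it.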