| CPC G06V 30/413 (2022.01) [G06V 30/147 (2022.01); G06V 30/19113 (2022.01); G06V 30/19147 (2022.01); G06V 30/414 (2022.01); G06V 2201/09 (2022.01)] | 22 Claims |

|
1. A non-transitory computer readable medium comprising instructions which, when executed by one or more hardware processors, cause performance of operations comprising:
training a first machine learning model to predict suppliers associated with image-type documents at least by:
obtaining a plurality of training data sets, a training data set of the plurality of training data sets comprising:
a feature vector representing an image-type document specifying at least one of goods and services and being associated with a supplier; and
a label identifying the supplier associated with the image-type document;
training the first machine learning model based on the plurality of training data sets to generate a first trained machine learning model;
receiving, by a content extraction platform, a target image-type document;
extracting a set of feature values from the target image-type document;
based on the set of feature values extracted from the target image-type document: generating a target feature vector representing the target image-type document;
applying the first trained machine learning model to the target feature vector to predict a particular supplier associated with the target image-type document;
based on the first trained machine learning model predicting the particular supplier:
identifying attributes, from a plurality of attributes, that correspond to the particular supplier; and
extracting, by the content extraction platform from the target image-type document, a first set of attribute values associated with the attributes that correspond to the particular supplier; and
storing, by the content extraction platform, the first set of attribute values in association with the attributes.
|
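The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes the image-type document has already been converted to text by some OCR step, swaps in a simple nearest-centroid classifier for the "first machine learning model", and uses bag-of-words counts as the feature vector. Every name here (`TRAINING_DOCS`, `SUPPLIER_ATTRIBUTES`, `NearestCentroidModel`, `train`, `process`), the two suppliers, and the regular expressions are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical labeled training data: (OCR text of a document, supplier label).
TRAINING_DOCS = [
    ("Acme Tools invoice no A-100 amount due 250.00", "Acme"),
    ("Acme Tools invoice no A-101 amount due 75.50", "Acme"),
    ("Globex Services statement ref G77 balance 900.00", "Globex"),
    ("Globex Services statement ref G78 balance 120.00", "Globex"),
]

# Hypothetical per-supplier attributes, each with a pattern locating its value.
SUPPLIER_ATTRIBUTES = {
    "Acme": {"invoice_number": r"invoice no (\S+)", "total": r"amount due ([\d.]+)"},
    "Globex": {"reference": r"ref (\S+)", "total": r"balance ([\d.]+)"},
}


def extract_feature_values(ocr_text):
    """Extract feature values: lowercase alphabetic tokens from the OCR text."""
    return re.findall(r"[a-z]+", ocr_text.lower())


def to_feature_vector(tokens, vocabulary):
    """Turn tokens into a count vector ordered by the vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


class NearestCentroidModel:
    """Assumed stand-in classifier: one mean vector per supplier label."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for vector, label in zip(vectors, labels):
            grouped[label].append(vector)
        self.centroids = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, vector):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(vector, centroid))

        return min(self.centroids, key=lambda lbl: squared_distance(self.centroids[lbl]))


def train(docs):
    """Build the vocabulary and train the supplier model on labeled documents."""
    token_lists = [extract_feature_values(text) for text, _ in docs]
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    vectors = [to_feature_vector(tokens, vocabulary) for tokens in token_lists]
    model = NearestCentroidModel().fit(vectors, [label for _, label in docs])
    return model, vocabulary


def process(ocr_text, model, vocabulary, store):
    """Predict the supplier, extract its attributes, and store the values."""
    vector = to_feature_vector(extract_feature_values(ocr_text), vocabulary)
    supplier = model.predict(vector)
    values = {}
    for attribute, pattern in SUPPLIER_ATTRIBUTES[supplier].items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        if match:
            values[attribute] = match.group(1)
    # Store the extracted values in association with their attribute names.
    store.append({"supplier": supplier, "attributes": values})
    return supplier, values


if __name__ == "__main__":
    model, vocabulary = train(TRAINING_DOCS)
    records = []
    print(process("ACME Tools invoice no A-205 amount due 310.40", model, vocabulary, records))
```

The point of the design is that the supplier prediction selects which attributes to look for, so the extraction step only applies the patterns registered for that supplier rather than one generic template.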