| CPC G06F 40/30 (2020.01) [G06F 16/906 (2019.01); G06F 16/93 (2019.01); G06F 40/284 (2020.01); G06F 40/289 (2020.01); G06N 5/022 (2013.01)] | 19 Claims |

|
1. A data processing system comprising:
a processor; and
a machine-readable medium storing executable instructions that, when executed, cause the processor to perform operations comprising:
receiving a set of one or more unlabeled documents associated with one or more first categories of documents to be used to train machine learning models to analyze the set of one or more unlabeled documents; and
fine-tuning a first machine learning model and a second machine learning model based on the set of one or more unlabeled documents to enable the first machine learning model to determine a semantic representation of the one or more first categories of documents, and to enable the second machine learning model to classify semantic representations according to the one or more first categories of documents, the first machine learning model and the second machine learning model having been trained using first unlabeled training data including a second plurality of categories of documents, the second plurality of categories of documents not including the one or more first categories of documents.
|
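The two-stage flow recited in claim 1 (models first trained on unlabeled documents from other categories, then fine-tuned on unlabeled documents from new categories) can be illustrated with a toy sketch. The claim does not specify model architectures or training objectives, so everything here is an assumption made for illustration: `Encoder` (an IDF-weighted bag of words) stands in for the first model, `ClusterClassifier` (a small k-means with cosine similarity) stands in for the second, and the sample documents are invented. Neither class is the patented method.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (token -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class Encoder:
    """Hypothetical stand-in for the first model: document -> vector."""

    def __init__(self):
        self.doc_freq = Counter()
        self.num_docs = 0

    def fit(self, docs):
        # Label-free update: only token document frequencies are accumulated,
        # so calling fit() again on new documents adapts the existing state.
        for doc in docs:
            self.num_docs += 1
            self.doc_freq.update(set(doc.lower().split()))

    def encode(self, doc):
        counts = Counter(doc.lower().split())
        return normalize({
            t: c * (math.log((1 + self.num_docs) / (1 + self.doc_freq[t])) + 1)
            for t, c in counts.items()
        })


class ClusterClassifier:
    """Hypothetical stand-in for the second model: vector -> category index."""

    def __init__(self, k):
        self.k = k
        self.centroids = []

    def fit(self, vectors, iterations=10):
        # Assumption: centroids are re-seeded on every call (farthest-point
        # seeding), i.e. the classification head is rebuilt for new categories.
        centroids = [vectors[0]]
        while len(centroids) < self.k:
            centroids.append(
                min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
            )
        for _ in range(iterations):
            groups = [[] for _ in centroids]
            for v in vectors:
                groups[self._nearest(v, centroids)].append(v)
            centroids = [self._mean(g) if g else c for g, c in zip(groups, centroids)]
        self.centroids = centroids

    def predict(self, vector):
        return self._nearest(vector, self.centroids)

    @staticmethod
    def _nearest(vector, centroids):
        return max(range(len(centroids)), key=lambda i: cosine(vector, centroids[i]))

    @staticmethod
    def _mean(vectors):
        total = Counter()
        for v in vectors:
            for t, w in v.items():
                total[t] += w
        return normalize(total)


# Invented sample data: the pretraining categories (news, recipes) do not
# include the new categories (invoices, resumes), mirroring the claim.
pretrain_docs = [
    "stock markets rose as investors cheered the earnings report",
    "the senate passed the budget bill after a long debate",
    "mix flour sugar and butter then bake the cake",
    "simmer the tomato sauce and add fresh basil",
]
new_docs = [
    "invoice number total amount due payment terms",
    "invoice total amount due upon receipt payment",
    "resume work experience education skills references",
    "resume skills education professional experience summary",
]

encoder = Encoder()
classifier = ClusterClassifier(k=2)

# Stage 1: both models are trained on unlabeled documents of other categories.
encoder.fit(pretrain_docs)
classifier.fit([encoder.encode(d) for d in pretrain_docs])

# Stage 2: both models are fine-tuned on the new unlabeled documents.
encoder.fit(new_docs)
classifier.fit([encoder.encode(d) for d in new_docs])

labels = [classifier.predict(encoder.encode(d)) for d in new_docs]
```

In this sketch the cluster indices in `labels` are arbitrary identifiers rather than named categories; what matters is that documents of the same new category receive the same index without any labels being supplied.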