US 12,321,831 B1
Automated detection of content generated by artificial intelligence
Dmitriy Karpman, San Francisco, CA (US); Ryan Weber, San Francisco, CA (US); and Kevin Guo, San Francisco, CA (US)
Assigned to Castle Global, Inc., San Francisco, CA (US)
Filed by Castle Global, Inc., San Francisco, CA (US)
Filed on Jul. 25, 2023, as Appl. No. 18/358,823.
Int. Cl. G06N 20/00 (2019.01); G06N 5/04 (2023.01)
CPC G06N 20/00 (2019.01) [G06N 5/04 (2013.01)]
22 Claims
 
1. A computing system comprising:
one or more processors; and
a set of memory resources storing a set of artificial intelligence (AI)-generated content classifiers and instructions that, when executed by the one or more processors, cause the computing system to:
aggregate a training dataset that includes examples, wherein first examples in the training dataset are labeled using a first label that indicates a respective first example is AI-generated, second examples are labeled using a second label that indicates a respective example is not AI-generated, and third examples are labeled using a third label that identifies a respective source of a particular generative AI model from a plurality of generative AI models that generated a respective third example;
generate a subset of the training dataset for a first source of a generative AI model from the third examples, wherein the subset of the training dataset includes a first subset of the third examples that is labeled with the third label for the first source of the generative AI model and a second subset of the third examples that is labeled with one or more sources other than the first source, wherein the second subset of the third examples is relabeled with a fourth label of not generated by the first source;
train parameters of AI-generated content classifiers in the set of AI-generated content classifiers based on learning correlations between features that distinguish between human-created content and artifacts associated with generative AI models in examples in the training dataset and a particular label, wherein a first AI-generated content classifier of the AI-generated content classifiers comprises a plurality of sub-classifiers that are associated with a respective generative AI model and trained using the third examples, and wherein a specific sub-classifier of the plurality of sub-classifiers for the first source of a generative AI model of the plurality of generative AI models is trained using the subset of the training dataset to learn correlations for features that are specialized to the first source of the generative AI model of the plurality of generative AI models based on the first subset of the third examples with the third label being positive examples of being AI generated by the first source of the generative AI model of the plurality of generative AI models and the second subset of the third examples with the fourth label being negative examples of not generated by the first source;
receive a classification request specifying query content;
execute the first AI-generated content classifier on the query content to compute a first set of confidence scores proportional to predicted probabilities that the query content includes content data produced by a generative AI model, wherein the first set of confidence scores is generated based on parameters learned by the first AI-generated content classifier during training, wherein a sub-classifier in the plurality of sub-classifiers outputs a sub-confidence score proportional to a probability that the query content includes content data produced by a respective generative AI model for the sub-classifier; and
output classification results that indicate whether the query content is AI-generated based on the first set of confidence scores and indicate whether the query content is generated by a particular generative AI model based on one or more of the sub-confidence scores output by the plurality of sub-classifiers.
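The aggregating and subset-generating steps of claim 1 amount to a one-vs-rest relabeling of the source-labeled (third) examples. The sketch below shows one possible data layout. It is illustrative only. The label strings, the `Example` type and the `aggregate`/`source_subset` helpers are hypothetical and do not come from the patent. The "not_<source>" string is simply a stand-in for the claimed fourth label.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical tags for the first and second labels named in claim 1.
AI_LABEL = "ai_generated"
HUMAN_LABEL = "not_ai_generated"


@dataclass(frozen=True)
class Example:
    content: str
    label: str             # AI_LABEL, HUMAN_LABEL, or a source name (third label)
    sourced: bool = False  # True only for third (source-labeled) examples


def aggregate(ai_items: List[str], human_items: List[str],
              sourced_items: List[Tuple[str, str]]) -> List[Example]:
    """Combine first, second and third examples into one training dataset."""
    data = [Example(c, AI_LABEL) for c in ai_items]
    data += [Example(c, HUMAN_LABEL) for c in human_items]
    data += [Example(c, src, sourced=True) for c, src in sourced_items]
    return data


def source_subset(data: List[Example], source: str) -> List[Tuple[str, str]]:
    """One-vs-rest subset for `source`, drawn only from the third examples.

    Examples from `source` keep their label (positives); examples from any
    other source are relabeled "not_<source>" (stand-in for the fourth label).
    """
    negative = f"not_{source}"
    return [(ex.content, ex.label if ex.label == source else negative)
            for ex in data if ex.sourced]


# Usage with placeholder content strings.
data = aggregate(["a1"], ["h1"],
                 [("s1", "model_a"), ("s2", "model_b"), ("s3", "model_a")])
subset = source_subset(data, "model_a")
# subset == [("s1", "model_a"), ("s2", "not_model_a"), ("s3", "model_a")]
```

Only the third examples enter a per-source subset. The generic AI/human examples are left for training the overall AI-versus-human decision.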
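The training step trains one sub-classifier per generative AI source. Each sub-classifier uses that source's examples as positives and the other sources' examples as negatives. The sketch below illustrates that scheme with plain logistic regression on small numeric feature vectors. This is an assumption made for brevity. The claim does not specify the model family or the features, and a real system would learn features from the content itself. The function names and hyperparameters (`lr`, `epochs`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(model: Model, x: Sequence[float]) -> float:
    """Sub-confidence score: predicted probability that x came from this source."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(xs: List[List[float]], ys: List[float],
                   lr: float = 0.5, epochs: int = 500) -> Model:
    """Batch gradient descent on the mean log loss."""
    n_feat, m = len(xs[0]), len(xs)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for x, y in zip(xs, ys):
            err = score((w, b), x) - y  # gradient of log loss w.r.t. the logit
            for j in range(n_feat):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / m for wi, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def train_sub_classifiers(sourced: List[Tuple[List[float], str]]) -> Dict[str, Model]:
    """One binary sub-classifier per source: that source = 1, all other sources = 0."""
    xs = [x for x, _ in sourced]
    return {src: train_logistic(xs, [1.0 if s == src else 0.0 for _, s in sourced])
            for src in sorted({s for _, s in sourced})}


# Usage with toy 2-D feature vectors (hypothetical "artifact" features).
sourced = [([1.0, 0.0], "model_a"), ([0.9, 0.1], "model_a"),
           ([0.0, 1.0], "model_b"), ([0.1, 0.9], "model_b")]
models = train_sub_classifiers(sourced)
```

Each sub-classifier sees every source-labeled example. Only the binary target changes, which mirrors the relabeling of other sources as "not generated by the first source".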
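The executing and outputting steps combine two kinds of score. An overall confidence score says whether the query content is AI-generated, and per-source sub-confidence scores say which generative AI model produced it. The sketch below shows one way to combine them into a single result. It assumes fixed thresholds and takes the highest-scoring sub-classifier as the attributed source. The claim does not prescribe this decision rule. `ClassificationResult`, `classify` and the threshold parameters are hypothetical, and the lambda scorers stand in for trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Scorer = Callable[[Sequence[float]], float]


@dataclass
class ClassificationResult:
    ai_generated: bool
    ai_score: float
    sub_scores: Dict[str, float]
    predicted_source: Optional[str]


def classify(features: Sequence[float], ai_scorer: Scorer,
             sub_scorers: Dict[str, Scorer],
             ai_threshold: float = 0.5,
             source_threshold: float = 0.5) -> ClassificationResult:
    """Combine an overall AI score with per-source sub-confidence scores."""
    ai_score = ai_scorer(features)
    sub_scores = {name: f(features) for name, f in sub_scorers.items()}
    is_ai = ai_score >= ai_threshold
    predicted = None
    if is_ai and sub_scores:
        best = max(sub_scores, key=sub_scores.get)  # highest sub-confidence
        if sub_scores[best] >= source_threshold:
            predicted = best
    return ClassificationResult(is_ai, ai_score, sub_scores, predicted)


# Usage with placeholder scorers returning fixed confidences.
subs = {"model_a": lambda x: 0.8, "model_b": lambda x: 0.1}
result = classify([0.9, 0.1], lambda x: 0.92, subs)
human = classify([0.2, 0.2], lambda x: 0.2, subs)
```

Attribution is reported only when the content is first judged AI-generated. Under this assumed rule, a human-created query never receives a source label, even if one sub-score happens to be high.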