| CPC G06F 16/35 (2019.01) [G06F 40/117 (2020.01)] | 9 Claims |

|
1. A method for extracting and analyzing content facing a large model, wherein the method comprises:
by preprocessing a current text generated by the large model, obtaining a plurality of word vectors of the current text;
by clustering the plurality of word vectors according to an importance degree of the plurality of word vectors, obtaining a plurality of word cluster centers; and
determining whether the current text is compliant based on the plurality of word cluster centers and a sensitive word database, wherein
the obtaining a plurality of word vectors of the current text comprises:
determining a paragraph tag of the word vector based on a paragraph position relationship of the current text;
determining a describing density of the word vector based on the paragraph tag;
determining a structure distribution parameter of the word vector; and
determining a text positioning coefficient of the word vector based on the describing density and the structure distribution parameter;
determining a similarity of any two of the plurality of word vectors based on a plurality of text positioning coefficients of the plurality of word vectors; and
by clustering the plurality of word vectors based on the similarity, obtaining the plurality of word cluster centers.
|
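The steps recited in claim 1 can be pictured with a minimal sketch. The claim text shown here gives no formulas, so every formula below is an illustrative assumption and not the claimed method: the describing density is taken as a word's share of its paragraph, the structure distribution parameter as the share of paragraphs containing the word, the text positioning coefficient as their product, and the similarity as cosine similarity damped by the gap between two coefficients. The function names, the greedy clustering, and the thresholds are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def positioning_coefficients(paragraphs):
    """Map (paragraph tag, word) to an assumed text positioning coefficient.

    paragraphs: list of non-empty word lists; the list index is the paragraph tag.
    """
    n_paras = len(paragraphs)
    coeffs = {}
    for tag, words in enumerate(paragraphs):
        for word, count in Counter(words).items():
            # Assumption: describing density = share of the paragraph's words.
            density = count / len(words)
            # Assumption: structure distribution = share of paragraphs with the word.
            spread = sum(word in p for p in paragraphs) / n_paras
            coeffs[(tag, word)] = density * spread
    return coeffs


def similarity(vec_a, coeff_a, vec_b, coeff_b):
    """Assumed similarity: cosine damped by the coefficient gap (coeffs in [0, 1])."""
    return cosine(vec_a, vec_b) * (1.0 - abs(coeff_a - coeff_b))


def cluster_centers(items, threshold=0.8):
    """Greedy threshold clustering; items is a list of (vector, coefficient).

    An item joins the first cluster whose first member is similar enough,
    otherwise it starts a new cluster. Each center is the mean vector.
    """
    clusters = []
    for vec, coeff in items:
        for cluster in clusters:
            seed_vec, seed_coeff = cluster[0]
            if similarity(vec, coeff, seed_vec, seed_coeff) >= threshold:
                cluster.append((vec, coeff))
                break
        else:
            clusters.append([(vec, coeff)])
    centers = []
    for cluster in clusters:
        dim = len(cluster[0][0])
        centers.append(
            [sum(v[i] for v, _ in cluster) / len(cluster) for i in range(dim)]
        )
    return centers


def is_compliant(centers, sensitive_vectors, threshold=0.9):
    """Non-compliant if any center is close to any sensitive-word vector."""
    return not any(
        cosine(c, s) >= threshold for c in centers for s in sensitive_vectors
    )
```

In this sketch, the word vectors would come from whatever embedding step the preprocessing uses, and `sensitive_vectors` stands in for embedded entries of the sensitive word database; both are placeholders supplied by the caller.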