US 12,443,643 B1
Methods, devices, and systems for extracting and analyzing content facing large model
Haiyang Wang, Yantai (CN); Zhihua Yu, Yantai (CN); Qirong Zhou, Yantai (CN); Qiang Qiu, Yantai (CN); Jie Chu, Yantai (CN); Shenqiang Wang, Yantai (CN); and Zhen Zhao, Yantai (CN)
Assigned to INSTITUTE OF NETWORK TECHNOLOGY (YANTAI), Yantai (CN)
Filed by INSTITUTE OF NETWORK TECHNOLOGY (YANTAI), Shandong (CN)
Filed on Apr. 28, 2025, as Appl. No. 19/192,328.
Claims priority of application No. 202510088849.7 (CN), filed on Jan. 21, 2025.
Int. Cl. G06F 16/35 (2025.01); G06F 40/117 (2020.01)

CPC G06F 16/35 (2019.01) [G06F 40/117 (2020.01)]

9 Claims

1. A method for extracting and analyzing content facing a large model, wherein the method comprises:

by preprocessing a current text generated by the large model, obtaining a plurality of word vectors of the current text;

by clustering the plurality of word vectors according to an importance degree of the plurality of word vectors, obtaining a plurality of word cluster centers; and

determining whether the current text is compliant based on the plurality of word cluster centers and a sensitive word database, wherein

the obtaining a plurality of word vectors of the current text comprises:

determining a paragraph tag of the word vector based on a paragraph position relationship of the current text;

determining a describing density of the word vector based on the paragraph tag;

determining a structure distribution parameter of the word vector; and

determining a text positioning coefficient of the word vector based on the describing density and the structure distribution parameter;

determining a similarity of any two of the plurality of word vectors based on a plurality of text positioning coefficients of the plurality of word vectors; and

by clustering the plurality of word vectors based on the similarity, obtaining the plurality of word cluster centers.
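
For illustration only, the steps recited in claim 1 can be sketched in Python as follows. The claim does not give formulas, so every definition below is an assumption made for this sketch and not the patented method. Each word is reduced to a scalar coefficient rather than a full word vector. The paragraph tag is taken as the index of the first paragraph containing the word. The describing density is taken as the word's share of that paragraph. The structure distribution parameter is taken as the share of paragraphs containing the word. The text positioning coefficient is taken as their product. The similarity function and the greedy threshold clustering are likewise placeholders, and all function names are hypothetical.

```python
def positioning_coefficients(paragraphs):
    """Map each word to an assumed text positioning coefficient.

    `paragraphs` is a list of token lists, one list per paragraph.
    """
    n_par = len(paragraphs)
    coeffs = {}
    for word in {w for p in paragraphs for w in p}:
        # Paragraph tag (assumed): index of the first paragraph containing the word.
        tag = next(i for i, p in enumerate(paragraphs) if word in p)
        # Describing density (assumed): share of the tagged paragraph taken by the word.
        density = paragraphs[tag].count(word) / len(paragraphs[tag])
        # Structure distribution parameter (assumed): share of paragraphs containing the word.
        spread = sum(word in p for p in paragraphs) / n_par
        # Text positioning coefficient (assumed): product of the two quantities.
        coeffs[word] = density * spread
    return coeffs


def similarity(a, b):
    """Assumed similarity of two positioning coefficients, in (0, 1]."""
    return 1.0 / (1.0 + abs(a - b))


def cluster_centers(coeffs, threshold=0.9):
    """Greedy clustering: visit words from highest to lowest coefficient.

    A word becomes a new cluster center unless it is similar enough to an
    existing center, in which case it is absorbed into that cluster.
    """
    centers = []
    for word, c in sorted(coeffs.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(similarity(c, coeffs[ctr]) >= threshold for ctr in centers):
            centers.append(word)
    return centers


def is_compliant(paragraphs, sensitive_words, threshold=0.9):
    """Treat the text as non-compliant if any cluster center is a sensitive word."""
    centers = cluster_centers(positioning_coefficients(paragraphs), threshold)
    return not any(c in sensitive_words for c in centers)
```

In this sketch, only cluster centers are compared against the sensitive word database, so a sensitive word that is absorbed into another word's cluster is not flagged. Whether the claimed method behaves this way depends on details given in the specification, not in claim 1.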