US 12,443,643 B1
Methods, devices, and systems for extracting and analyzing content facing large model
Haiyang Wang, Yantai (CN); Zhihua Yu, Yantai (CN); Qirong Zhou, Yantai (CN); Qiang Qiu, Yantai (CN); Jie Chu, Yantai (CN); Shenqiang Wang, Yantai (CN); and Zhen Zhao, Yantai (CN)
Assigned to INSTITUTE OF NETWORK TECHNOLOGY (YANTAI), Yantai (CN)
Filed by INSTITUTE OF NETWORK TECHNOLOGY (YANTAI), Shandong (CN)
Filed on Apr. 28, 2025, as Appl. No. 19/192,328.
Claims priority of application No. 202510088849.7 (CN), filed on Jan. 21, 2025.
Int. Cl. G06F 16/35 (2025.01); G06F 40/117 (2020.01)
CPC G06F 16/35 (2019.01) [G06F 40/117 (2020.01)]
9 Claims
 
1. A method for extracting and analyzing content facing a large model, wherein the method comprises:
by preprocessing a current text generated by the large model, obtaining a plurality of word vectors of the current text;
by clustering the plurality of word vectors according to an importance degree of the plurality of word vectors, obtaining a plurality of word cluster centers; and
determining whether the current text is compliant based on the plurality of word cluster centers and a sensitive word database, wherein
the obtaining a plurality of word vectors of the current text comprises:
determining a paragraph tag of the word vector based on a paragraph position relationship of the current text;
determining a describing density of the word vector based on the paragraph tag;
determining a structure distribution parameter of the word vector; and
determining a text positioning coefficient of the word vector based on the describing density and the structure distribution parameter;
determining a similarity of any two of the plurality of word vectors based on a plurality of text positioning coefficients of the plurality of word vectors; and
by clustering the plurality of word vectors based on the similarity, obtaining the plurality of word cluster centers.
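
The steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim gives no formulas, so everything numeric here is an assumption made for illustration only: plain lowercase words stand in for word vectors, the describing density is taken as a word's mean relative frequency in the paragraphs that contain it, the structure distribution parameter as the share of paragraphs containing it, the text positioning coefficient as their product, and the similarity as 1 / (1 + |c1 - c2|). The greedy threshold clustering and the function names (positioning_coefficients, similarity, cluster_centers, is_compliant) are likewise hypothetical and are not taken from the patent.

```python
def positioning_coefficients(paragraphs):
    """Map each distinct word to an assumed text positioning coefficient."""
    tokenized = [p.lower().split() for p in paragraphs]
    coeffs = {}
    for word in {w for toks in tokenized for w in toks}:
        # Paragraph tags: indices of the paragraphs in which the word occurs.
        tags = [i for i, toks in enumerate(tokenized) if word in toks]
        # Describing density (assumed): mean relative frequency of the word
        # within its tagged paragraphs.
        density = sum(
            tokenized[i].count(word) / len(tokenized[i]) for i in tags
        ) / len(tags)
        # Structure distribution parameter (assumed): share of paragraphs
        # that contain the word.
        spread = len(tags) / len(tokenized)
        # Text positioning coefficient (assumed): product of the two.
        coeffs[word] = density * spread
    return coeffs


def similarity(c1, c2):
    """Assumed similarity of two words from their positioning coefficients."""
    return 1.0 / (1.0 + abs(c1 - c2))


def cluster_centers(coeffs, threshold=0.95):
    """Greedy clustering: a word joins the first center it is similar to,
    otherwise it becomes a new center. Returns {center: [members]}."""
    clusters = {}
    # Visit words from highest to lowest coefficient (ties broken by name),
    # so each center is the highest-coefficient member of its cluster.
    for word in sorted(coeffs, key=lambda w: (-coeffs[w], w)):
        for center in clusters:
            if similarity(coeffs[word], coeffs[center]) >= threshold:
                clusters[center].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters


def is_compliant(paragraphs, sensitive_words, threshold=0.95):
    """Text is flagged when any cluster center is in the sensitive word set."""
    clusters = cluster_centers(positioning_coefficients(paragraphs), threshold)
    return not any(center in sensitive_words for center in clusters)


paragraphs = ["secret secret plans", "the secret recipe"]
clusters = cluster_centers(positioning_coefficients(paragraphs))
print(clusters)
print(is_compliant(paragraphs, {"secret"}))
print(is_compliant(paragraphs, {"weather"}))
```

In this sketch only the cluster centers are compared against the sensitive word set, mirroring the wording of the claim; a sensitive word that ends up as a non-center member of a cluster would not be flagged, which is a limitation of the illustration rather than a statement about the patented method.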