US 12,283,124 B2
Method of training text quality assessment model and method of determining text quality
Heng Wang, Beijing (CN); Zhenlei Tian, Beijing (CN); and Tianbao Yu, Beijing (CN)
Assigned to Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Appl. No. 17/995,283
Filed by Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
PCT Filed Mar. 22, 2022, PCT No. PCT/CN2022/082273
§ 371(c)(1), (2) Date Sep. 30, 2022,
PCT Pub. No. WO2023/284327, PCT Pub. Date Jan. 19, 2023.
Claims priority of application No. 202110787492.3 (CN), filed on Jul. 12, 2021.
Prior Publication US 2024/0221404 A1, Jul. 4, 2024
Int. Cl. G06V 30/12 (2022.01); G06V 30/19 (2022.01)
CPC G06V 30/133 (2022.01) [G06V 30/1912 (2022.01)]
13 Claims
 
1. A method of training a text quality assessment model, comprising:
determining a first text satisfying a condition of being a negative sample and a second text satisfying a condition of being a positive sample from a plurality of texts based on indicators for the plurality of texts;
for any text of the first text and the second text, adding a label to the text based on the condition satisfied by the text, wherein the label indicates a category of the text, and the category comprises a low-quality category for the negative sample and a non-low-quality category for the positive sample; and
constituting a training set by the first text having the label and the second text having the label, to train the text quality assessment model,
wherein the text quality assessment model comprises a semantic representation network and a fully connected layer, the semantic representation network is configured to extract a semantic feature, the fully connected layer is configured to map the semantic feature to a category-dimensional space and output a classification prediction result, and before training the text quality assessment model, the method further comprises:
training the semantic representation network based on the plurality of texts, to obtain a pre-trained semantic representation network; and
obtaining the text quality assessment model by splicing the fully connected layer in an output direction of the pre-trained semantic representation network,
wherein the method further comprises: after training the text quality assessment model,
retraining the trained text quality assessment model by using a target text as a training sample, wherein the target text has a manual annotation label indicating a true category of the text.
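
The sample-selection and labeling steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say what the indicators are or how the conditions are tested, so the single scalar `indicator` field, the thresholds `neg_max` and `pos_min`, and the names `Text` and `build_training_set` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

LOW_QUALITY = 0      # category assigned to negative samples
NON_LOW_QUALITY = 1  # category assigned to positive samples


@dataclass
class Text:
    content: str
    indicator: float             # hypothetical scalar indicator for the text
    label: Optional[int] = None  # filled in once a condition is satisfied


def build_training_set(texts: List[Text],
                       neg_max: float = 0.2,
                       pos_min: float = 0.8) -> List[Text]:
    """Select and label texts from their indicators (assumed thresholds)."""
    training_set = []
    for t in texts:
        if t.indicator <= neg_max:
            # Assumed negative-sample condition: label as low-quality.
            training_set.append(Text(t.content, t.indicator, LOW_QUALITY))
        elif t.indicator >= pos_min:
            # Assumed positive-sample condition: label as non-low-quality.
            training_set.append(Text(t.content, t.indicator, NON_LOW_QUALITY))
        # Texts satisfying neither condition are left out of the training set.
    return training_set


corpus = [
    Text("first example text", 0.05),
    Text("second example text", 0.50),
    Text("third example text", 0.95),
]
training_set = build_training_set(corpus)
```

In the claimed method, a set labeled this way trains the model formed from the pre-trained semantic representation network and the fully connected layer, and the trained model is then retrained on target texts that carry manual annotation labels.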