US 12,283,085 B2
Data labeling method based on artificial intelligence, apparatus and storage medium
Siqi Xu, Beijing (CN); Ke Sun, Beijing (CN); Jian Gong, Beijing (CN); Xu Pan, Beijing (CN); Zhiqun Xia, Beijing (CN); Zhe Yang, Beijing (CN); and Zecheng Zhuo, Beijing (CN)
Assigned to Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Filed by Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Filed on Sep. 2, 2022, as Appl. No. 17/902,323.
Claims priority of application No. 202210335852.0 (CN), filed on Mar. 31, 2022.
Prior Publication US 2023/0316709 A1, Oct. 5, 2023
Int. Cl. G06V 10/762 (2022.01); G06F 16/28 (2019.01); G06V 10/74 (2022.01); G06V 10/764 (2022.01)
CPC G06V 10/762 (2022.01) [G06F 16/285 (2019.01); G06V 10/761 (2022.01); G06V 10/764 (2022.01)]
17 Claims
 
1. A data labeling method based on artificial intelligence, comprising:
determining a plurality of samples involved in clustering;
performing a plurality of following operations circularly to realize iterative processing, until a convergence condition is satisfied, or a quantity of iterations reaches a number threshold, comprising:
pre-clustering the plurality of samples involved in clustering, according to a vector representation of the respective samples involved in clustering, to obtain a plurality of class clusters, wherein each class cluster contains at least one sample involved in clustering;
receiving labeling information for the respective class clusters, wherein the labeling information for the respective class clusters comprises: at least one sub-cluster contained in the respective class clusters, and a representative sample in each sub-cluster, wherein the sub-cluster comprises one representative sample and at least one non-representative sample;
re-determining the plurality of samples involved in clustering, according to the labeling information by: taking the representative sample in the sub-cluster in the labeling information for the respective class clusters, as the re-determined plurality of samples involved in clustering;
for the representative sample, determining a non-representative sample that belongs to, in a previous iteration process, a same sub-cluster as the representative sample; and
determining a sub-cluster to which the non-representative sample belongs in a current iteration process, to be the same as a sub-cluster to which the representative sample belongs in the current iteration process; and
determining a clustering result according to the labeling information for the respective class clusters.
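
The loop recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The greedy distance-threshold pre-clustering, the widening `radius`/`growth` schedule, the "no merge happened" convergence test, and the names `pre_cluster`, `label_samples` and `make_annotator` are all assumptions introduced for illustration. The `annotate` callback stands in for the step of receiving labeling information, and here it is simulated from hypothetical ground-truth labels. For simplicity the sketch also allows a sub-cluster that holds only its representative.

```python
from math import dist


def pre_cluster(ids, vectors, radius):
    """Greedy threshold clustering (assumed stand-in for the pre-clustering step)."""
    clusters = []
    for i in ids:
        for cluster in clusters:
            # Join the first cluster whose seed sample is within `radius`.
            if dist(vectors[i], vectors[cluster[0]]) <= radius:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def label_samples(vectors, annotate, radius=1.0, growth=2.0, max_iters=10):
    """Iterate pre-clustering and labeling; return {representative: all member ids}."""
    active = list(vectors)                # samples currently involved in clustering
    members = {i: [i] for i in active}    # representative -> samples it stands for
    for _ in range(max_iters):            # iteration-count threshold
        merged = {}
        for cluster in pre_cluster(active, vectors, radius):
            # Labeling information: (representative, sub-cluster) pairs per class cluster.
            for rep, sub in annotate(cluster):
                # Samples that followed a representative in an earlier iteration
                # stay in whatever sub-cluster that representative lands in now.
                merged[rep] = [m for s in sub for m in members[s]]
        converged = len(merged) == len(active)   # assumed convergence condition
        members, active = merged, list(merged)   # only representatives are re-clustered
        if converged:
            break
        radius *= growth                  # assumed: widen the search each round
    return members


def make_annotator(truth):
    """Simulated annotator: splits a cluster by true label, first sample is representative."""
    def annotate(cluster):
        groups = {}
        for s in cluster:
            groups.setdefault(truth[s], []).append(s)
        return [(g[0], g) for g in groups.values()]
    return annotate


vectors = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (0.9, 0.0), "d": (1.6, 0.0), "e": (3.0, 0.0)}
truth = {"a": "cat", "b": "cat", "c": "dog", "d": "dog", "e": "cat"}
print(label_samples(vectors, make_annotator(truth)))
# {'a': ['a', 'b', 'e'], 'c': ['c', 'd']}
```

In this toy run, the annotator only ever sees the current representatives after the first round, yet sample `b` ends up grouped with `e` because it follows representative `a`.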