US 11,899,681 B2
Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
Hui Li, Beijing (CN); and Lei Xu, Beijing (CN)
Assigned to BOE TECHNOLOGY GROUP CO., LTD., Beijing (CN)
Filed by BOE Technology Group Co., Ltd., Beijing (CN)
Filed on Jul. 31, 2020, as Appl. No. 16/944,521.
Claims priority of application No. 201910926347.1 (CN), filed on Sep. 27, 2019.
Prior Publication US 2021/0097089 A1, Apr. 1, 2021
Int. Cl. G06F 16/25 (2019.01); G06F 16/951 (2019.01); G06F 16/901 (2019.01); G06N 5/02 (2023.01)
CPC G06F 16/258 (2019.01) [G06F 16/9024 (2019.01); G06F 16/951 (2019.01); G06N 5/02 (2013.01)]
8 Claims
 
1. A method for building a knowledge graph, comprising steps of:
acquiring source data related to preset keywords according to the preset keywords;
cleaning the source data according to a preset data dictionary and an error information table;
extracting entities, attribute information of the entities and relationship information among the entities from the cleaned source data according to the preset data dictionary and an entity relationship;
fusing the entities, the attribute information of the entities and the relationship information among the entities to obtain data triples as the knowledge graph; and
storing the knowledge graph into a preset graph database,
wherein the keywords are keywords in a field of arts; the preset data dictionary is a data dictionary of arts; the error information table is an error information table related to the field of arts; and the preset entity relationship is a preset entity relationship among a painter, a painting and a museum,
wherein the source data comprises semi-structured source data and structured source data; the acquiring source data related to preset keywords according to the preset keywords in the field of arts comprises steps of:
crawling the semi-structured source data on a preset target website related to the field of arts by using a scrapy application framework according to the keywords;
and/or,
retrieving the structured source data in a preset database related to the field of arts according to the keywords, wherein the structured source data is represented in a unified two-dimensional form, and has the following characteristics: data has a row as a unit, one row of data represents information of one entity, and each row of data has a same attribute,
wherein in response to the source data comprising the semi-structured source data, before cleaning the source data according to a preset data dictionary of arts and an error information table related to the field of arts, the method further comprises preprocessing the semi-structured source data to obtain the structured source data, comprising steps of:
dividing the semi-structured source data into a plurality of groups according to preset attribute information;
obtaining, based on a Word2vec algorithm, similarity vectors corresponding to data in the semi-structured source data in the plurality of groups;
obtaining a similarity between any two data in a same group based on the similarity vectors;
comparing the similarity with a preset similarity threshold;
in response to the similarity exceeding the preset similarity threshold, fusing the two data into a piece of semi-structured source data; and
for the fused semi-structured source data, extracting corresponding data from the source data to form the structured source data,
wherein the step of cleaning the source data according to a preset data dictionary of arts and an error information table related to the field of arts comprises steps of:
processing a single-valued attribute in the source data by using the error information table to replace an error value in the single-valued attribute with a correct value in the error information table, wherein the error information table comprises a combination of several sets of error information and correct information; and the single-valued attribute refers to an attribute having only one value; the data dictionary of arts is updated in real time;
inquiring entity attribute information and relationship information corresponding to the source data from the preset data dictionary of arts and a relationship table according to the single-valued attribute;
looking through the source data in the error information table; and
in response to the error information in the error information table not containing the source data, of which the single-valued attribute is required to be replaced, outputting the entity attribute information and the relationship information corresponding to the source data, wherein when the error information in the error information table contains the source data of the single-valued attribute for replacement, the single-valued attribute in the source data is continuously replaced until the error information in the error information table does not contain the source data of the single-valued attribute for replacement.
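
The preprocessing step recited in claim 1 (grouping by a preset attribute, computing pairwise similarity within a group, and fusing records whose similarity exceeds a preset threshold) might be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. The claim specifies Word2vec vectors; to keep the example self-contained, a character-bigram count vector is swapped in as a stand-in embedding, which is an assumption. The function names (`embed`, `cosine`, `fuse_similar`), the record fields, and the threshold value are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def embed(text):
    """Stand-in for a Word2vec vector (assumption): character-bigram counts."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fuse_similar(records, group_key, text_key, threshold):
    """Group records by group_key, then fuse records in the same group whose
    text_key values are more similar than the threshold."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(group_key)].append(rec)

    fused = []
    for members in groups.values():
        kept = []  # (vector, merged record) pairs retained for this group
        for rec in members:
            vec = embed(rec[text_key])
            for kept_vec, kept_rec in kept:
                if cosine(vec, kept_vec) > threshold:
                    # Fuse: keep existing values, fill in fields that are missing.
                    for field, value in rec.items():
                        kept_rec.setdefault(field, value)
                    break
            else:
                kept.append((vec, dict(rec)))
        fused.extend(rec for _, rec in kept)
    return fused


# Hypothetical crawled records, grouped by painter and compared by title.
crawled = [
    {"painter": "Vincent van Gogh", "title": "The Starry Night", "museum": "MoMA"},
    {"painter": "Vincent van Gogh", "title": "Starry Night", "year": "1889"},
    {"painter": "Vincent van Gogh", "title": "Sunflowers"},
]
structured = fuse_similar(crawled, "painter", "title", threshold=0.8)
```

In this sketch the first record seen in a group wins when two fused records disagree on a field; the claim does not say how such conflicts are resolved, so that choice is an assumption as well.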
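
The cleaning step recited in claim 1 could be illustrated along the following lines: a single-valued attribute is replaced using the error information table until its value no longer appears among the error entries, and the entity attribute and relationship information is then looked up and output. This is a minimal sketch under stated assumptions, not the patented implementation. The error information table is modeled as a plain mapping from error value to correct value, and the function names, the `max_passes` guard, and the sample tables are hypothetical.

```python
def clean_single_valued(record, attribute, error_table, max_passes=100):
    """Replace the attribute's value until it is no longer an error entry."""
    value = record[attribute]
    passes = 0
    while value in error_table:
        if passes >= max_passes:
            # Guard against a cyclic table (assumption; not part of the claim).
            raise ValueError("error table appears to contain a cycle")
        value = error_table[value]
        passes += 1
    cleaned = dict(record)  # leave the input record untouched
    cleaned[attribute] = value
    return cleaned


def lookup(cleaned, attribute, data_dictionary, relationship_table):
    """Return attribute info and relationship info for the cleaned value."""
    key = cleaned[attribute]
    return data_dictionary.get(key, {}), relationship_table.get(key, [])


# Illustrative tables (hypothetical contents).
error_table = {
    "Van Goh": "Van Gogh",           # misspelling -> short form
    "Van Gogh": "Vincent van Gogh",  # short form -> canonical name
}
data_dictionary = {"Vincent van Gogh": {"nationality": "Dutch", "born": "1853"}}
relationship_table = {"Vincent van Gogh": [("painted", "The Starry Night")]}

record = {"painter": "Van Goh", "title": "The Starry Night"}
cleaned = clean_single_valued(record, "painter", error_table)
attributes, relations = lookup(cleaned, "painter", data_dictionary, relationship_table)
```

Here "Van Goh" is replaced twice, first with "Van Gogh" and then with "Vincent van Gogh", which mirrors the claim's requirement that replacement continue until the error information no longer contains the value.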