CPC G06F 16/9538 (2019.01) [G06F 16/951 (2019.01); G06F 16/955 (2019.01); G06F 16/9537 (2019.01); G06F 40/30 (2020.01); G06F 2216/03 (2013.01)] | 17 Claims |
1. A data mining method comprising:
acquiring a current article to be mined;
obtaining information values required for each data identification strategy of a plurality of data identification strategies from the current article, wherein each data identification strategy is used for identifying a preset type of data;
identifying a data type of the current article according to the information values required for each data identification strategy to obtain a data type identification result; and
determining whether the current article belongs to any preset type of data according to the data type identification result;
wherein preset types of the data comprise low quality data, low quality content, and inaccurate sentiment analysis; and obtaining the information values required for each data identification strategy of the plurality of data identification strategies from the current article comprises:
obtaining an article title, an article abstract and an article content from the current article based on a data identification strategy of a low quality data type;
extracting keywords from the current article based on a data identification strategy of a low quality content type; and
obtaining a sentiment polarity label from the current article based on a data identification strategy of an inaccurate sentiment analysis type.
|
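The flow recited in claim 1 can be illustrated with a minimal sketch: each strategy first obtains its own information values from the article, then decides from those values alone whether the article is of its preset type. The claim does not state how each strategy reaches its decision, so everything below is an assumption made for illustration: the `Article` and `Strategy` classes, the function names, the stopword list, the `MIN_KEYWORDS` threshold, and all three decision rules (empty field, too few keywords, unrecognized label) are hypothetical placeholders, not the patented method.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Article:
    """Hypothetical container for the current article to be mined."""
    title: str
    abstract: str
    content: str
    sentiment_label: str


@dataclass
class Strategy:
    """One data identification strategy for one preset type (hypothetical)."""
    preset_type: str
    extract: Callable[[Article], dict]   # obtains the required information values
    matches: Callable[[dict], bool]      # decides from those values only


# Assumed constants; the claim specifies none of these.
STOPWORDS = {"this", "that", "with", "from", "have", "while"}
MIN_KEYWORDS = 3
VALID_LABELS = {"positive", "negative", "neutral"}


def extract_text_fields(article: Article) -> dict:
    # Low quality data: title, abstract and content are the information values.
    return {"title": article.title, "abstract": article.abstract,
            "content": article.content}


def extract_keywords(article: Article) -> dict:
    # Low quality content: keywords are the information values.
    words = re.findall(r"[a-z]+", article.content.lower())
    keywords = {w for w in words if len(w) > 3 and w not in STOPWORDS}
    return {"keywords": sorted(keywords)}


def extract_sentiment_label(article: Article) -> dict:
    # Inaccurate sentiment analysis: the polarity label is the information value.
    return {"label": article.sentiment_label}


STRATEGIES: List[Strategy] = [
    # Placeholder rule: any empty text field marks low quality data.
    Strategy("low_quality_data", extract_text_fields,
             lambda v: any(not field.strip() for field in v.values())),
    # Placeholder rule: too few distinct keywords marks low quality content.
    Strategy("low_quality_content", extract_keywords,
             lambda v: len(v["keywords"]) < MIN_KEYWORDS),
    # Placeholder rule: an unrecognized label marks inaccurate sentiment analysis.
    Strategy("inaccurate_sentiment_analysis", extract_sentiment_label,
             lambda v: v["label"] not in VALID_LABELS),
]


def identify_data_types(article: Article,
                        strategies: List[Strategy] = STRATEGIES) -> Dict[str, bool]:
    """Return the data type identification result, one flag per preset type."""
    return {s.preset_type: s.matches(s.extract(article)) for s in strategies}


def belongs_to_any_preset_type(result: Dict[str, bool]) -> bool:
    """Determine whether the article belongs to any preset type of data."""
    return any(result.values())
```

Keeping extraction and decision as separate callables mirrors the claim's split between obtaining the information values and identifying the data type, and lets a further strategy be added without changing `identify_data_types`.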