CPC G06F 40/289 (2020.01) [G06F 40/205 (2020.01); G06F 40/211 (2020.01); G06F 40/253 (2020.01); G06N 5/01 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)]
25 Claims
1. A computer-implemented method for dependency tree-based data augmentation for sentence well-formedness judgement, the method comprising:
applying a dependency parser to a set of sentences;
counting frequencies of respective dependency relations;
choosing a predetermined number of most frequent dependency tags;
determining an average depth of each of the most frequent dependency tags;
re-ranking the most frequent dependency tags according to average depths of respective ones of the most frequent dependency tags;
assigning predetermined rating scores to the most frequent dependency tags according to the average depths;
for each of the predetermined rating scores, constructing a dependency tag probability distribution;
applying a dependency parser to generate a first dependency tree for a first sentence;
sampling tokens to be removed from the first dependency tree according to a removal ratio for a corresponding predetermined rating score and the dependency tag probability distribution;
removing one or more nodes in the dependency tree according to the removal ratio;
generating, from the dependency tree, a partial tree for the sentence;
outputting a rated sentence based on the partial tree; and
wherein the rated sentence is used as training data.
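The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: parses are hand-written `(word, head_index, tag)` tuples rather than the output of a real dependency parser; tags nearer the root receive lower rating scores; each per-score distribution is proportional to tag frequency; removing a node also removes its subtree; and the root is never removed. The names `tag_statistics`, `build_distributions`, `augment`, `scores`, and `removal_ratio` are hypothetical.

```python
import random
from collections import Counter, defaultdict

# A parsed token is (word, head_index, dependency_tag); head_index -1 marks the root.

def depth(sent, i):
    """Number of edges from token i up to the root."""
    d = 0
    while sent[i][1] != -1:
        i = sent[i][1]
        d += 1
    return d

def tag_statistics(parsed, top_k):
    """Count tag frequencies, keep the top_k tags, re-rank them by average depth."""
    freq = Counter()
    depth_sum = defaultdict(int)
    for sent in parsed:
        for i, (_, _, tag) in enumerate(sent):
            freq[tag] += 1
            depth_sum[tag] += depth(sent, i)
    top = [tag for tag, _ in freq.most_common(top_k)]
    avg = {tag: depth_sum[tag] / freq[tag] for tag in top}
    return sorted(top, key=lambda tag: avg[tag]), avg, freq

def build_distributions(ranked, freq, scores):
    """Assumption: split depth-ranked tags into equal groups, one per score,
    and weight tags inside a group by their corpus frequency."""
    size = -(-len(ranked) // len(scores))  # ceiling division
    dists = {}
    for k, score in enumerate(scores):
        group = ranked[k * size:(k + 1) * size]
        total = sum(freq[tag] for tag in group)
        if total:
            dists[score] = {tag: freq[tag] / total for tag in group}
    return dists

def augment(sent, score, dists, removal_ratio, rng):
    """Sample nodes to remove for the given score; return (partial text, score)."""
    dist = dists[score]
    # Candidates are non-root tokens whose tag belongs to this score's distribution.
    pool = [i for i, (_, head, tag) in enumerate(sent) if head != -1 and tag in dist]
    n_remove = min(len(pool), round(removal_ratio[score] * len(sent)))
    removed = set()
    for _ in range(n_remove):
        weights = [dist[sent[i][2]] for i in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        removed.add(pick)

    def under_removed(i):
        # True if token i or any of its ancestors was sampled for removal.
        while i != -1:
            if i in removed:
                return True
            i = sent[i][1]
        return False

    kept = [word for i, (word, _, _) in enumerate(sent) if not under_removed(i)]
    return " ".join(kept), score

# Hand-written example parses standing in for parser output.
parsed = [
    [("The", 1, "det"), ("cat", 2, "nsubj"), ("sat", -1, "root"),
     ("on", 5, "case"), ("the", 5, "det"), ("mat", 2, "obl")],
    [("A", 1, "det"), ("dog", 2, "nsubj"), ("chased", -1, "root"),
     ("the", 4, "det"), ("ball", 2, "obj")],
]
scores = [1, 2]                    # hypothetical rating scores, low = less well-formed
removal_ratio = {1: 0.2, 2: 0.4}   # hypothetical removal ratio per score

ranked, avg_depth, freq = tag_statistics(parsed, top_k=4)
dists = build_distributions(ranked, freq, scores)
rated = [augment(parsed[0], s, dists, removal_ratio, random.Random(0)) for s in scores]
```

Each element of `rated` pairs a partial sentence with its rating score, which is the form in which the claim describes the output being used as training data. In this sketch a low score targets shallow tags such as the subject, so the resulting sentence is more severely damaged than one that only loses determiners or case markers.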