US 12,093,649 B2
Dependency tree-based data augmentation for sentence well-formedness judgement
Yang Zhao, Tokyo (JP); Masayasu Muraoka, Tokyo (JP); and Issei Yoshida, Tokyo (JP)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 26, 2021, as Appl. No. 17/303,278.
Prior Publication US 2022/0382981 A1, Dec. 1, 2022
Int. Cl. G06F 40/289 (2020.01); G06F 40/205 (2020.01); G06F 40/211 (2020.01); G06F 40/253 (2020.01); G06N 5/01 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)
CPC G06F 40/289 (2020.01) [G06F 40/205 (2020.01); G06F 40/211 (2020.01); G06F 40/253 (2020.01); G06N 5/01 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)]
25 Claims
 
1. A computer-implemented method for dependency tree-based data augmentation for sentence well-formedness judgement, the method comprising:
applying a dependency parser to a set of sentences;
counting frequencies of respective dependency relations;
choosing a predetermined number of most frequent dependency tags;
determining an average depth of each of the most frequent dependency tags;
re-ranking the most frequent dependency tags according to average depths of respective ones of the most frequent dependency tags;
assigning predetermined rating scores to the most frequent dependency tags according to the average depths;
for each of the predetermined rating scores, constructing a dependency tag probability distribution;
applying a dependency parser to generate a first dependency tree for a first sentence;
sampling tokens to be removed from the first dependency tree according to a removal ratio for a corresponding predetermined rating score and the dependency tag probability distribution;
removing one or more nodes in the dependency tree according to the removal ratio;
generating, from the dependency tree, a partial tree for the sentence;
outputting a rated sentence based on the partial tree; and
wherein the rated sentence is used as training data.
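
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the hand-written parses in CORPUS stand in for real dependency parser output; the tags are split into equal-sized groups by average depth, one group per rating score, with shallower tags mapped to lower scores; each distribution is the normalised tag frequency within its group; a sampled node is removed together with its subtree so that the remainder is still a tree; and the root token is never sampled. The names tag_statistics, build_distributions, augment and REMOVAL_RATIO, and the ratio values, are hypothetical.

```python
import random
from collections import Counter, defaultdict

# A parsed sentence is a list of (token, head_index, dep_tag); head_index -1 marks the root.
# ASSUMPTION: these hand-written parses stand in for the output of a real dependency parser.
CORPUS = [
    [("The", 1, "det"), ("cat", 2, "nsubj"), ("sat", -1, "root"),
     ("on", 5, "case"), ("the", 5, "det"), ("mat", 2, "obl")],
    [("A", 1, "det"), ("dog", 2, "nsubj"), ("chased", -1, "root"),
     ("the", 5, "det"), ("red", 5, "amod"), ("ball", 2, "obj")],
]
# ASSUMPTION: hypothetical removal ratio per rating score (lower score, more removed).
REMOVAL_RATIO = {1: 0.5, 2: 0.2}

def depth(sent, i):
    """Number of edges from token i up to the root."""
    d = 0
    while sent[i][1] != -1:
        i = sent[i][1]
        d += 1
    return d

def tag_statistics(corpus, top_k):
    """Count tag frequencies, keep the top_k tags, and average each kept tag's depth."""
    freq = Counter()
    depth_sum = defaultdict(int)
    for sent in corpus:
        for i, (_, _, tag) in enumerate(sent):
            freq[tag] += 1
            depth_sum[tag] += depth(sent, i)
    top = [tag for tag, _ in freq.most_common(top_k)]
    return freq, {tag: depth_sum[tag] / freq[tag] for tag in top}

def build_distributions(freq, avg_depth, scores):
    """Re-rank tags by average depth and build one tag distribution per rating score."""
    ranked = sorted(avg_depth, key=lambda t: (avg_depth[t], t))  # shallowest first
    size = -(-len(ranked) // len(scores))  # ceiling division
    dists = {}
    for n, score in enumerate(scores):
        # ASSUMPTION: equal-sized depth groups, frequencies normalised within each group.
        group = ranked[n * size:(n + 1) * size]
        total = sum(freq[t] for t in group)
        dists[score] = {t: freq[t] / total for t in group}
    return dists

def augment(sent, score, dist, removal_ratio, rng):
    """Sample nodes by tag, drop them with their subtrees, and return (text, score)."""
    # ASSUMPTION: the root is excluded, since removing it would delete the whole tree.
    pool = [i for i, (_, head, tag) in enumerate(sent) if tag in dist and head != -1]
    n_remove = min(len(pool), max(1, round(removal_ratio * len(sent))))
    removed = set()
    for _ in range(n_remove):
        weights = [dist[sent[i][2]] for i in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        removed.add(pick)

    def under_removed(i):
        # True if token i or any of its ancestors was sampled for removal.
        while i != -1:
            if i in removed:
                return True
            i = sent[i][1]
        return False

    kept = [tok for i, (tok, _, _) in enumerate(sent) if not under_removed(i)]
    return " ".join(kept), score

freq, avg_depth = tag_statistics(CORPUS, top_k=3)
dists = build_distributions(freq, avg_depth, scores=[1, 2])
rng = random.Random(0)
for score in dists:
    # Each (sentence, score) pair is one rated training example.
    print(augment(CORPUS[0], score, dists[score], REMOVAL_RATIO[score], rng))
```

In this sketch the rated sentence is returned as a (text, score) pair, which is the form in which it could be appended to a training set for a well-formedness model. Swapping CORPUS for real parser output only requires producing the same (token, head_index, dep_tag) triples.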