US 12,487,976 B2
Automatically improving data annotations by processing annotation properties and user feedback
Shanmukha Chaitanya Guttula, Bengaluru (IN); Nitin Gupta, New Delhi (IN); Pranay Kumar Lohia, Bangalore (IN); and Hima Patel, Bengaluru (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 6, 2021, as Appl. No. 17/494,987.
Prior Publication US 2023/0106490 A1, Apr. 6, 2023
Int. Cl. G06F 16/215 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/2365 (2019.01)]
20 Claims
 
1. A computer-implemented method comprising:
obtaining a set of data annotation pairs, wherein each of the data annotation pairs comprises an input data annotation in a first format and a corresponding output data annotation in a second format;
determining, within at least a portion of the data annotation pairs, one or more non-diffs;
determining, across the at least a portion of the data annotation pairs, one or more data annotation properties associated with multiple intents by processing at least a portion of the one or more non-diffs using one or more regular expression learning-based clustering algorithms to group instances of the one or more non-diffs within the at least a portion of the data annotation pairs on a basis of at least one of (i) one or more repeating characters within the one or more non-diffs, (ii) non-diff positioning, and (iii) one or more matching words within the one or more non-diffs;
modifying at least a portion of the data annotation pairs based at least in part on the one or more identified data annotation properties;
outputting the modified data annotation pairs to at least one user; and
generating a final collection of data annotation pairs by processing at least a portion of the modified data annotation pairs and user feedback received in response to the outputting of the modified data annotation pairs;
wherein the method is carried out by at least one computing device.
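
The non-diff determination and grouping steps recited in claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that the claim text does not state: a "non-diff" is taken to mean a span of text that appears unchanged in both the input and the output annotation, Python's standard `difflib.SequenceMatcher` stands in for whatever diffing technique is actually used, and a hand-written regular-expression signature stands in for the regular expression learning-based clustering algorithms. The names `non_diffs`, `signature`, and `cluster_non_diffs` are hypothetical and chosen only for this illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def non_diffs(src, dst):
    """Return (text, position_in_src) for each span shared by src and dst.

    Assumption: a "non-diff" is a span left unchanged between the input
    annotation (src) and the output annotation (dst).
    """
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    return [
        (src[m.a:m.a + m.size], m.a)
        for m in matcher.get_matching_blocks()
        if m.size > 0  # skip the zero-length sentinel block
    ]


def signature(text, position):
    """Hypothetical grouping key loosely following the three claimed bases."""
    # (i) repeating characters: the span is one character repeated
    if re.fullmatch(r"(.)\1+", text):
        return ("repeat", text[0])
    # (iii) matching words: key on the first alphabetic word in the span
    words = re.findall(r"[A-Za-z]+", text)
    if words:
        return ("word", words[0].lower())
    # (ii) positioning: fall back to where the span sits in the input
    return ("position", "start" if position == 0 else "inner")


def cluster_non_diffs(pairs):
    """Group non-diffs across all pairs by their signature."""
    clusters = defaultdict(list)
    for index, (src, dst) in enumerate(pairs):
        for text, position in non_diffs(src, dst):
            clusters[signature(text, position)].append((index, text))
    return dict(clusters)


# Illustrative data only; not taken from the patent.
example_pairs = [("total: 5", "total=5"), ("total: 12", "total=12")]
example_clusters = cluster_non_diffs(example_pairs)
```

In this sketch, the shared word `total` from both example pairs lands in one cluster, and the unchanged numeric values land in another. A cluster spanning many pairs could then be treated as a candidate annotation property to present to a user for feedback, which corresponds only loosely to the modifying, outputting, and feedback steps of the claim.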