| CPC G06F 16/215 (2019.01) [G06F 16/2365 (2019.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
obtaining a set of data annotation pairs, wherein each of the data annotation pairs comprises an input data annotation in a first format and a corresponding output data annotation in a second format;
determining, within at least a portion of the data annotation pairs, one or more non-diffs;
determining, across the at least a portion of the data annotation pairs, one or more data annotation properties associated with multiple intents by processing at least a portion of the one or more non-diffs using one or more regular expression learning-based clustering algorithms to group instances of the one or more non-diffs within the at least a portion of the data annotation pairs on a basis of at least one of (i) one or more repeating characters within the one or more non-diffs, (ii) non-diff positioning, and (iii) one or more matching words within the one or more non-diffs;
modifying at least a portion of the data annotation pairs based at least in part on the one or more identified data annotation properties;
outputting the modified data annotation pairs to at least one user; and
generating a final collection of data annotation pairs by processing at least a portion of the modified data annotation pairs and user feedback received in response to the outputting of the modified data annotation pairs;
wherein the method is carried out by at least one computing device.
|
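The non-diff determination and grouping steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a "non-diff" is a character span shared by the input and output annotation, it finds such spans with Python's `difflib.SequenceMatcher`, and it substitutes a simple character-class generalization for the regular expression learning-based clustering algorithm. The names `non_diffs`, `signature`, `group_non_diffs`, and the `min_len` parameter are hypothetical and introduced only for illustration.

```python
import re
import string
from collections import defaultdict
from difflib import SequenceMatcher


def non_diffs(src, dst, min_len=2):
    """Return (text, src_pos, dst_pos) for each span shared by src and dst.

    Assumption: a non-diff is a contiguous span present in both annotations.
    """
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    return [
        (src[b.a:b.a + b.size], b.a, b.b)
        for b in matcher.get_matching_blocks()
        if b.size >= min_len  # also drops the zero-length sentinel block
    ]


def signature(text):
    """Generalize a span into a coarse regex (illustrative stand-in only).

    Runs of ASCII digits become [0-9]+, runs of ASCII letters become
    [A-Za-z]+, and every other character is kept as an escaped literal.
    """
    parts = []
    for tok in re.findall(r"[0-9]+|[A-Za-z]+|[^0-9A-Za-z]", text):
        if tok[0] in string.digits:
            parts.append("[0-9]+")
        elif tok[0] in string.ascii_letters:
            parts.append("[A-Za-z]+")
        else:
            parts.append(re.escape(tok))
    return "".join(parts)


def group_non_diffs(pairs):
    """Group non-diffs across annotation pairs by their generalized regex.

    Each group member records the pair index, the position of the span in
    the input annotation, and the span text.
    """
    groups = defaultdict(list)
    for idx, (src, dst) in enumerate(pairs):
        for text, src_pos, _dst_pos in non_diffs(src, dst):
            groups[signature(text)].append((idx, src_pos, text))
    return dict(groups)


if __name__ == "__main__":
    example_pairs = [
        ("date: 2021-03-04", "2021-03-04T00:00"),
        ("date: 1999-12-31", "1999-12-31T00:00"),
    ]
    for pattern, members in group_non_diffs(example_pairs).items():
        print(pattern, members)
```

In this sketch, the two date spans fall into one group because they share the same generalized pattern and the same position in the input annotation. A fuller treatment would also weigh repeating characters and matching words, which the claim lists as alternative grouping bases.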