CPC G06F 40/58 (2020.01) [G06F 16/2468 (2019.01); G06F 40/274 (2020.01); G06F 40/55 (2020.01)] | 17 Claims |
1. A method comprising:
receiving an input in a source human language, wherein the input is a text string including one or more abbreviations resulting from a maximum character size imposed on an input field by a user interface;
providing, via an abbreviation mapping engine, the input to a human language translation service;
receiving from the human language translation service, a plurality of translation proposals in a target human language different from the source human language, wherein receiving the plurality of translation proposals further comprises reusing linguistic data that was previously gathered, and wherein the plurality of translation proposals are associated with a usage count and a plurality of similarity scores indicating non-exact matching of the text string with linguistic data stored in a database, and wherein the similarity scores are within a predefined range configured to produce full words and abbreviations in the plurality of translation proposals;
uploading the plurality of translation proposals to the abbreviation mapping engine;
applying, by the abbreviation mapping engine, a ruleset to the plurality of translation proposals to generate a plurality of abbreviation candidates from the plurality of translation proposals, the ruleset detecting one or more of: a camel case, a concluding period, and an occurrence of multiple consonants;
comparing the plurality of abbreviation candidates to translation proposals having the usage count greater than a threshold, wherein said comparing is based on an edit distance between the plurality of abbreviation candidates and the translation proposals;
mapping one of the plurality of abbreviation candidates to a text string corresponding to an original full term based upon the edit distance and a length of one or more translation proposals having edit distances above a threshold; and
storing in a database of a non-transitory computer readable storage medium, a map between the one of the plurality of abbreviation candidates and the original full term.
|
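The ruleset, comparison, and mapping steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the three-consonant run length, the normalization step, and the sample proposals are all hypothetical. It also assumes that the best match is the frequently used proposal with the smallest Levenshtein edit distance that is longer than the candidate, which is only one possible reading of the claim's edit-distance and length criteria. An in-memory dictionary stands in for the database.

```python
import re


def is_abbreviation_candidate(token: str) -> bool:
    """Hypothetical ruleset: camel case, concluding period, or consonant run."""
    camel_case = re.search(r"[a-z][A-Z]", token) is not None
    concluding_period = token.endswith(".")
    # Assumption: "multiple consonants" means three or more in a row.
    consonant_run = re.search(r"[b-df-hj-np-tv-z]{3,}", token, re.IGNORECASE) is not None
    return camel_case or concluding_period or consonant_run


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def normalize(token: str) -> str:
    """Assumption: compare case-insensitively and ignore a trailing period."""
    return token.rstrip(".").lower()


def build_abbreviation_map(proposals: dict[str, int], usage_threshold: int) -> dict[str, str]:
    """Map each abbreviation candidate to the closest frequently used full term."""
    candidates = [p for p in proposals if is_abbreviation_candidate(p)]
    frequent = [p for p, count in proposals.items()
                if count > usage_threshold and not is_abbreviation_candidate(p)]
    mapping = {}
    for cand in candidates:
        key = normalize(cand)
        # Assumption: a full term must be longer than its abbreviation.
        longer = [t for t in frequent if len(t) > len(key)]
        if longer:
            mapping[cand] = min(
                longer, key=lambda t: (edit_distance(key, normalize(t)), len(t)))
    return mapping


# Hypothetical translation proposals with their usage counts.
proposals = {"Manager": 12, "Mngr.": 3, "Manger": 1, "Management": 8}
abbreviation_map = build_abbreviation_map(proposals, usage_threshold=5)
print(abbreviation_map)
```

In this sketch "Mngr." qualifies as a candidate through both the concluding period and the consonant run, while "Manger" is excluded from the comparison set because its usage count does not exceed the threshold.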