US 12,314,661 B2
Natural language detection
Michael Zatsepin, Novokuznetsk (RU)
Assigned to ABBYY Development Inc., Dover, DE (US)
Filed by ABBYY Development Inc., Dover, DE (US)
Filed on Dec. 16, 2022, as Appl. No. 18/082,919.
Prior Publication US 2024/0202444 A1, Jun. 20, 2024
Int. Cl. G06F 40/284 (2020.01); G06F 40/263 (2020.01); G06V 30/148 (2022.01)
CPC G06F 40/284 (2020.01) [G06F 40/263 (2020.01); G06V 30/153 (2022.01)]
20 Claims
 
1. A method, comprising:
identifying, by a processing device, a document comprising a plurality of words in one or more natural languages;
for each word of at least a subset of words of the document:
generating a plurality of sets of tokens representing the word, wherein each set of tokens of the plurality of sets of tokens represents the word using a corresponding plurality of tokens defined for a corresponding natural language of a set of natural languages, and
identifying, based on the plurality of sets of tokens, a primary natural language associated with the word;
associating each natural language of the set of natural languages with a corresponding word count indicating a number of words of the subset of words for which the natural language has been identified as the primary natural language;
identifying, among the set of natural languages, a natural language associated with a maximum word count; and
associating the identified natural language with the document.
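
The steps of claim 1 can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation. The claim does not state how the primary natural language is derived from the sets of tokens, so the sketch assumes a simple heuristic: the language whose vocabulary splits the word into the fewest tokens wins. The vocabularies in VOCABS, the greedy longest-match tokenizer, and all function names are hypothetical.

```python
# Hypothetical toy token vocabularies, one per candidate natural language.
# Real per-language token sets would be far larger.
VOCABS = {
    "en": {"the", "cat", "s", "at", "on", "mat", "e"},
    "de": {"die", "katze", "sitz", "t", "auf", "der", "matte"},
}


def tokenize(word, vocab):
    """Greedy longest-match split of a word using one language's tokens.

    Characters not covered by any vocabulary entry become single-character
    fallback tokens (an assumption made for this sketch).
    """
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def primary_language(word, vocabs):
    """Build one token set per language, then pick the primary language.

    Assumed heuristic: the fewest tokens wins. Ties go to the language
    listed first in vocabs.
    """
    token_sets = {lang: tokenize(word, vocab) for lang, vocab in vocabs.items()}
    return min(token_sets, key=lambda lang: len(token_sets[lang]))


def detect_document_language(words, vocabs):
    """Count words per primary language and return the language with the
    maximum word count, together with the counts."""
    counts = {lang: 0 for lang in vocabs}
    for word in words:
        counts[primary_language(word.lower(), vocabs)] += 1
    return max(counts, key=counts.get), counts


if __name__ == "__main__":
    doc = "the cat sitzt auf the mat".split()
    print(detect_document_language(doc, VOCABS))
```

In this sketch a mixed-language document is assigned the language that claims the most words, which mirrors the word-count and maximum-selection steps of the claim. The per-word scoring rule is the part most likely to differ in a real system.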