US 11,734,514 B1
Automated translation of subject matter specific documents
Gary Shorter, Danbury, CT (US); Naouel Baili Ben Abdallah, Danbury, CT (US); and Barry Ahrens, Danbury, CT (US)
Assigned to IQVIA INC., Durham, NC (US)
Filed by IQVIA Inc., Danbury, CT (US)
Filed on Nov. 16, 2020, as Appl. No. 17/098,812.
Application 17/098,812 is a continuation of application No. 16/276,002, filed on Feb. 14, 2019, granted, now 10,839,164.
Claims priority of provisional application 62/739,541, filed on Oct. 1, 2018.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/00 (2020.01); G06F 40/30 (2020.01); G16H 10/20 (2018.01); G06N 3/04 (2023.01); G06F 40/295 (2020.01); G06F 40/253 (2020.01); G06F 40/284 (2020.01); G06N 3/08 (2023.01)
CPC G06F 40/30 (2020.01) [G06F 40/253 (2020.01); G06F 40/284 (2020.01); G06F 40/295 (2020.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G16H 10/20 (2018.01)] 20 Claims
 
1. A computer-implemented method comprising:
splitting sentences in a digitized text of a stored document into segments;
ordering words in the segmented sentences, which have reduced complexity relative to the sentences prior to splitting;
at least partially translating the segments to a target natural language by matching the ordered segments to segments in a database of documents previously translated from a source natural language, wherein the content of the documents has subject matter similar to that of the stored document;
producing a single representation of the sentences that share a common meaning by applying transformational grammar to the digitized text; and
outputting a representation of the stored document that includes a semantic meaning in the target natural language.
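The segmentation and database-matching steps of claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes segments are delimited by semicolons and colons, and it models the database of previously translated documents as a plain dictionary (hypothetically called a translation memory here) keyed by source segment. Word ordering and transformational grammar are omitted. Segments with no stored translation are returned with `None` so a later step can translate them.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical "translation memory": source-language segments taken from
# previously translated documents in the same subject area, mapped to the
# target-language text they were translated into.
TranslationMemory = Dict[str, str]


def split_into_segments(sentence: str) -> List[str]:
    """Split one sentence into shorter segments at clause punctuation.

    Assumption: semicolons and colons mark segment boundaries; the claim
    does not say how segment boundaries are chosen.
    """
    parts = re.split(r"\s*[;:]\s*", sentence.strip().rstrip("."))
    return [part for part in parts if part]


def normalize(segment: str) -> str:
    """Build a case- and whitespace-insensitive lookup key."""
    return " ".join(segment.lower().split())


def partially_translate(
    sentence: str, memory: TranslationMemory
) -> List[Tuple[str, Optional[str]]]:
    """Pair each segment with its stored translation, or None if unmatched.

    Segments left as None would be passed to a later translation step.
    """
    index = {normalize(src): tgt for src, tgt in memory.items()}
    return [(seg, index.get(normalize(seg))) for seg in split_into_segments(sentence)]
```

For example, with a memory containing only "Take one tablet daily" mapped to "Prendre un comprimé par jour", the sentence "Take one tablet daily; avoid alcohol." yields the stored French text for the first segment and `None` for "avoid alcohol". Exact matching on a normalized key is the simplest choice; claim 17 below recites fuzzy matching instead.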
 
17. A computer-implemented method comprising:
splitting digitized text in a new document into segments by identifying sentence boundaries, using a gazetteer list of abbreviations to identify sentence-marking stops;
identifying, using named entity recognition, portions of the digitized text that are excluded from translation into a target natural language;
searching, using fuzzy matching, a translation history from a source natural language to the target natural language for matches between the segments and existing translations;
identifying and tagging parts of speech in the digitized text;
grammatically transforming the digitized text to provide a single representation of sentences that have a common meaning;
over an application programming interface (API):
transmitting the segments to an external translation engine for translation;
receiving a translation of the segments in the target natural language from the external translation engine;
correcting the translation for subject matter specific acronyms and/or subject matter specific terminology; and
reconstructing the new document using the corrected translation in the target natural language.
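Three steps of claim 17 (gazetteer-based sentence splitting, fuzzy search of the translation history, and acronym or terminology correction) can likewise be sketched in Python. The abbreviation gazetteer, the 0.75 similarity threshold, the use of `difflib.SequenceMatcher` as the fuzzy matcher, and the glossary-based correction are all assumptions chosen for illustration; the claim does not specify them. Named entity recognition, part-of-speech tagging, grammatical transformation, and the call to the external translation engine are omitted because the claim names no particular tool or API.

```python
import difflib
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer: abbreviations whose final period is not a sentence stop.
ABBREVIATIONS = {"approx.", "dr.", "e.g.", "i.e.", "vs."}


def split_sentences(text: str, abbreviations=ABBREVIATIONS) -> List[str]:
    """Split at '.', '!' or '?' followed by whitespace, unless the token
    ending at that period is listed in the abbreviation gazetteer."""
    sentences: List[str] = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s)", text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if match.group() == "." and last_token in abbreviations:
            continue  # the period belongs to an abbreviation, not a stop
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def fuzzy_lookup(
    segment: str, history: Dict[str, str], threshold: float = 0.75
) -> Optional[Tuple[str, str, float]]:
    """Return (matched_source, stored_translation, score) for the most
    similar history entry scoring at least `threshold`, else None."""
    best: Optional[Tuple[str, str, float]] = None
    for source, target in history.items():
        score = difflib.SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


def correct_terminology(translation: str, glossary: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each glossary key (for example an
    acronym left in the source language) with the approved target term."""
    for wrong, right in glossary.items():
        translation = re.sub(
            rf"\b{re.escape(wrong)}\b", lambda _m, term=right: term, translation
        )
    return translation
```

Splitting "Patients received approx. 10 mg daily. Dr. Lee monitored them." yields two sentences, because the periods after "approx." and "Dr." match gazetteer entries rather than sentence stops. Likewise, `correct_terminology("Un AE a été signalé.", {"AE": "EI"})` replaces the untranslated English acronym for "adverse event" with the French "EI".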