US 12,332,944 B2
Identifying equivalent technical terms in different documents
June-Ray Lin, Taipei (TW); Nan Chen, Beijing (CN); Ju Ling Liu, Beijing (CN); Li Na Wang, Beijing (CN); and Shun Xian Wu, Beijing (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Apr. 14, 2021, as Appl. No. 17/230,475.
Prior Publication US 2022/0335090 A1, Oct. 20, 2022
Int. Cl. G06F 16/93 (2019.01); G06F 16/9032 (2019.01); G06F 16/9035 (2019.01); G06F 16/9038 (2019.01); G06F 18/214 (2023.01); G06N 3/08 (2023.01)
CPC G06F 16/93 (2019.01) [G06F 16/90332 (2019.01); G06F 16/9035 (2019.01); G06F 16/9038 (2019.01); G06F 18/214 (2023.01); G06N 3/08 (2013.01)]
17 Claims
 
1. A computer-implemented method for identifying equivalent technical terms, the method comprising:
training a deep learning model to identify equivalent technical terms using training data comprising structured and unstructured data comprising documents annotated with technical terms that are identified as being equivalent, wherein said equivalent technical terms correspond to technical terms having a same semantic meaning within a threshold degree, wherein said training of said deep learning model to identify equivalent technical terms comprises:
annotating technical terms in text of a first document regarding a first product and a second document regarding a second product with tags corresponding to matching technical terms listed in a data structure; and
annotating said tagged technical terms with entity types using an entity identification engine to assist said deep learning model in identifying equivalent technical terms, wherein said entity identification engine provides components for term or entity detection using maximum entropy models trained from annotated data, wherein said entity identification engine further provides a trainable co-reference component for grouping detected terms in a document that correspond to a same entity, and a trainable relation extraction system;
applying said deep learning model to a first document;
analyzing each sentence of said first document to identify technical terms;
analyzing text preceding or succeeding a first technical term identified in said first document to determine a meaning of said text preceding or succeeding said first technical term;
reviewing a glossary list to determine if said meaning of said analyzed text preceding or succeeding said first technical term in said first document matches a meaning in said glossary list linked to a second technical term, wherein said glossary list comprises a list of meanings associated with equivalent technical terms in designated products;
identifying said second technical term equivalent to said first technical term from said glossary list in response to determining said meaning of said analyzed text preceding or succeeding said first technical term matching said meaning in said glossary list linked to said second technical term; and
annotating said first document by replacing said first technical term with said second technical term or tagging said first technical term with said second technical term.
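
The application steps of claim 1 (identifying a term in a sentence, analyzing the text before and after it, matching that meaning against a glossary list, and then replacing or tagging the term) can be illustrated with a minimal sketch. This is not the claimed implementation: the claim relies on a trained deep learning model, whereas the sketch below substitutes a simple token-overlap (Jaccard) score as a stand-in for "same semantic meaning within a threshold degree". All names (`GlossaryEntry`, `find_equivalent`, `annotate`), the `[=term]` tag format, the 0.5 threshold, and the sample glossary are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """Hypothetical glossary row: a meaning linked to a term in a designated product."""
    term: str
    product: str
    meaning: str


def tokens(text):
    """Lowercased word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap score in [0, 1]; an assumed stand-in for the model's semantic match."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def context_of(sentence, term):
    """Text preceding and succeeding the first occurrence of the term in the sentence."""
    before, _, after = sentence.partition(term)
    return f"{before} {after}".strip()


def find_equivalent(sentence, term, glossary, threshold=0.5):
    """Return the best glossary entry whose meaning matches the term's context, or None."""
    context = context_of(sentence, term)
    best, best_score = None, 0.0
    for entry in glossary:
        if entry.term == term:
            continue  # an equivalent must be a different term
        score = jaccard(context, entry.meaning)
        if score >= threshold and score > best_score:
            best, best_score = entry, score
    return best


def annotate(document, known_terms, glossary, replace=False, threshold=0.5):
    """Replace each matched term with its equivalent, or tag it as 'term [=equivalent]'."""
    out = []
    # Naive sentence split on terminal punctuation; adequate for a sketch only.
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        for term in known_terms:
            if term not in sentence:
                continue
            match = find_equivalent(sentence, term, glossary, threshold)
            if match is None:
                continue
            new = match.term if replace else f"{term} [={match.term}]"
            sentence = sentence.replace(term, new)
        out.append(sentence)
    return " ".join(out)


# Hypothetical sample data, not taken from the patent.
glossary = [
    GlossaryEntry("node pool", "Product B", "a group of machines that run workloads"),
]
doc = "A worker group is a group of machines that run workloads. It scales automatically."

print(annotate(doc, ["worker group"], glossary))
print(annotate(doc, ["worker group"], glossary, replace=True))
```

The `replace` flag mirrors the two alternatives in the final step of the claim: substituting the second technical term for the first, or leaving the first term in place and tagging it with the second.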