US 11,755,839 B2
Low resource named entity recognition for sensitive personal information
Youngja Park, Princeton, NJ (US); and Jatin Arora, Urbana, IL (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 19, 2021, as Appl. No. 17/324,212.
Prior Publication US 2022/0374602 A1, Nov. 24, 2022
Int. Cl. G06F 17/00 (2019.01); G06F 40/295 (2020.01); G06F 40/284 (2020.01); G06N 20/00 (2019.01); G06F 40/166 (2020.01); G06F 40/30 (2020.01)
CPC G06F 40/295 (2020.01) [G06F 40/166 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A method, in a natural language processing (NLP) computing system comprising a named entity recognition (NER) computer model augmented to operate on an entity pattern embedding input feature in addition to one or more other embedding input features, the method comprising:
tokenizing natural language content to generate one or more tokens, wherein each token represents a subset of text in the natural language content;
processing a selected token, in the one or more tokens, in accordance with a predetermined entity pattern embedding technique to generate an entity pattern embedding input feature for the selected token, wherein the entity pattern embedding input feature specifies a pattern of characters present in the selected token;
processing the natural language content to generate the one or more other embedding input features in accordance with one or more other embedding techniques;
processing, by the NER computer model, the one or more other embedding input features and the entity pattern embedding input feature for the selected token to generate a predicted tag for the selected token, wherein the predicted tag specifies a named entity type classification for the selected token; and
performing, by the NLP computing system, an operation based on the predicted tag.
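
The feature-construction steps of claim 1 can be illustrated with a minimal sketch. The claim does not fix a particular "predetermined entity pattern embedding technique", so everything here is an assumption made for illustration: the whitespace tokenizer, the character-class mapping (uppercase to "U", lowercase to "l", digit to "d", other characters kept as-is), the randomly initialized lookup table, and the names tokenize, char_pattern, PatternEmbedding, and build_features are all hypothetical. The NER computer model itself is not shown; the sketch only builds the per-token input that such a model would consume.

```python
import random


def tokenize(text):
    # Assumption: simple whitespace tokenization; the claim only requires
    # that each token represent a subset of the text.
    return text.split()


def char_pattern(token):
    # Assumption: one possible "pattern of characters" is the sequence of
    # character classes in the token.
    out = []
    for ch in token:
        if ch.isupper():
            out.append("U")
        elif ch.islower():
            out.append("l")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # keep punctuation and symbols literally
    return "".join(out)


class PatternEmbedding:
    """Hypothetical lookup table mapping a pattern string to a vector."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._table = {}

    def lookup(self, pattern):
        # Vectors are created lazily with random values; in a trained
        # model they would be learned parameters instead.
        if pattern not in self._table:
            self._table[pattern] = [
                self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)
            ]
        return self._table[pattern]


def build_features(tokens, other_embed, pattern_embed):
    # Concatenate the "other" embedding of each token with its entity
    # pattern embedding, giving one input vector per token.
    features = []
    for tok in tokens:
        pattern_vec = pattern_embed.lookup(char_pattern(tok))
        features.append(list(other_embed(tok)) + pattern_vec)
    return features


def toy_other_embedding(token):
    # Stand-in for word or character embeddings: a 2-dimensional
    # placeholder based on token length and a capitalization flag.
    return [float(len(token)), 1.0 if token[:1].isupper() else 0.0]


tokens = tokenize("SSN 123-45-6789 filed in Armonk")
pattern_embed = PatternEmbedding(dim=4)
features = build_features(tokens, toy_other_embedding, pattern_embed)
```

Under this mapping, two different numbers written in the same layout, such as 123-45-6789 and 987-65-4321, reduce to the same pattern string and therefore share a pattern embedding. That is one plausible way a pattern feature could help when labeled examples of a sensitive entity type are scarce.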