US 12,086,565 B2
Meaning and sense preserving textual encoding and embedding
Tanveer Syeda-Mahmood, Cupertino, CA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 28, 2022, as Appl. No. 17/682,028.
Prior Publication US 2023/0274098 A1, Aug. 31, 2023
Int. Cl. G06F 40/56 (2020.01); G06F 40/247 (2020.01); G06F 40/30 (2020.01)
CPC G06F 40/56 (2020.01) [G06F 40/247 (2020.01); G06F 40/30 (2020.01)]
20 Claims
 
1. A method, in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions that are executed by the at least one processor to cause the at least one processor to be specifically configured to implement a text encoder that encodes natural language text data input at least by:
training, by a contrastive machine learning training operation, an encoder of a machine learning computer model, to learn a sense and similarity preserving embedding, wherein the sense and similarity preserving embedding operates to encode input natural language text data to generate encoded natural language text data based on a sense attribute of one or more terms in the input natural language text data, wherein each encoding of the one or more terms in the encoded natural language text data is a tuple data structure having at least one value in the tuple data structure specifying the sense attribute, from one or more predetermined possible sense attributes, of a corresponding term in the one or more terms, and wherein the contrastive machine learning training operation operates to learn to separate positive samples in training data from negative samples in the training data;
processing, by the trained encoder computer model, a first term specified in an input natural language text to generate an encoded natural language text based on the learned sense and similarity preserving embedding; and
inputting, to a downstream computing system, the encoded natural language text, to cause the downstream computing system to perform a computer natural language processing operation on the encoded natural language text data based on the sense and similarity preserving embedding.
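
For illustration only, the following is a minimal Python sketch of two ideas recited in claim 1: a per-term encoding held as a tuple whose first value names a sense attribute drawn from a predetermined set, and a contrastive objective that pushes positive samples closer to an anchor than negative samples. It is not the patented implementation. The sense labels in `SENSES`, the names `EncodedTerm`, `encode_term`, and `triplet_loss`, the use of Euclidean distance, and the margin-based triplet form of the loss are all assumptions introduced here; the claim does not specify any of them.

```python
import math
from typing import NamedTuple, Sequence, Tuple

# Hypothetical predetermined sense attributes (an assumption; the claim
# only requires "one or more predetermined possible sense attributes").
SENSES = ("affirmed", "negated")


class EncodedTerm(NamedTuple):
    """Tuple encoding of one term: (sense attribute, embedding vector)."""
    sense: str
    vector: Tuple[float, ...]


def encode_term(sense: str, vector: Sequence[float]) -> EncodedTerm:
    """Build the tuple encoding, rejecting senses outside the fixed set."""
    if sense not in SENSES:
        raise ValueError(f"unknown sense attribute: {sense!r}")
    return EncodedTerm(sense, tuple(float(x) for x in vector))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: EncodedTerm, positive: EncodedTerm,
                 negative: EncodedTerm, margin: float = 1.0) -> float:
    """Margin-based contrastive loss (assumed form, not from the claim).

    The loss is zero once the negative sample is at least `margin`
    farther from the anchor than the positive sample is.
    """
    d_pos = euclidean(anchor.vector, positive.vector)
    d_neg = euclidean(anchor.vector, negative.vector)
    return max(0.0, d_pos - d_neg + margin)


# Toy vectors chosen by hand; a trained encoder would produce these.
anchor = encode_term("affirmed", [0.0, 0.0])
same_sense = encode_term("affirmed", [0.0, 1.0])
other_sense = encode_term("negated", [3.0, 4.0])

well_separated = triplet_loss(anchor, same_sense, other_sense)
poorly_separated = triplet_loss(anchor, other_sense, same_sense)
```

In this sketch a training loop would adjust the encoder so that the loss falls toward zero over the training data, which is one way of learning to separate positive samples from negative samples. A downstream system could then read the `sense` field of each tuple directly instead of inferring it from the vector.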