US 12,141,530 B2
Determining unknown concepts from surrounding context
Maulana Bachtiar, Singapore (SG); Thi Thanh Thao Lai, Singapore (SG); Wen Rui Siow, Singapore (SG); Yida Lee, Singapore (SG); Ronny Syarif, Singapore (SG); and Cheranellore Vasudevan, Bastrop, TX (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jun. 9, 2021, as Appl. No. 17/342,601.
Prior Publication US 2022/0398382 A1, Dec. 15, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 16/93 (2019.01); G06F 40/289 (2020.01); G06N 5/048 (2023.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 16/93 (2019.01); G06F 40/289 (2020.01); G06N 5/048 (2013.01); G06N 20/00 (2019.01)] 17 Claims
1. A computer-implemented method for learning unknown concepts during natural language processing, comprising:
identifying a sentence associated with an unknown concept from a first document;
selecting a first sequential set of sentences from the first document, wherein the first sequential set of sentences includes the sentence associated with the unknown concept, at least one sentence prior to the sentence associated with the unknown concept, and at least one sentence subsequent to the sentence associated with the unknown concept;
selecting a second sequential set of sentences from a second document, wherein the second sequential set of sentences includes a sentence associated with a known concept, at least one sentence prior to the sentence associated with the known concept, and at least one sentence subsequent to the sentence associated with the known concept;
based on a predetermined sliding window size for the first document and the second document, iteratively comparing concepts associated with the at least one sentence prior to and subsequent to the sentence associated with the unknown concept to concepts associated with at least one sentence prior to and subsequent to the sentence associated with the known concept, respectively;
determining, based on the comparison, whether an inference can be made between the unknown concept associated with the sentence from the first document and the sentence associated with the known concept associated with the sentence from the second document, wherein the inference can be made if an average degree of similarity between a first degree of similarity between the concepts associated with each of the at least one sentence prior to the sentence associated with the unknown concept and the sentence associated with the known concept and a second degree of similarity between the concept associated with each of the at least one sentence subsequent to the sentence associated with the unknown concept and the sentence associated with the known concept is above a predetermined threshold;
iteratively expanding, in response to determining that an inference cannot be made, a window size of the predetermined sliding window until a concept match is found for the unknown concept associated with the sentence from the first document, wherein the window size of the predetermined sliding window is expanded for the first sequential set of sentences to further include at least two sentences prior to and subsequent to the sentence associated with the unknown concept and for the second sequential set of sentences to further include at least two sentences prior to and subsequent to the sentence associated with the known concept;
comparing concepts associated with the at least two sentences prior to and subsequent to the sentence associated with the unknown concept to concepts associated with the at least two sentences prior to and subsequent to the sentence associated with the known concept;
determining, based on the comparison, whether an inference can be made between the unknown concept associated with the sentence from the first document and the sentence associated with the known concept associated with the sentence from the second document, wherein the inference can be made if an average degree of similarity between each concept associated with the at least two sentences prior to or at least two sentences subsequent to the unknown concept in the first sequential set of sentences and each concept associated with the at least two sentences prior to or at least two sentences subsequent to the known concept in the second sequential set of sentences is above a threshold; and
tagging, in response to determining that an inference can be made, the unknown concept associated with the sentence from the first document with the known concept associated with the sentence from the second document.
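
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each sentence has already been reduced to a set of concept labels, uses Jaccard overlap as the degree of similarity (the claim does not name a measure), and compares context sentences position by position. The names `jaccard`, `context_similarity`, `infer_window`, and the parameters `threshold` and `max_window` are hypothetical.

```python
from typing import List, Optional, Set

# Hypothetical representation: one set of concept labels per sentence.
Sentence = Set[str]


def jaccard(a: Sentence, b: Sentence) -> float:
    """Assumed similarity measure; the claim does not specify one."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_similarity(doc_a: List[Sentence], idx_a: int,
                       doc_b: List[Sentence], idx_b: int,
                       window: int) -> Optional[float]:
    """Average similarity over the `window` sentences before and after
    each anchor sentence. Returns None if the window does not fit."""
    if min(idx_a, idx_b) < window:
        return None
    if idx_a + window >= len(doc_a) or idx_b + window >= len(doc_b):
        return None
    # Skip offset 0: the anchor sentences themselves are not compared.
    offsets = [k for k in range(-window, window + 1) if k != 0]
    scores = [jaccard(doc_a[idx_a + k], doc_b[idx_b + k]) for k in offsets]
    return sum(scores) / len(scores)


def infer_window(doc_a: List[Sentence], unknown_idx: int,
                 doc_b: List[Sentence], known_idx: int,
                 threshold: float = 0.5,
                 max_window: int = 3) -> Optional[int]:
    """Expand the window from 1 until the average context similarity
    exceeds `threshold`. Returns the window size at which an inference
    can be made, or None if no window up to `max_window` succeeds."""
    for window in range(1, max_window + 1):
        score = context_similarity(doc_a, unknown_idx, doc_b, known_idx, window)
        if score is None:  # ran out of surrounding sentences
            return None
        if score > threshold:
            return window
    return None


# First document: the unknown concept sits at index 2.
doc_a = [{"tax"}, {"a"}, {"unknown"}, {"b"}, {"audit"}]
# Second document: the known concept sits at index 2.
doc_b = [{"tax"}, {"c"}, {"known"}, {"d"}, {"audit"}]

window = infer_window(doc_a, 2, doc_b, 2, threshold=0.4)
if window is not None:
    # Tagging step: associate the unknown concept with the known one.
    tags = {"unknown": "known"}
```

In this toy input the adjacent sentences share no concepts, so a window of one sentence on each side fails. Expanding to two sentences on each side brings in the matching outer sentences and lifts the average above the assumed threshold, at which point the unknown concept is tagged with the known one.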