CPC G06F 40/30 (2020.01) [G06F 16/358 (2019.01); G06F 40/237 (2020.01); G06F 40/279 (2020.01); G06N 20/00 (2019.01)] | 33 Claims |
1. A computer-implemented method for embedding a portion of text describing a relationship for one or more entities of interest, the method comprising:
receiving a portion of text comprising data representative of a relationship for the one or more entities of interest, wherein the portion of text comprises multiple separable entities including one or more relationship entities and the one or more entities of interest;
for each of the multiple separable entities, generating a set of embeddings by (a) retrieving, from an embedding vocabulary dataset, one or more embeddings of entities associated with the separable entity and (b) forming a set of embeddings associated with the separable entity based on the retrieved one or more embeddings, wherein each set of embeddings comprises an embedding of the separable entity and at least one embedding of an entity associated with the separable entity;
sending at least one embedding from each of the sets of embeddings for input to a machine learning model or classifier; and
storing the generated sets of embeddings in the embedding vocabulary dataset, wherein the embedding vocabulary dataset comprises data representative of one or more entities mapped to one or more corresponding embeddings,
wherein,
retrieving from the embedding vocabulary dataset one or more embeddings of entities associated with a separable entity further comprises (a) determining whether an embedding corresponding to each of the separable entity and the one or more entities associated with the separable entity exists in the embedding vocabulary dataset, (b) retrieving those embeddings associated with the separable entity that exist in the embedding vocabulary dataset, (c) generating out-of-vocabulary embeddings for those embeddings associated with the separable entity that are not found in the embedding vocabulary dataset, and (d) generating a set of embeddings for the separable entity based on at least any retrieved embedding or any generated out-of-vocabulary embedding.
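The retrieval, out-of-vocabulary, and storing steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. The names `oov_embedding`, `embedding_set`, `embed_text`, the `associated` mapping of each separable entity to its associated entities, the embedding width `DIM`, and the hash-seeded random vector used for out-of-vocabulary entities are all assumptions made for illustration; the claim does not specify how out-of-vocabulary embeddings are generated or how associated entities are identified.

```python
import hashlib
import random
from typing import Dict, List, Tuple

Embedding = List[float]
DIM = 4  # assumed embedding width, chosen only for this illustration


def oov_embedding(entity: str, dim: int = DIM) -> Embedding:
    """Hypothetical out-of-vocabulary generator: a deterministic
    pseudo-random vector seeded from a hash of the entity string."""
    digest = hashlib.sha256(entity.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embedding_set(
    entity: str,
    associated: Dict[str, List[str]],
    vocab: Dict[str, Embedding],
) -> Dict[str, Embedding]:
    """Build the set of embeddings for one separable entity."""
    result: Dict[str, Embedding] = {}
    for name in [entity, *associated.get(entity, [])]:
        if name in vocab:
            # Steps (a) and (b): the embedding exists, so retrieve it.
            result[name] = vocab[name]
        else:
            # Step (c): not found, so generate an out-of-vocabulary embedding.
            result[name] = oov_embedding(name)
    # Step (d): the set combines retrieved and generated embeddings.
    return result


def embed_text(
    entities: List[str],
    associated: Dict[str, List[str]],
    vocab: Dict[str, Embedding],
) -> Tuple[Dict[str, Dict[str, Embedding]], List[Embedding]]:
    """Generate a set of embeddings per separable entity, pick one
    embedding per set as model input, and store the sets in vocab."""
    sets = {e: embedding_set(e, associated, vocab) for e in entities}
    # At least one embedding from each set is passed on; here, the
    # embedding of the separable entity itself.
    model_input = [sets[e][e] for e in entities]
    # Store the generated sets back into the embedding vocabulary dataset.
    for s in sets.values():
        vocab.update(s)
    return sets, model_input


# Illustrative data only: entity names and vectors are invented.
vocab = {"aspirin": [0.1, 0.2, 0.3, 0.4]}
associated = {
    "aspirin": ["acetylsalicylic acid"],
    "inhibits": ["blocks"],
    "COX-1": ["PTGS1"],
}
sets, model_input = embed_text(["aspirin", "inhibits", "COX-1"], associated, vocab)
```

In this sketch the vocabulary grows as a side effect of `embed_text`, so an entity that was out of vocabulary on one call is retrieved rather than regenerated on the next. The list `model_input` stands in for whatever is sent to the machine learning model or classifier.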