CPC G16H 10/60 (2018.01) [G06F 16/3344 (2019.01); G06F 40/279 (2020.01); G06T 11/206 (2013.01)] | 20 Claims |
1. A computing system comprising one or more processors and memory including program code, the memory and the program code configured to, with the one or more processors, cause the computing system to:
retrieve a plurality of natural language data objects from a database;
determine, based at least in part on the plurality of natural language data objects and by utilizing an entity extraction machine learning model, a plurality of entity identifiers for the plurality of natural language data objects, wherein: (i) the entity extraction machine learning model comprises an encoder sub-model and an entity classification sub-model, (ii) the encoder sub-model is configured to generate a plurality of text embeddings based at least in part on the plurality of natural language data objects, (iii) the entity classification sub-model is configured to determine an entity classification for each text embedding, and (iv) the plurality of entity identifiers are determined based at least in part on each entity classification;
determine, based at least in part on the plurality of entity identifiers and by utilizing the entity extraction machine learning model, one or more entity relationship identifiers for the plurality of natural language data objects, wherein: (i) the entity extraction machine learning model comprises an entity relationship classification sub-model, (ii) the entity relationship classification sub-model is configured to determine an entity relationship classification for each entity pair from the plurality of entity identifiers based at least in part on a subset of the plurality of text embeddings that corresponds to the entity pair, and (iii) the one or more entity relationship identifiers are determined based at least in part on each entity relationship classification;
generate, based at least in part on the plurality of entity identifiers and the one or more entity relationship identifiers, a graph-based data object that encodes the plurality of natural language data objects in accordance with a set of input formatting requirements for a data prediction machine learning model; and
initiate the performance of the data prediction machine learning model to generate at least one prediction data object based at least in part on the graph-based data object.
|
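The data flow recited in claim 1 (encode, classify entities, classify entity pairs, build a graph, invoke a predictor) can be illustrated with a minimal sketch. This is not the claimed implementation: every name below (`ENTITY_LEXICON`, `encode`, `classify_entities`, `classify_relations`, `build_graph`, `predict`, `run_pipeline`) is hypothetical, the "models" are hand-written toy rules standing in for trained sub-models, and the node/edge dictionary is only an assumed example of an input format that a downstream prediction model might require.

```python
from itertools import combinations

# Hypothetical lexicon standing in for a trained entity classification sub-model.
ENTITY_LEXICON = {"aspirin": "DRUG", "ibuprofen": "DRUG", "headache": "SYMPTOM"}


def encode(texts):
    """Encoder stand-in: emit one toy embedding per token of each text."""
    embeddings = []
    for doc_id, text in enumerate(texts):
        for pos, raw in enumerate(text.lower().split()):
            token = raw.strip(".,;")
            # Toy 2-d vector (length, vowel count); a real encoder would be learned.
            vec = (len(token), sum(c in "aeiou" for c in token))
            embeddings.append({"doc": doc_id, "pos": pos, "token": token, "vec": vec})
    return embeddings


def classify_entities(embeddings):
    """Classify every embedding; keep those labelled as entities and assign ids."""
    entities = []
    for emb in embeddings:
        label = ENTITY_LEXICON.get(emb["token"], "O")  # "O" means "not an entity"
        if label != "O":
            entities.append({"id": f"e{len(entities)}", "label": label, **emb})
    return entities


def classify_relations(entities):
    """Classify each entity pair using only the records belonging to that pair."""
    relations = []
    for a, b in combinations(entities, 2):
        # Toy rule (an assumption): same source text and different entity types.
        if a["doc"] == b["doc"] and a["label"] != b["label"]:
            relations.append((a["id"], "ASSOCIATED", b["id"]))
    return relations


def build_graph(entities, relations):
    """Encode entities and relations in an assumed node-list / edge-list format."""
    nodes = [{"id": e["id"], "label": e["label"], "text": e["token"]} for e in entities]
    edges = [{"source": s, "type": t, "target": d} for s, t, d in relations]
    return {"nodes": nodes, "edges": edges}


def predict(graph):
    """Prediction model stand-in: summarise the graph instead of running a model."""
    return {"num_nodes": len(graph["nodes"]), "num_edges": len(graph["edges"])}


def run_pipeline(texts):
    """Chain the steps in the order recited in the claim."""
    entities = classify_entities(encode(texts))
    graph = build_graph(entities, classify_relations(entities))
    return graph, predict(graph)
```

Calling `run_pipeline(["Aspirin relieves headache.", "Ibuprofen was prescribed."])` returns the graph together with the stub prediction. In a real system each stand-in would be replaced by a trained sub-model, and the relationship classifier would consume the embedding subset for each pair rather than a hand-written rule.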