CPC G06F 40/35 (2020.01) [G06F 40/295 (2020.01); G06N 20/00 (2019.01); G06F 3/0482 (2013.01)] | 20 Claims |
1. A computer-implemented method for generating a multi-modal discourse tree, the computer-implemented method comprising:
obtaining a corpus of text and one or more data records that are separate from the corpus of text;
generating an extended discourse tree for the corpus of text, the extended discourse tree comprising a plurality of discourse trees, each discourse tree comprising a plurality of nodes, each terminal node of the discourse tree corresponding to a fragment of text, each non-terminal node of the discourse tree indicating a rhetorical relationship between nodes of the discourse tree, the extended discourse tree comprising additional links between the plurality of discourse trees indicating additional rhetorical relationships between nodes of respective discourse trees;
identifying entity matches between a set of elementary discourse units of the plurality of discourse trees and the one or more data records, the entity matches being identified by comparing a first entity identified from an elementary discourse unit to a second entity identified from a data record of the one or more data records;
identifying one or more causal links between two data records of the one or more data records;
determining a corresponding rhetorical relationship for each entity match and each of the one or more causal links identified;
generating, for the extended discourse tree, respective nodes for each entity match and for each causal link identified; and
linking the respective nodes generated for each entity match and for each causal link to a respective node of the extended discourse tree based at least in part on the corresponding rhetorical relationship determined, thereby creating the multi-modal discourse tree.
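The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several parts are assumptions: entity extraction is a hypothetical case-insensitive vocabulary lookup standing in for a real named-entity recognizer; each `Record` carries one entity and an optional hypothetical `caused_by` field that supplies the causal link between two data records; and the rhetorical relationship is fixed to "Elaboration" for entity matches and "Cause" for causal links instead of being determined by a classifier. All names (`Node`, `ExtendedDiscourseTree`, `Record`, `entities_in`, `build_multimodal_tree`, `record_links`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Discourse-tree node: terminals carry EDU text, non-terminals a relation."""
    relation: Optional[str] = None
    text: Optional[str] = None
    children: list[Node] = field(default_factory=list)

    def edus(self) -> Iterator[Node]:
        """Yield terminal nodes (elementary discourse units) left to right."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.edus()


@dataclass
class ExtendedDiscourseTree:
    trees: list[Node]
    # Links between nodes of different discourse trees: (node, node, relation).
    links: list[tuple[Node, Node, str]] = field(default_factory=list)
    # Links added for data records: (tree node, generated node, relation).
    record_links: list[tuple[Node, Node, str]] = field(default_factory=list)


@dataclass
class Record:
    """Hypothetical data record with a single entity and an optional cause."""
    rid: str
    entity: str
    caused_by: Optional[str] = None  # hypothetical: id of the causing record


def entities_in(text: str, vocabulary: set[str]) -> set[str]:
    """Hypothetical stand-in for entity extraction: substring vocabulary lookup."""
    lowered = text.lower()
    return {e for e in vocabulary if e in lowered}


def build_multimodal_tree(edt: ExtendedDiscourseTree,
                          records: list[Record]) -> ExtendedDiscourseTree:
    vocabulary = {r.entity.lower() for r in records}
    by_id = {r.rid: r for r in records}
    anchor_for_record: dict[str, Node] = {}

    # Entity matches: compare entities found in each EDU with each record's entity.
    for tree in edt.trees:
        for edu in tree.edus():
            found = entities_in(edu.text or "", vocabulary)
            for record in records:
                if record.entity.lower() in found:
                    anchor_for_record.setdefault(record.rid, edu)
                    # Assumed relation: the record elaborates on the EDU.
                    edt.record_links.append(
                        (edu, Node(text=f"record:{record.rid}"), "Elaboration"))

    # Causal links between records, attached to the EDU matching the effect record.
    for record in records:
        cause = by_id.get(record.caused_by) if record.caused_by else None
        anchor = anchor_for_record.get(record.rid)
        if cause is not None and anchor is not None:
            # Assumed relation: a causal link between records maps to "Cause".
            edt.record_links.append(
                (anchor, Node(text=f"cause:{cause.rid}->{record.rid}"), "Cause"))
    return edt
```

For example, with a single tree whose EDUs are "Acme Corp reported a loss" and "after the supply shortage", and records `r1` ("supply shortage") and `r2` ("Acme Corp", caused by `r1`), the sketch adds two "Elaboration" links (one per entity match) and one "Cause" link anchored at the first EDU. Keeping the generated nodes in a separate `record_links` list leaves the original inter-tree `links` of the extended discourse tree untouched.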