CPC G06F 16/3329 (2019.01) [G06F 16/35 (2019.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G10L 15/1822 (2013.01); G10L 15/183 (2013.01); G10L 15/22 (2013.01)]
20 Claims
1. A processor implemented method, comprising:
receiving, by a technical question generation system (TQGS) via one or more hardware processors, (i) one or more technical documents, the one or more technical documents comprising text data associated with a technical domain, (ii) a name of the technical domain, one or more non-domain specific technical terms, and one or more seed terms associated with the technical domain, and (iii) a knowledge graph (KG) specific to the technical domain;
extracting, by the TQGS via the one or more hardware processors, structure information from the one or more technical documents to create a document structure graph (DSG);
creating, by the TQGS via the one or more hardware processors, a unified linguistic denotation graph (ULDG) from the one or more technical documents using one or more natural language processing algorithms;
identifying, by the TQGS via the one or more hardware processors, one or more domain specific technical terms using a keyword extraction algorithm and a domain specific term clustering algorithm based at least in part on the one or more non-domain specific technical terms and the one or more seed terms;
identifying, by the TQGS via the one or more hardware processors, a plurality of relationships, relation types, and attributes of the relations between the one or more domain specific technical terms present in the one or more technical documents using one or more open information extraction algorithms and one or more semantic role labelling algorithms to obtain additional information corresponding to the technical domain;
creating, by the TQGS via the one or more hardware processors, a concept graph (CG) by populating a concept graph data structure using the identified one or more domain specific technical terms and the additional information obtained corresponding to the technical domain;
identifying, by the TQGS via the one or more hardware processors, one or more graph motifs in one or more graphs using linguistic information associated with one or more graph elements of the corresponding one or more graphs, the one or more graphs comprising the KG, ULDG, DSG and CG, the one or more graph elements comprising one or more nodes and one or more edges connecting the one or more nodes of a graph of the one or more graphs; and
generating, by the TQGS via the one or more hardware processors, one or more technical questions based on the one or more graph motifs and the one or more graphs using a plurality of semantic templates.
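The term-identification step (a keyword extraction algorithm followed by domain specific term clustering driven by the seed terms and the non-domain specific technical terms) can be illustrated with a minimal Python sketch. The claim does not name either algorithm, so this sketch assumes a RAKE-style extractor that splits sentences at stopwords, and treats "clustering" as the connected component of a sentence co-occurrence graph that is reachable from the seed terms once the supplied non-domain specific terms have been removed. The stopword list and the function names `candidate_phrases` and `domain_specific_terms` are hypothetical.

```python
import re
from collections import defaultdict, deque

# Hypothetical stopword list (an assumption; the claim does not specify one).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "are",
             "for", "on", "with", "by", "from", "has", "through"}


def candidate_phrases(sentence):
    """RAKE-style candidates: maximal runs of tokens not broken by a stopword."""
    phrases, current = [], []
    for tok in re.findall(r"[a-z0-9][a-z0-9\-]*", sentence.lower()):
        if tok in STOPWORDS:
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases


def domain_specific_terms(text, seed_terms, non_domain_terms):
    """Return candidates reachable from a seed term in a sentence co-occurrence graph.

    Assumed stand-in for the claimed clustering step, not the patented method.
    """
    generic = {t.lower() for t in non_domain_terms}
    seeds = {t.lower() for t in seed_terms}
    graph = defaultdict(set)
    nodes = set()
    for sentence in re.split(r"[.!?;]", text):
        # Drop non-domain specific terms so they cannot bridge unrelated clusters.
        terms = [p for p in candidate_phrases(sentence) if p not in generic]
        nodes.update(terms)
        for a in terms:
            for b in terms:
                if a != b:
                    graph[a].add(b)
    # Start from every candidate that contains a seed term, then expand breadth-first.
    start = {n for n in nodes if any(s in n for s in seeds)}
    seen, queue = set(start), deque(start)
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)
```

For example, with the seed term `router`, the text "A router is a network device. The router has a routing table. An invoice is a billing document." yields `network device`, `router` and `routing table`, while the invoice terms stay outside the cluster because they never co-occur with a router term.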
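The last two steps (identifying graph motifs from the linguistic information on graph elements, then filling semantic templates) can likewise be sketched under stated assumptions. In this sketch any of the KG, ULDG, DSG or CG is represented by a plain list of (head, relation, tail) triples, the only linguistic information consulted is the relation label, and two hypothetical motifs are recognised: a single edge, and a "star" in which several edges with the same relation converge on one tail node. The template strings, the relation names (`is_a`, `part_of`, `uses`) and the function names are illustrative assumptions, not taken from the claim.

```python
from collections import defaultdict

# Hypothetical semantic templates keyed by relation label (assumed, not from the claim).
EDGE_TEMPLATES = {
    "is_a": "What type of entity is {h}?",
    "part_of": "Of which larger unit is {h} a part?",
    "uses": "What does {h} use?",
}
STAR_TEMPLATES = {
    "part_of": "Name the parts of {t}.",
}


def find_motifs(triples):
    """Return (kind, relation, head_or_heads, tail) motifs from (h, r, t) triples."""
    # Every edge is a single-edge motif.
    motifs = [("edge", r, h, t) for h, r, t in triples]
    # Star motif: two or more same-relation edges converging on one tail node.
    incoming = defaultdict(list)
    for h, r, t in triples:
        incoming[(r, t)].append(h)
    for (r, t), heads in incoming.items():
        if len(heads) >= 2:
            motifs.append(("star", r, tuple(sorted(heads)), t))
    return motifs


def generate_questions(triples):
    """Fill the template for each motif whose relation has one; return (question, answer) pairs."""
    qa = []
    for kind, r, h, t in find_motifs(triples):
        if kind == "edge" and r in EDGE_TEMPLATES:
            qa.append((EDGE_TEMPLATES[r].format(h=h), t))
        elif kind == "star" and r in STAR_TEMPLATES:
            qa.append((STAR_TEMPLATES[r].format(t=t), list(h)))
    return qa
```

Given the triples (`ALU`, `part_of`, `CPU`), (`control unit`, `part_of`, `CPU`) and (`CPU`, `is_a`, `processor`), the sketch produces one question per edge plus the star question "Name the parts of CPU." with the answer `["ALU", "control unit"]`. Keying templates on the motif as well as the relation is what lets a multi-edge pattern produce a list-style question that no single edge could.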