| CPC G06F 40/30 (2020.01) | 23 Claims |

|
1. A method for processing natural language text, said text being converted into suitable structured, annotated data, the method comprising:
a) receiving said text and segmenting said text into individual terms;
b) determining for each one of said individual terms one or more semantic categories using a semantic category lexicon, wherein some of said terms belong to more than one semantic category for a same part of speech;
c) using a category association table to assign one of said one or more semantic categories to said individual terms;
d) organizing ones of said individual terms that are linked together into expressions, said expressions having an assigned semantic category;
e) standardizing terms or expressions with established terms or expressions found, at least, in other information blocks;
f) segmenting said text into said information blocks using semantic categories indicative of information block breaks;
g) identifying and adding hierarchical information to said information blocks; and
h) processing said information blocks to determine at least one information block specific attribute from a combination of term values, categories, expressions and hierarchical information of said information blocks.
|
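For illustration only, steps a) through h) of claim 1 can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: the lexicon, the category association table, the expression rules, the standard forms, and all function names (`assign_categories`, `build_expressions`, `split_blocks`, `annotate`, `process`) are hypothetical toy data and helpers that do not appear in the claim. The scoring rule used for step c) (summing association weights over nearby unambiguous terms) is likewise an assumption, since the claim does not specify how the table is applied. Step e) is simplified to a fixed lookup rather than a comparison against other information blocks.

```python
import re

# Hypothetical toy data; a real lexicon and tables would be far larger.
LEXICON = {
    "history": {"HEADING"}, "plan": {"HEADING"}, "patient": {"PERSON"},
    "reports": {"REPORT"}, "a": {"DET"}, "high": {"INTENSITY"},
    "fever": {"SYMPTOM"}, "rest": {"TREATMENT"}, "weather": {"CLIMATE"},
    "cold": {"DISEASE", "TEMPERATURE"},  # two categories for the same part of speech
    ".": {"BREAK"}, ":": {"PUNCT"},
}
# Category association table (assumed form): category pair -> weight.
ASSOCIATION = {("REPORT", "DISEASE"): 2, ("CLIMATE", "TEMPERATURE"): 2}
# Adjacent category pairs that merge into one expression, and its category.
EXPRESSION_RULES = {("INTENSITY", "SYMPTOM"): "SYMPTOM"}
# Established forms, keyed by (term, category).
STANDARD_FORMS = {("cold", "DISEASE"): "common cold"}


def assign_categories(terms, window=3):
    """Steps b) and c): look up candidates, then pick one per term."""
    candidates = [LEXICON.get(t, {"UNKNOWN"}) for t in terms]
    assigned = []
    for i, cands in enumerate(candidates):
        # Context = categories of nearby terms that are unambiguous.
        context = [next(iter(c)) for j, c in enumerate(candidates)
                   if j != i and abs(j - i) <= window and len(c) == 1]

        def score(cat):
            return sum(ASSOCIATION.get((ctx, cat), 0)
                       + ASSOCIATION.get((cat, ctx), 0) for ctx in context)

        # sorted() makes tie-breaking deterministic.
        assigned.append(max(sorted(cands), key=score))
    return assigned


def build_expressions(terms, cats):
    """Step d): merge linked adjacent terms into categorized expressions."""
    out, i = [], 0
    while i < len(terms):
        pair = tuple(cats[i:i + 2])
        if pair in EXPRESSION_RULES:
            out.append((terms[i] + " " + terms[i + 1], EXPRESSION_RULES[pair]))
            i += 2
        else:
            out.append((terms[i], cats[i]))
            i += 1
    return out


def split_blocks(items):
    """Step f): cut the item stream at categories that indicate a break."""
    blocks, current = [], []
    for term, cat in items:
        if cat == "BREAK":
            if current:
                blocks.append(current)
            current = []
        else:
            current.append((term, cat))
    if current:
        blocks.append(current)
    return blocks


def annotate(blocks):
    """Steps g) and h): add hierarchy, then derive block-specific attributes."""
    annotated, section = [], None
    for block in blocks:
        heading = next((t for t, c in block if c == "HEADING"), None)
        depth = 0 if heading else 1          # heading blocks open a section
        section = heading or section         # other blocks inherit the section
        findings = [t for t, c in block if c in {"DISEASE", "SYMPTOM"}]
        # Attribute combines term values, categories and hierarchy.
        kind = "finding" if section == "history" and findings else "other"
        annotated.append({"section": section, "depth": depth,
                          "findings": findings, "kind": kind})
    return annotated


def process(text):
    terms = re.findall(r"[a-z]+|[.:]", text.lower())                 # step a)
    cats = assign_categories(terms)                                  # steps b), c)
    items = build_expressions(terms, cats)                           # step d)
    items = [(STANDARD_FORMS.get((t, c), t), c) for t, c in items]   # step e)
    return annotate(split_blocks(items))                             # steps f), g), h)


for block in process("History: patient reports a cold. "
                     "Patient reports high fever. Plan: rest."):
    print(block)
```

In this sketch the term "cold" carries two candidate categories; the assumed association table resolves it to a disease next to "reports" and to a temperature next to "weather". Keeping each step as a separate function mirrors the lettered structure of the claim and lets any single table be swapped without touching the others.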