CPC G16H 50/70 (2018.01) [G16H 10/20 (2018.01)]; 20 Claims

1. A method performed by one or more computers, the method comprising:
obtaining a set of input text sequences;
generating a collection of structured data records from the set of input text sequences using an extraction neural network, wherein each structured data record defines a structured representation of a corresponding input text sequence with reference to a predefined schema of semantic categories, and wherein generating each structured data record comprises:
processing an input text sequence using the extraction neural network to generate an output text sequence that defines a corresponding structured data record, comprising, for each position in the output text sequence:
processing a sequence of embeddings representing the input text sequence and any part of the output text sequence preceding the position in the output text sequence in accordance with trained values of a set of extraction neural network parameters to generate a score distribution over a set of tokens; and
selecting a token, in accordance with the score distribution over the set of tokens, to occupy the position in the output text sequence;
wherein the extraction neural network has been trained by a machine learning training technique to perform a natural language understanding task;
filtering the collection of structured data records to identify and remove structured data records that are predicted to be unreliable; and
processing the collection of structured data records to generate an article that is directed to a selected topic and that aggregates information from across multiple structured data records.
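The per-position decoding loop recited in claim 1 can be illustrated with a minimal Python sketch. At each position, the loop embeds the input plus the output generated so far. It then turns scores into a distribution over a token set and selects a token according to that distribution. Everything here is assumed for illustration and is not taken from the claim: the vocabulary `VOCAB`, the 1-d embedding `table`, and `make_toy_scorer`. The toy scorer is a hypothetical stand-in for the trained extraction neural network and simply favours a fixed target sequence.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical token vocabulary for a toy two-category schema (drug, dose).
VOCAB = ["<eos>", "{", "}", ",", "drug:", "dose:", "aspirin", "100mg"]

Embedding = List[float]
Scorer = Callable[[List[Embedding]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability (score) distribution over VOCAB."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(input_tokens: List[str], score_fn: Scorer, table: Dict[str, Embedding],
           max_len: int = 16, greedy: bool = True,
           rng: Optional[random.Random] = None) -> List[str]:
    """Generate the output sequence one position at a time."""
    rng = rng or random.Random()
    output: List[str] = []
    for _ in range(max_len):
        # Embed the input plus every output token preceding this position.
        embeddings = [table[t] for t in input_tokens + output]
        probs = softmax(score_fn(embeddings))
        if greedy:
            idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
        else:
            idx = rng.choices(range(len(probs)), weights=probs)[0]  # sample
        token = VOCAB[idx]
        if token == "<eos>":
            break
        output.append(token)
    return output


def make_toy_scorer(n_input: int, target: List[str]) -> Scorer:
    """Hypothetical stand-in for a trained network: favours target[pos]."""
    def score(embeddings: List[Embedding]) -> List[float]:
        pos = len(embeddings) - n_input  # current output position
        want = target[pos] if pos < len(target) else "<eos>"
        return [5.0 if tok == want else 0.0 for tok in VOCAB]
    return score


input_tokens = "patient took aspirin 100mg daily".split()
target = ["{", "drug:", "aspirin", ",", "dose:", "100mg", "}"]
# Hypothetical 1-d embedding table covering input words and VOCAB tokens.
table = {t: [float(i)] for i, t in enumerate(sorted(set(VOCAB) | set(input_tokens)))}
record = decode(input_tokens, make_toy_scorer(len(input_tokens), target), table)
print(" ".join(record))  # { drug: aspirin , dose: 100mg }
```

Greedy argmax and random sampling are both ways of "selecting a token in accordance with the score distribution." The claim text does not say which one, if either, the method uses.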
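The filtering and article-generation steps can be sketched in the same hedged way. Claim 1 does not state how unreliability is predicted or how the article is produced. The following pieces are therefore all hypothetical: the `Record` type, `SCHEMA`, the `is_reliable` rule (a schema-conformance check plus a confidence threshold), and the template-based `write_article`. An actual system might instead use a learned reliability classifier and a generative language model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical schema of semantic categories; claim 1 does not name one.
SCHEMA = ("drug", "dose")


@dataclass
class Record:
    fields: Dict[str, str]  # category -> extracted value
    confidence: float       # hypothetical reliability signal, e.g. mean token probability
    source_id: str          # which input text sequence produced this record


def is_reliable(rec: Record, threshold: float = 0.8) -> bool:
    """Hypothetical rule: every schema category is filled and confidence is high enough."""
    conforms = all(rec.fields.get(cat) for cat in SCHEMA)
    return conforms and rec.confidence >= threshold


def filter_records(records: List[Record]) -> List[Record]:
    """Drop records predicted to be unreliable, keeping the original order."""
    return [r for r in records if is_reliable(r)]


def write_article(records: List[Record], topic: str) -> str:
    """Template-based stand-in: aggregate every record about `topic` into one summary."""
    relevant = [r for r in records if r.fields.get("drug") == topic]
    if not relevant:
        return f"No reliable records mention {topic}."
    doses = Counter(r.fields.get("dose") or "unspecified" for r in relevant)
    sources = sorted({r.source_id for r in relevant})
    dose_summary = ", ".join(
        f"{d} ({n} record{'s' if n > 1 else ''})" for d, n in doses.most_common()
    )
    return (f"{topic}: {len(relevant)} reliable records from {len(sources)} sources "
            f"({', '.join(sources)}). Reported doses: {dose_summary}.")


records = [
    Record({"drug": "aspirin", "dose": "100mg"}, 0.95, "note-1"),
    Record({"drug": "aspirin", "dose": "100mg"}, 0.91, "note-2"),
    Record({"drug": "aspirin", "dose": "325mg"}, 0.88, "note-3"),
    Record({"drug": "aspirin", "dose": ""}, 0.97, "note-4"),       # fails schema check
    Record({"drug": "aspirin", "dose": "5000mg"}, 0.42, "note-5"),  # low confidence
    Record({"drug": "metformin", "dose": "500mg"}, 0.93, "note-6"),
]
reliable = filter_records(records)
article = write_article(reliable, "aspirin")
print(article)
```

Filtering runs before aggregation, so the low-confidence 5000mg record and the incomplete record never reach the article. The summary combines information from three separate source records.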