US 12,293,843 B1
Generating, filtering, and combining structured data records using machine learning
Zachary Michael Ziegler, Cambridge, MA (US); Jonas Sebastian Wulff, Glendale, CA (US); Evan Hernandez, Wimauma, FL (US); and Daniel Joseph Nadler, Nassau (BS)
Assigned to Xyla Inc., Wilmington, DE (US)
Filed by Xyla Inc., Wilmington, DE (US)
Filed on Aug. 26, 2024, as Appl. No. 18/815,494.
Application 18/815,494 is a continuation of application No. 18/814,294, filed on Aug. 23, 2024.
Application 18/815,494 is a continuation of application No. 18/812,375, filed on Aug. 22, 2024.
Application 18/815,494 is a continuation of application No. 18/810,153, filed on Aug. 20, 2024, granted, now 12,243,653.
Application 18/815,494 is a continuation of application No. 18/810,328, filed on Aug. 20, 2024, granted, now 12,249,430.
Application 18/815,494 is a continuation of application No. 18/219,027, filed on Jul. 6, 2023.
Claims priority of provisional application 63/368,434, filed on Jul. 14, 2022.
This patent is subject to a terminal disclaimer.
Int. Cl. G16H 50/70 (2018.01); G16H 10/20 (2018.01)
CPC G16H 50/70 (2018.01) [G16H 10/20 (2018.01)]
20 Claims
 
1. A method performed by one or more computers, the method comprising:
obtaining a set of input text sequences;
generating a collection of structured data records from the set of input text sequences using an extraction neural network, wherein each structured data record defines a structured representation of a corresponding input text sequence with reference to a predefined schema of semantic categories, and wherein generating each structured data record comprises:
processing an input text sequence using the extraction neural network to generate an output text sequence that defines a corresponding structured data record, comprising, for each position in the output text sequence:
processing a sequence of embeddings representing the input text sequence and any part of the output text sequence preceding the position in the output text sequence in accordance with trained values of a set of extraction neural network parameters to generate a score distribution over a set of tokens; and
selecting a token, in accordance with the score distribution over the set of tokens, to occupy the position in the output text sequence;
wherein the extraction neural network has been trained by a machine learning training technique to perform a natural language understanding task;
filtering the collection of structured data records to identify and remove structured data records that are predicted to be unreliable; and
processing the collection of structured data records to generate an article that is directed to a selected topic and that aggregates information from across multiple structured data records.
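The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Everything named here is a hypothetical stand-in chosen for illustration: `toy_score_fn` replaces the trained extraction neural network, `VOCAB` and `SCHEMA` are invented, greedy selection is only one way to select a token "in accordance with the score distribution", the confidence threshold is one assumed reliability test, and the template in `build_article` stands in for whatever article generator is actually used.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy vocabulary and schema, invented for this sketch.
VOCAB = ["<eos>", "drug=", "effect=", ";", "aspirin", "headache", "nausea"]
SCHEMA = ("drug", "effect")

ScoreFn = Callable[[List[str], List[str]], List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution over VOCAB."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(input_tokens: List[str], score_fn: ScoreFn,
           max_len: int = 16) -> Tuple[List[str], float]:
    """Fill output positions one at a time from the input and the prefix."""
    output: List[str] = []
    confidence = 1.0
    for _ in range(max_len):
        probs = softmax(score_fn(input_tokens, output))
        best = max(range(len(VOCAB)), key=lambda i: probs[i])  # greedy choice
        confidence = min(confidence, probs[best])  # weakest step so far
        if VOCAB[best] == "<eos>":
            break
        output.append(VOCAB[best])
    return output, confidence


def parse_record(tokens: List[str]) -> Dict[str, str]:
    """Read 'key=value;key=value' output text into a structured record."""
    record: Dict[str, str] = {}
    for field in "".join(tokens).split(";"):
        key, sep, value = field.partition("=")
        if sep and value:
            record[key] = value
    return record


def is_reliable(record: Dict[str, str], confidence: float,
                threshold: float = 0.5) -> bool:
    """Assumed filter: confident decoding and every schema field present."""
    return confidence >= threshold and all(k in record for k in SCHEMA)


def toy_score_fn(input_tokens: List[str], prefix: List[str]) -> List[float]:
    """Stand-in for the trained network: scores the next token of a script."""
    effect = next((t for t in input_tokens if t in ("headache", "nausea")), None)
    target = ["drug=", "aspirin", ";", "effect=", effect or "<eos>", "<eos>"]
    want = target[min(len(prefix), len(target) - 1)]
    sharpness = 5.0 if effect else 0.5  # low scores when no effect is found
    return [sharpness if tok == want else 0.0 for tok in VOCAB]


def build_article(records: List[Dict[str, str]], topic: str) -> str:
    """Template-based aggregation across records that match the topic."""
    facts = sorted({r["effect"] for r in records if r.get("drug") == topic})
    return f"{topic}: reported effects are " + ", ".join(facts) + "."


def run_pipeline(texts: List[str]) -> Tuple[List[Dict[str, str]], str]:
    """Extract a record per text, drop unreliable ones, then aggregate."""
    kept = []
    for text in texts:
        tokens, confidence = decode(text.split(), toy_score_fn)
        record = parse_record(tokens)
        if is_reliable(record, confidence):
            kept.append(record)
    return kept, build_article(kept, "aspirin")
```

Calling `run_pipeline(["aspirin relieved headache", "aspirin caused nausea", "aspirin was taken"])` keeps the first two records and drops the third, whose decoding is low-confidence and lacks an `effect` field. In this sketch the confidence is the smallest per-position probability, so one uncertain token is enough to mark a whole record as unreliable.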