US 12,111,797 B1
Schema inference system
Kirsten Rae Lum, Issaquah, WA (US); Wing Yew Lum, Issaquah, WA (US); and Christopher John Gutierrez, Taos, NM (US)
Assigned to Storytellers.ai LLC, Issaquah, WA (US)
Filed by Storytellers.ai LLC, Issaquah, WA (US)
Filed on Sep. 22, 2023, as Appl. No. 18/371,931.
Int. Cl. G06F 15/16 (2006.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01); G06F 16/2453 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/211 (2019.01) [G06F 16/221 (2019.01); G06F 16/24544 (2019.01); G06F 16/2456 (2019.01)]
27 Claims
1. A method for managing data in a network using one or more processors to execute instructions that are configured to cause actions, comprising:
employing raw data to determine one or more tables with one or more columns, wherein the raw data includes metadata and is organized in the one or more columns of one or more tables;
employing the raw data to determine one or more concrete data types that correspond to the raw data organized in the one or more columns of the one or more tables;
determining one or more functional data types for the one or more columns of the one or more tables based on the raw data and correspondence with the one or more concrete data types, wherein a portion of the one or more columns are associated with an identifier data type;
determining one or more existing relationships associated with the one or more tables based on one or more of a metric or a statistical feature of the metadata for the raw data;
determining one or more inferred relationships associated with the one or more tables based on the portion of the one or more columns associated with the identifier data type and one or more common values associated with the one or more functional data types for the one or more columns;
employing one or more large language models to predict one or more inferred relationships associated with the one or more tables, wherein the one or more large language models are trained by one or more semantic relationship evaluators to generate the one or more predicted inferred relationships between the one or more tables;
executing one or more join expressions on two or more predicted inferred relationships for validation, wherein each predicted inferred relationship associated with one or more join execution errors is invalidated; and
generating a schema representing the raw data based on the one or more existing relationships, the one or more inferred relationships, each predicted inferred relationship that is validated, and the one or more tables.
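The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the sample tables, the uniqueness heuristic for identifier columns, the value-containment test for inferred relationships, and all function names are assumptions made for illustration. The large language model step is not reproduced; two hand-written candidate relationships stand in for its predictions, one of which names a column that does not exist so that its join fails and the candidate is invalidated. SQLite is used only as a convenient engine for executing the joins.

```python
import sqlite3

# Hypothetical raw data: table -> column -> raw string values.
RAW = {
    "customers": {"id": ["1", "2", "3"], "name": ["Ann", "Bo", "Ann"]},
    "orders": {
        "order_id": ["10", "11", "12", "13"],
        "customer_id": ["1", "1", "3", "2"],
        "total": ["9.50", "20.00", "5.25", "7.00"],
    },
}

def concrete_type(values):
    """Guess a storage type by trying to parse every value in the column."""
    for name, parse in (("INTEGER", int), ("REAL", float)):
        try:
            for v in values:
                parse(v)
        except ValueError:
            continue
        return name
    return "TEXT"

def functional_type(values, ctype):
    """Assumed heuristic: a unique INTEGER or TEXT column is an identifier."""
    unique = len(set(values)) == len(values)
    return "identifier" if unique and ctype in ("INTEGER", "TEXT") else "attribute"

def infer_relationships(raw, ctypes, ftypes):
    """Link a column to an identifier column that contains all of its values."""
    found = []
    for parent, pcols in raw.items():
        for pcol, pvals in pcols.items():
            if ftypes[parent][pcol] != "identifier":
                continue
            for child, ccols in raw.items():
                if child == parent:
                    continue
                for ccol, cvals in ccols.items():
                    same_type = ctypes[child][ccol] == ctypes[parent][pcol]
                    if same_type and set(cvals) <= set(pvals):
                        found.append((child, ccol, parent, pcol))
    return found

def load(conn, raw, ctypes):
    """Copy the raw tables into SQLite (names assumed to be simple and trusted)."""
    for table, cols in raw.items():
        names = list(cols)
        decl = ", ".join(f"{c} {ctypes[table][c]}" for c in names)
        conn.execute(f"CREATE TABLE {table} ({decl})")
        marks = ", ".join("?" for _ in names)
        rows = list(zip(*(cols[c] for c in names)))
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)

def join_runs(conn, rel):
    """Execute a join for a candidate; an execution error invalidates it."""
    child, ccol, parent, pcol = rel
    sql = (f"SELECT COUNT(*) FROM {child} JOIN {parent} "
           f"ON {child}.{ccol} = {parent}.{pcol}")
    try:
        conn.execute(sql).fetchone()
        return True
    except sqlite3.Error:
        return False

def build_schema(raw, predicted):
    ctypes = {t: {c: concrete_type(v) for c, v in cols.items()}
              for t, cols in raw.items()}
    ftypes = {t: {c: functional_type(v, ctypes[t][c]) for c, v in cols.items()}
              for t, cols in raw.items()}
    inferred = infer_relationships(raw, ctypes, ftypes)
    conn = sqlite3.connect(":memory:")
    load(conn, raw, ctypes)
    validated = [r for r in predicted if join_runs(conn, r)]
    conn.close()
    return {
        "tables": ctypes,
        "functional_types": ftypes,
        "relationships": sorted(set(inferred + validated)),
    }

# Stand-ins for model predictions; the second names a nonexistent column.
PREDICTED = [
    ("orders", "customer_id", "customers", "id"),
    ("orders", "cust_ref", "customers", "id"),
]
schema = build_schema(RAW, PREDICTED)
print(schema["relationships"])
```

In this sketch the existing-relationship step of the claim, which relies on metrics or statistical features of the metadata, is left out because the sample data carries no metadata. A fuller version would merge those relationships into the final schema alongside the inferred and validated ones.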