CPC G06F 16/2282 (2019.01) [G06F 16/211 (2019.01); G06F 40/284 (2020.01)] | 20 Claims |
1. A computer-based method of linking tabular columns to column types in an ontology unseen during training, the method comprising:
for a target table, encoding a target tabular query column, table headers, and target types independently to generate permutation invariant representations of type data associated with a target ontology and tabular data associated with the target table, wherein encoding the target types further includes encoding associated auxiliary information, wherein the auxiliary information includes a partial taxonomy structure comprising linearized two-hop is-a ancestor labels;
processing the encoded tabular query column using a first transformer to obtain a first vector and the encoded table headers using a second transformer to obtain a second vector;
concatenating the first vector and the second vector to obtain a combined vector;
processing the combined vector through a linear layer and a Gaussian Error Linear Unit layer to obtain a final query vector;
processing the encoded target types through a third transformer to obtain a third vector; and
calculating a score for the target tabular query column as a dot product between the final query vector and the third vector to model interactions between the target tabular query column of the target table and the target types and provide a column-type annotation.
|
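The scoring pipeline recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the three transformers are replaced here by a simple mean-pooling encoder (chosen only because it is permutation invariant), the weights are random and untrained, and every name (`embed_token`, `encode`, `linearize_type`, `score_column`, the toy ontology) is a hypothetical introduced for illustration. The exact linearization format for the two-hop is-a ancestors is likewise an assumption.

```python
import math
import random

DIM = 8  # toy embedding width (assumption, not from the claim)


def embed_token(token, dim=DIM):
    """Deterministic toy embedding; stand-in for a learned embedding table."""
    rng = random.Random(token)  # string seeds are reproducible across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(tokens, dim=DIM):
    """Mean-pool token embeddings. Stand-in for a transformer encoder;
    math.fsum makes the result independent of token order."""
    vecs = [embed_token(t, dim) for t in tokens]
    return [math.fsum(col) / len(vecs) for col in zip(*vecs)]


def gelu(x):
    """Exact Gaussian Error Linear Unit: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weights, bias):
    """Dense layer: one output per weight row."""
    return [math.fsum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def linearize_type(label, ancestors):
    """Type label followed by at most two is-a ancestors (assumed format)."""
    return [label] + list(ancestors[:2])


def score_column(column_cells, headers, types, weights, bias):
    """Score one query column against every candidate type."""
    first = encode(column_cells)           # "first vector" (query column)
    second = encode(headers)               # "second vector" (table headers)
    combined = first + second              # concatenation, length 2 * DIM
    query = [gelu(x) for x in linear(combined, weights, bias)]
    scores = {}
    for label, ancestors in types.items():
        third = encode(linearize_type(label, ancestors))  # "third vector"
        scores[label] = math.fsum(q * t for q, t in zip(query, third))
    return scores


# Toy inputs: untrained random projection and a two-type ontology.
_rng = random.Random(0)
W = [[_rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]
b = [0.0] * DIM
types = {"city": ["settlement", "place"], "person": ["agent", "thing"]}
cells = ["Paris", "Berlin", "Lima"]
headers = ["name", "country", "population"]

scores = score_column(cells, headers, types, W, b)
best_type = max(scores, key=scores.get)  # highest dot-product score
```

Because the type side is encoded independently of the table side, a type that was never seen during training can be scored simply by adding its label and ancestors to `types`. With untrained weights the resulting ranking carries no meaning; the sketch only shows the data flow.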