US 12,216,635 B2
Linking tabular columns to unseen ontologies
Sarthak Dash, Jersey City, NJ (US); Sugato Bagchi, White Plains, NY (US); Nandana Sampath Mihindukulasooriya, Dublin (IE); and Alfio Massimiliano Gliozzo, Brooklyn, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jun. 6, 2023, as Appl. No. 18/330,320.
Prior Publication US 2024/0411741 A1, Dec. 12, 2024
Int. Cl. G06F 16/20 (2019.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01); G06F 40/284 (2020.01)
CPC G06F 16/2282 (2019.01) [G06F 16/211 (2019.01); G06F 40/284 (2020.01)]
20 Claims
 
1. A computer-based method of linking tabular columns to column types in an ontology unseen during training, the method comprising:
for a target table, encoding a target tabular query column, table headers, and target types independently to generate permutation invariant representations of type data associated with a target ontology and tabular data associated with the target table, wherein encoding the target types further includes encoding associated auxiliary information, wherein the auxiliary information includes a partial taxonomy structure comprising linearized two-hop is-a ancestor labels;
processing the encoded tabular query column using a first transformer to obtain a first vector and the encoded table headers using a second transformer to obtain a second vector;
concatenating the first vector and the second vector to obtain a combined vector;
processing the combined vector through a linear layer and a Gaussian Error Linear Unit layer to obtain a final query vector;
processing the encoded target types through a third transformer to obtain a third vector; and
calculating a score for the target tabular query column as a dot product between the final query vector and the third vector to model interactions between the target tabular query column of the target table and the target types and provide a column-type annotation.
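
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the three transformers are omitted and their output vectors are passed in as plain lists, the permutation-invariant encoding is not modeled, and the helper names (`linearize_type`, `final_query_vector`, `score`), the `" [SEP] "` separator, and the toy taxonomy and weights are all hypothetical choices made for illustration only.

```python
import math
from typing import Dict, List, Sequence


def linearize_type(label: str, parents: Dict[str, str], hops: int = 2,
                   sep: str = " [SEP] ") -> str:
    """Join a type label with up to `hops` is-a ancestor labels.

    `parents` maps each type to its direct is-a parent (hypothetical format).
    The separator string is an assumption, not taken from the claim.
    """
    parts = [label]
    current = label
    for _ in range(hops):
        parent = parents.get(current)
        if parent is None:  # reached the top of the known taxonomy
            break
        parts.append(parent)
        current = parent
    return sep.join(parts)


def gelu(x: float) -> float:
    """Exact Gaussian Error Linear Unit: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Sequence[float], weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Dense layer: one output per weight row, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def final_query_vector(column_vec: Sequence[float], header_vec: Sequence[float],
                       weight: Sequence[Sequence[float]],
                       bias: Sequence[float]) -> List[float]:
    """Concatenate the two encoder outputs, then apply linear layer and GELU."""
    combined = list(column_vec) + list(header_vec)
    return [gelu(v) for v in linear(combined, weight, bias)]


def score(query_vec: Sequence[float], type_vec: Sequence[float]) -> float:
    """Dot product between the final query vector and a type vector."""
    return sum(q * t for q, t in zip(query_vec, type_vec))


# Toy taxonomy (illustrative only).
parents = {"City": "Populated place", "Populated place": "Place", "Place": "Thing"}
type_text = linearize_type("City", parents)

# Stand-ins for the first, second, and third transformer outputs.
column_vec = [1.0, 0.0]
header_vec = [0.0, 1.0]
type_vec = [1.0, 1.0]

# Toy 2x4 weight matrix mapping the 4-dim combined vector to 2 dims.
weight = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 1.0, 0.0]]
bias = [0.0, 0.0]

query_vec = final_query_vector(column_vec, header_vec, weight, bias)
column_type_score = score(query_vec, type_vec)
```

In a real system the score would be computed against every candidate type in the target ontology, and the highest-scoring type would serve as the column-type annotation. Because the types are encoded independently of the query column, types from an ontology unseen during training can be scored without retraining.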