CPC G06F 16/2282 (2019.01) [G06F 16/211 (2019.01); G06F 16/285 (2019.01)] | 12 Claims |
1. A computer-implemented method for automated column type annotation, wherein the method maps each column contained in a table to a column annotation class of a set of column annotation classes, wherein each column contains a header cell and a set of body cells,
comprising the following operations, wherein the operations are performed by components, and wherein the components are hardware components and/or software components executed by one or more processors:
transforming, by a pre-processor, the table into a numerical tensor representation by outputting a sequence of cell tokens for each cell in the table,
encoding, by a table encoder, the sequences of cell tokens and a column annotation label for each column into body cell embeddings, wherein at least one of the column annotation labels indicates a correct column annotation class for the respective column and at least one of the column annotation labels indicates that the column annotation class for the respective column is unknown,
processing, by a body pooling component, the body cell embeddings to provide column representations,
classifying, by a classifier, the column representations in order to provide, for each column, confidence scores for each column annotation class,
comparing the highest confidence score for each column with a threshold, and
if the highest confidence score for each column is above the threshold, annotating each column with the respective column annotation class.
|
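The last four operations of claim 1 (body pooling, classification, threshold comparison, annotation) can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes mean pooling over body cell embeddings, a single linear layer followed by softmax as the classifier, and a per-column reading of the threshold test in which a column whose highest score does not exceed the threshold is left unannotated. The function names `mean_pool`, `classify`, and `annotate`, the toy weights, and the class names are all hypothetical; the pre-processor and table encoder are omitted, and the embeddings are supplied directly.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def mean_pool(cell_embeddings: Sequence[Vector]) -> Vector:
    """Average the body cell embeddings of one column (assumed pooling choice)."""
    n = len(cell_embeddings)
    dim = len(cell_embeddings[0])
    return [sum(vec[i] for vec in cell_embeddings) / n for i in range(dim)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax, used here to turn logits into confidence scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(column_repr: Vector,
             weights: Sequence[Vector],
             biases: Sequence[float]) -> Vector:
    """Hypothetical linear classifier: one confidence score per annotation class."""
    logits = [sum(w * x for w, x in zip(row, column_repr)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def annotate(scores: Sequence[float],
             classes: Sequence[str],
             threshold: float) -> Optional[str]:
    """Return the top class if its score is above the threshold, else None."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best] if scores[best] > threshold else None


# Toy data (illustrative only): two classes, two-dimensional embeddings.
classes = ["city", "population"]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

column_a = [[4.0, 0.0], [2.0, 0.0]]   # body cell embeddings of one column
column_b = [[1.0, 1.0], [0.0, 0.0]]   # embeddings that give no clear winner

label_a = annotate(classify(mean_pool(column_a), weights, biases), classes, 0.8)
label_b = annotate(classify(mean_pool(column_b), weights, biases), classes, 0.8)
```

In this sketch `label_a` is `"city"`, because the top score is about 0.95, and `label_b` is `None`, because both scores equal 0.5. Returning `None` for a low-confidence column is one way to represent a column that stays unannotated; the claim itself does not say what happens in that case.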