US 12,242,796 B2
Permutation invariance for representing linearized tabular data
Sarthak Dash, Jersey City, NY (US); Sugato Bagchi, White Plains, NY (US); Nandana Mihindukulasooriya, Cambridge, MA (US); and Alfio Massimiliano Gliozzo, Brooklyn, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jun. 17, 2022, as Appl. No. 17/807,461.
Prior Publication US 2023/0409806 A1, Dec. 21, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 40/157 (2020.01); G06F 40/284 (2020.01)
CPC G06F 40/157 (2020.01) [G06F 40/284 (2020.01)]
20 Claims
 
1. A computer-based method of encoding tabular data with permutation invariance, the method comprising:
receiving input including tabular data and linearizing a column or row within the received tabular data;
automatically assigning an increasing sequence of position identifiers to each non-delimiting tokenized cell in the linearized column or row until a header delimiter is reached;
in response to reaching the header delimiter, automatically assigning a monotonically increasing sequence of position identifiers for each non-delimiting tokenized cell positioned after the header delimiter, restarting from an integer corresponding to 1 greater than the position identifier assigned to the header delimiter for each non-delimiting tokenized cell positioned after cell delimiters;
automatically assigning a static position identifier for each of the cell delimiters in the linearized column or row, the static position identifier being 1 greater than a highest position identifier assigned to the non-delimiting tokenized cells; and
automatically outputting an encoded permutation-invariant representation of the linearized column or row.
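
The position-identifier scheme recited in claim 1 can be illustrated with a minimal Python sketch. This is one possible reading of the claim language, not the patented implementation. The delimiter strings `[HDR]` and `[CELL]`, the function name `assign_position_ids`, and the choice to start numbering at 0 are all assumptions made for illustration. The sketch also assumes the header delimiter itself receives the next identifier in the header sequence and that a cell delimiter follows every cell.

```python
from typing import List

HEADER_DELIM = "[HDR]"   # hypothetical header delimiter token
CELL_DELIM = "[CELL]"    # hypothetical cell delimiter token


def assign_position_ids(tokens: List[str]) -> List[int]:
    """Assign position identifiers to a linearized column or row.

    Header tokens get an increasing sequence. Tokens of every cell after
    the header delimiter restart at (header delimiter id + 1). All cell
    delimiters share one static id: 1 + the highest id given to any
    non-delimiting token.
    """
    if HEADER_DELIM not in tokens:
        raise ValueError("expected a header delimiter in the input")

    ids: List[int] = [-1] * len(tokens)
    pos = 0            # assumption: numbering starts at 0
    header_id = -1     # set once the header delimiter is reached
    highest = -1       # highest id assigned to a non-delimiting token

    for i, tok in enumerate(tokens):
        if tok == CELL_DELIM:
            # Static id is filled in afterwards; restart the cell counter.
            if header_id >= 0:
                pos = header_id + 1
            continue
        if tok == HEADER_DELIM and header_id < 0:
            header_id = pos
            ids[i] = pos
            pos += 1
            continue
        ids[i] = pos
        highest = max(highest, pos)
        pos += 1

    static_id = highest + 1
    return [static_id if tok == CELL_DELIM else pid
            for tok, pid in zip(tokens, ids)]


column = ["country", HEADER_DELIM,
          "united", "states", CELL_DELIM,
          "france", CELL_DELIM]
shuffled = ["country", HEADER_DELIM,
            "france", CELL_DELIM,
            "united", "states", CELL_DELIM]

ids_a = assign_position_ids(column)
ids_b = assign_position_ids(shuffled)
```

Under these assumptions, reordering the cells changes only where each token sits in the sequence, not the identifier it receives: `france` maps to the same identifier in both `column` and `shuffled`, which is the sense in which the encoded representation is permutation invariant.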