US 12,008,029 B1
Delimiter determination in input data
Sung Jin Kim, Buena Park, CA (US); Yinuo Zhang, Los Angeles, CA (US); Rehana Rahiman, San Diego, CA (US); and Eugene Szedenits, Ypsilanti, MI (US)
Assigned to Teradata US, Inc., San Diego, CA (US)
Filed by TERADATA US, INC., San Diego, CA (US)
Filed on Dec. 29, 2022, as Appl. No. 18/147,851.
Int. Cl. G06F 16/00 (2019.01); G06F 16/38 (2019.01)
CPC G06F 16/38 (2019.01)
20 Claims
 
1. A non-transitory machine-readable storage medium comprising instructions that upon execution cause a system to:
determine whether input data comprises binary data or text data; and
in response to determining that the input data comprises the text data, perform a delimiter identification process comprising:
identifying candidate record delimiters and candidate field delimiters in the input data;
providing different pairs of candidate record delimiters and candidate field delimiters, wherein each respective pair of the different pairs comprises a corresponding candidate record delimiter and a corresponding field delimiter,
for each respective pair of the different pairs:
identifying records using the corresponding candidate record delimiter of the respective pair, and
computing a first measure indicating a quantity of unique fields observed in the records identified using the corresponding field delimiter of the respective pair; and
selecting, based on values of the first measure computed for corresponding pairs of the different pairs, a record delimiter and a field delimiter in a pair of the different pairs.
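
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions not stated in the claim: the candidate delimiter lists, the binary-versus-text heuristic (a NUL byte or invalid UTF-8), the reading of the "first measure" as the number of distinct per-record field counts, and the selection rule (smallest measure, then most records). All function and variable names are hypothetical.

```python
from itertools import product

# Hypothetical candidate sets; the claim does not enumerate them.
RECORD_CANDIDATES = ["\r\n", "\n", ";"]
FIELD_CANDIDATES = [",", "\t", "|"]


def looks_binary(data: bytes) -> bool:
    """Assumed heuristic: a NUL byte or invalid UTF-8 means binary data."""
    if b"\x00" in data:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def first_measure(text: str, record_delim: str, field_delim: str):
    """Assumed reading of the 'first measure': how many distinct field
    counts appear across the records (1 means every record agrees)."""
    records = [r for r in text.split(record_delim) if r]
    distinct_counts = {len(r.split(field_delim)) for r in records}
    return len(distinct_counts), len(records)


def pick_delimiters(data: bytes):
    """Return a (record_delimiter, field_delimiter) pair, or None."""
    if looks_binary(data):
        return None  # the delimiter process runs only for text data
    text = data.decode("utf-8")

    # Identify candidates: keep only delimiters that occur in the input.
    record_cands = [d for d in RECORD_CANDIDATES if d in text]
    field_cands = [d for d in FIELD_CANDIDATES if d in text]

    best_pair, best_key = None, None
    # Evaluate every pairing of a record candidate with a field candidate.
    for record_delim, field_delim in product(record_cands, field_cands):
        spread, n_records = first_measure(text, record_delim, field_delim)
        # Assumed selection rule: fewest distinct field counts wins,
        # then more records; earlier candidates win remaining ties.
        key = (spread, -n_records)
        if best_key is None or key < best_key:
            best_pair, best_key = (record_delim, field_delim), key
    return best_pair
```

For example, on tab-separated input where one field contains a comma, the comma pairing yields two distinct field counts while the tab pairing yields one, so under these assumptions the tab is selected as the field delimiter.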