CPC G06F 16/38 (2019.01) | 20 Claims |
1. A non-transitory machine-readable storage medium comprising instructions that upon execution cause a system to:
determine whether input data comprises binary data or text data; and
in response to determining that the input data comprises the text data, perform a delimiter identification process comprising:
identifying candidate record delimiters and candidate field delimiters in the input data;
providing different pairs of candidate record delimiters and candidate field delimiters, wherein each respective pair of the different pairs comprises a corresponding candidate record delimiter and a corresponding field delimiter,
for each respective pair of the different pairs:
identifying records using the corresponding candidate record delimiter of the respective pair, and
computing a first measure indicating a quantity of unique fields observed in the records identified using the corresponding field delimiter of the respective pair; and
selecting, based on values of the first measure computed for corresponding pairs of the different pairs, a record delimiter and a field delimiter in a pair of the different pairs.
|
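The delimiter identification process recited in claim 1 can be illustrated with a minimal Python sketch. Nothing below is taken from the patent beyond the order of the claimed steps. The candidate lists, the `is_text` heuristic, and all function names are hypothetical. The sketch assumes that the "first measure" is the number of distinct per-record field counts and that the pair with the lowest measure is selected, with ties going to the pair that yields more fields in total. The claim itself fixes neither the measure's exact form nor the selection rule.

```python
from itertools import product

# Hypothetical candidate sets; the claim does not enumerate them.
RECORD_CANDIDATES = ["\r\n", "\n", ";"]
FIELD_CANDIDATES = [",", "\t", "|", " "]


def is_text(data: bytes) -> bool:
    """Assumed heuristic: text has no NUL byte and decodes as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def field_counts(text: str, record_delim: str, field_delim: str) -> list[int]:
    """Split into non-empty records, then count the fields in each record."""
    records = [r for r in text.split(record_delim) if r]
    return [len(r.split(field_delim)) for r in records]


def first_measure(text: str, record_delim: str, field_delim: str) -> int:
    """Assumed reading of the first measure: number of distinct field counts."""
    return len(set(field_counts(text, record_delim, field_delim)))


def select_delimiters(data: bytes):
    """Return (record_delimiter, field_delimiter), or None if none applies."""
    if not is_text(data):
        return None  # binary data: the delimiter process is skipped
    text = data.decode("utf-8")

    # Identify candidates that actually occur in the input.
    record_cands = [d for d in RECORD_CANDIDATES if d in text]
    field_cands = [d for d in FIELD_CANDIDATES if d in text]

    # Provide the different (record, field) candidate pairs.
    pairs = [(r, f) for r, f in product(record_cands, field_cands) if r != f]
    if not pairs:
        return None

    # Assumed selection rule: lowest measure, then most fields overall.
    def score(pair):
        r, f = pair
        return (first_measure(text, r, f), -sum(field_counts(text, r, f)))

    return min(pairs, key=score)


sample = b"name|city of birth\nann|new york\nbob|san jose city\n"
print(select_delimiters(sample))
```

On the sample input, splitting records on the newline and fields on `|` gives two fields in every record (measure 1), whereas splitting fields on the space gives records with two or three fields (measure 2), so the sketch selects the newline and `|` pair.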