CPC G06F 16/24553 (2019.01) [G06F 16/2379 (2019.01)]
20 Claims
1. A method of parsing data comprising a plurality of records, the method comprising:
storing a plurality of data chunks irrespective of boundaries of records in the data chunks;
assigning, for each data chunk of the plurality of data chunks, a worker to scan the data chunk;
receiving, from each worker, scan results comprising:
an indicator selected from the group consisting of a number of instances of a context-varying symbol in the data chunk and a parity of the number of instances of the context-varying symbol in the data chunk,
a position in the data chunk of a first instance of a context-dependent symbol after an even number of instances of the context-varying symbol, the context-dependent symbol having a first, structural meaning in a first context and a second, non-structural meaning in a second context, and
a position in the data chunk of a first instance of the context-dependent symbol after an odd number of instances of the context-varying symbol;
computing an adjusted data chunk, comprising
locating a first record delimiter in a later data chunk based at least upon the indicator, the adjusted data chunk comprising data from the later data chunk to complete a partial record in the data chunk; and
after completing the partial record, parsing the adjusted data chunk and executing a query against the plurality of records.
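The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes CSV-like data in which the double quote is the context-varying symbol and the newline is the context-dependent symbol (structural outside quotes, literal inside them). The names `ScanResult`, `scan_chunk`, `first_delimiters`, `adjusted_chunks`, and `parse_records` are hypothetical, and the workers run sequentially here, although each scan depends only on its own chunk and could run in parallel. The sketch also assumes that each adjusted chunk starts just after its first record delimiter, so the leading partial record is left to the preceding chunk.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

QUOTE = '"'     # assumed context-varying symbol
NEWLINE = "\n"  # assumed context-dependent symbol


@dataclass
class ScanResult:
    quote_count: int              # the indicator (its parity is what matters)
    first_nl_even: Optional[int]  # first newline after an even number of quotes
    first_nl_odd: Optional[int]   # first newline after an odd number of quotes


def scan_chunk(chunk: str) -> ScanResult:
    """Scan one chunk without knowing whether it starts inside a quoted field."""
    quotes = 0
    even = odd = None
    for i, ch in enumerate(chunk):
        if ch == QUOTE:
            quotes += 1
        elif ch == NEWLINE:
            if quotes % 2 == 0 and even is None:
                even = i
            elif quotes % 2 == 1 and odd is None:
                odd = i
    return ScanResult(quotes, even, odd)


def first_delimiters(results: List[ScanResult]) -> List[Optional[int]]:
    """Pick, per chunk, the first newline that is a true record delimiter."""
    delimiters = []
    inside_quotes = False  # the first chunk starts outside any quoted field
    for r in results:
        # Starting inside quotes, a newline is structural only after an odd
        # number of local quotes; starting outside, after an even number.
        delimiters.append(r.first_nl_odd if inside_quotes else r.first_nl_even)
        if r.quote_count % 2 == 1:
            inside_quotes = not inside_quotes
    return delimiters


def adjusted_chunks(chunks: List[str]) -> List[str]:
    """Move chunk boundaries so that every chunk holds only whole records."""
    delimiters = first_delimiters([scan_chunk(c) for c in chunks])
    data = "".join(chunks)
    starts = [0]
    offset = 0
    for i, chunk in enumerate(chunks):
        if i > 0 and delimiters[i] is not None:
            starts.append(offset + delimiters[i] + 1)
        offset += len(chunk)
    starts.append(len(data))
    # Skip empty slices (a delimiter may be the last character of the data).
    return [data[a:b] for a, b in zip(starts, starts[1:]) if a < b]


def parse_records(chunk: str) -> List[List[str]]:
    """Parse one adjusted chunk independently with the standard csv module."""
    return list(csv.reader(io.StringIO(chunk)))


data = 'a,"x\ny"\nb,"p\nq"\nc,d\n'
chunks = [data[i:i + 6] for i in range(0, len(data), 6)]  # record-agnostic split
adjusted = adjusted_chunks(chunks)
```

Only the parity carried from earlier chunks is combined sequentially; the per-chunk scans need no shared state, which is what lets the chunks be stored irrespective of record boundaries.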