US 11,914,596 B2
Parallelized parsing of data in cloud storage
Eric Eilebrecht, Bothell, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Mar. 30, 2022, as Appl. No. 17/657,270.
Application 17/657,270 is a continuation of application No. 16/458,312, filed on Jul. 1, 2019, granted, now 11,301,474.
Claims priority of provisional application 62/843,181, filed on May 3, 2019.
Prior Publication US 2022/0222258 A1, Jul. 14, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/20 (2019.01); G06F 16/2455 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/24553 (2019.01) [G06F 16/2379 (2019.01)]
20 Claims
 
1. A method of parsing data comprising a plurality of records, the method comprising:
storing a plurality of data chunks irrespective of boundaries of records in the data chunks;
assigning, for each data chunk of the plurality of data chunks, a worker to scan the data chunk;
receiving, from each worker, scan results comprising:
an indicator selected from the group consisting of number of instances of a context-varying symbol in the data chunk and a parity of the number of instances of the context-varying symbol in the data chunk,
a position in the data chunk of a first instance of a context-dependent symbol after an even number of instances of the context-varying symbol, the context-dependent symbol having a first, structural meaning in a first context and a second, non-structural meaning in a second context, and
a position in the data chunk of a first instance of the context-dependent symbol after an odd number of instances of the context-varying symbol;
computing an adjusted data chunk, comprising
locating a first record delimiter in a later data chunk based at least upon the indicator, the adjusted data chunk comprising data from the later data chunk to complete a partial record in the data chunk; and
after completing the partial record, parsing the adjusted data chunk and executing a query against the plurality of records.
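
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that are not stated in the claim: the data is CSV text, the context-varying symbol is the double quote, the context-dependent symbol is the newline (a record delimiter outside quotes, ordinary field content inside quotes), and the workers run sequentially in one process. All names (`ScanResult`, `split_blindly`, `scan_chunk`, `adjust_chunks`, `parse_all`) are hypothetical, and joining the chunks in memory stands in for ranged reads from cloud storage.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

QUOTE = '"'     # assumed context-varying symbol
NEWLINE = "\n"  # assumed context-dependent symbol


@dataclass
class ScanResult:
    quote_count: int              # the indicator; its parity is quote_count % 2
    first_nl_even: Optional[int]  # first newline after an even number of quotes
    first_nl_odd: Optional[int]   # first newline after an odd number of quotes


def split_blindly(data: str, size: int) -> List[str]:
    """Cut data into fixed-size chunks, ignoring record boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def scan_chunk(chunk: str) -> ScanResult:
    """Work done by one worker; it needs no knowledge of other chunks."""
    count = 0
    first: List[Optional[int]] = [None, None]  # indexed by quote parity
    for pos, ch in enumerate(chunk):
        if ch == QUOTE:
            count += 1
        elif ch == NEWLINE and first[count % 2] is None:
            first[count % 2] = pos
    return ScanResult(count, first[0], first[1])


def adjust_chunks(chunks: List[str], results: List[ScanResult]) -> List[str]:
    """Move each chunk's boundary to the first true record delimiter."""
    data = "".join(chunks)  # stand-in for ranged reads from storage
    starts: List[Optional[int]] = [0]  # global offset where each chunk's records begin
    offset = len(chunks[0])
    parity = results[0].quote_count % 2  # 1 means the next chunk starts inside quotes
    for chunk, res in zip(chunks[1:], results[1:]):
        # A newline is structural only if the total number of quotes before it is even.
        local = res.first_nl_odd if parity else res.first_nl_even
        starts.append(None if local is None else offset + local + 1)
        parity ^= res.quote_count % 2
        offset += len(chunk)
    adjusted: List[str] = []
    end = len(data)
    for start in reversed(starts):
        if start is None:
            adjusted.append("")  # chunk lies wholly inside a record owned by an earlier chunk
        else:
            adjusted.append(data[start:end])
            end = start
    return adjusted[::-1]


def parse_all(adjusted: List[str]) -> List[List[str]]:
    """Parse every adjusted chunk independently and collect the records."""
    rows: List[List[str]] = []
    for text in adjusted:
        rows.extend(csv.reader(io.StringIO(text, newline="")))
    return rows
```

In this sketch each adjusted chunk begins just after the first record delimiter found in its original chunk and extends into later chunks far enough to complete its last partial record, so every adjusted chunk can be parsed on its own. A query can then be run over the combined rows, for example a filter such as `[r for r in parse_all(adjusted) if r[0] == "2"]`.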