CPC G06F 11/1435 (2013.01) [G06F 11/0772 (2013.01); G06F 11/1464 (2013.01); G06F 11/1469 (2013.01); H04L 1/18 (2013.01); H04L 1/189 (2013.01); H04L 69/10 (2013.01); H04L 41/0654 (2013.01)] | 11 Claims |
1. A method of failure recovery in streaming data processing that provides an at-most-once service level, the method comprising:
a plurality of sending nodes contemporaneously streaming input data chunks to a first node;
the first node generating one or more output data chunks from the input data chunks from the plurality of sending nodes, and the first node streaming the one or more output data chunks to a second node; and,
the first node communicating with each of the plurality of sending nodes using a protocol in which transfer of each input data chunk from a respective sending node is concluded by a message exchange that commits the first node to perform failure recovery with respect to handling the just-transferred input data chunk, and releases the respective sending node therefrom;
wherein the first node begins sending the one or more output data chunks prior to receiving all of the input data chunks associated with the one or more output data chunks, and said failure recovery by the first node comprises:
upon a failure prior to a commitment for a given input data chunk from a given one of the plurality of sending nodes, the first node:
sending a failure message to the second node with respect to the one or more output data chunks,
discarding a received portion of the given input data chunk that failed,
re-generating the one or more output data chunks without the given input data chunk incorporated therein, based on a copy of the input data chunks other than the given input data chunk that failed, the copy being stored at the first node, and,
sending the re-generated one or more output data chunks to the second node.
|