CPC G06F 11/1004 (2013.01) [G06F 11/27 (2013.01); G06N 3/02 (2013.01)] | 17 Claims |
1. A method of end to end failure detection for use in a neural network (NN) processor configured to implement a target neural network, the method comprising:
providing a plurality of unallocated redundant hardware resources in said neural network processor, said unallocated redundant hardware resources including compute, memory, routing, and control elements;
allocating during an offline compilation process via a compiler a main computational path from said plurality of unallocated redundant hardware resources, said main computational path to be protected from end to end failures;
allocating during said offline compilation process via said compiler one or more redundant computational paths from said plurality of unallocated redundant hardware resources different from said plurality of unallocated redundant hardware resources allocated to said main computational path, where said compiler ensures that said main computational path and said one or more redundant computational paths go through and use different resources with no overlap of compute, memory, routing, and control hardware resource elements, wherein said main computational path and said one or more redundant computational paths each function to perform the same calculations for the same target neural network, said one or more redundant computational paths operative to protect said main computational path from end to end failures whereby whole or partial layers of a whole or partial network are allocated;
providing the same input data to said main computational path and said one or more redundant computational paths at the same time;
calculating cyclic redundancy code (CRC) checksums on tensor stream data output from said main computational path and said one or more redundant computational paths;
comparing said CRC checksums from said main computational path and said one or more redundant computational paths with each other;
detecting an error if said calculated CRC checksums do not match; and
wherein said main computational path and said one or more redundant computational paths use different data resources selected from a group consisting of stream managers (SMs), portions of L4 memory allocated to said stream managers, input buffers (IBs), portions of L3 memory, input aligners (IAs), subclusters (SCs), activation processing units (APUs), and output buffers (OBs).
|
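As an illustration only, the checking steps recited in claim 1 can be sketched in software: confirm that the paths share no resources, compute a CRC over each path's output stream, and flag an error when the checksums differ. This is a minimal sketch under stated assumptions, not the claimed hardware implementation. The function names, the resource labels, and the use of CRC-32 via Python's `zlib.crc32` are all hypothetical choices; the claim does not specify a CRC polynomial or where in the processor the comparison takes place.

```python
import zlib
from typing import Iterable, List, Sequence, Set


def paths_are_disjoint(allocations: Sequence[Set[str]]) -> bool:
    """Return True if no resource label appears in more than one path.

    Stands in for the compiler's guarantee that the main and redundant
    paths do not overlap (hypothetical helper, labels are illustrative).
    """
    seen: Set[str] = set()
    for alloc in allocations:
        if seen & alloc:
            return False
        seen |= alloc
    return True


def crc_of_stream(chunks: Iterable[bytes]) -> int:
    """Compute a running CRC-32 over a stream of output chunks.

    CRC-32 is an assumption; the claim only says "CRC checksums".
    """
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc


def detect_error(path_outputs: Sequence[Iterable[bytes]]) -> bool:
    """Return True if the CRCs of the path outputs do not all match."""
    checksums: List[int] = [crc_of_stream(out) for out in path_outputs]
    return any(c != checksums[0] for c in checksums[1:])


# Illustrative resource allocations (labels are made up for this sketch).
main_alloc = {"SM0", "IB0", "IA0", "SC0", "APU0", "OB0"}
redundant_alloc = {"SM1", "IB1", "IA1", "SC1", "APU1", "OB1"}
disjoint = paths_are_disjoint([main_alloc, redundant_alloc])

# Same input fed to both paths; outputs modeled as byte chunks.
main_out = [b"\x01\x02", b"\x03"]
redundant_out = [b"\x01\x02", b"\x03"]
faulty_out = [b"\x01\x02", b"\x07"]  # one flipped bit in the last byte

no_fault = detect_error([main_out, redundant_out])
fault = detect_error([main_out, faulty_out])
```

In this sketch `no_fault` is false when both paths produce identical streams, and `fault` is true when one path's output differs. Comparing checksums rather than full tensors keeps the comparison small, at the cost of a very small chance that differing outputs share a checksum.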