US 12,354,015 B2
Processing of neural networks
Philip Gregory Hall, Bracknell (GB); and Jacob Bohlin, Lund (SE)
Assigned to Arm Limited, Cambridge (GB)
Filed by Arm Limited, Cambridge (GB)
Filed on Sep. 13, 2021, as Appl. No. 17/473,616.
Claims priority of provisional application 63/194,003, filed on May 27, 2021.
Prior Publication US 2022/0383133 A1, Dec. 1, 2022
Int. Cl. G06N 3/10 (2006.01)
CPC G06N 3/10 (2013.01) 15 Claims
OG exemplary drawing
 
1. A method, performed by an information processing system comprising a first information processing apparatus and a second information processing apparatus comprising a storage and a processor, for reducing storage usage during processing of a neural network performed by the second information processing apparatus, wherein the neural network may be represented by a plurality of operators each of which operates on an input feature map and generates an output feature map, the method comprising:
generating, by the first information processing apparatus, a representation of the neural network as a linear sequence of operators;
identifying, by the first information processing apparatus, operators in the linear sequence of operators that cannot form part of a cascade and are to be processed with the entire input feature map and output feature map of the respective operator in the storage;
forming, by the first information processing apparatus, one or more cascades of two or more successive operators in the linear sequence for which the input feature map of each operator of the cascade is processed in portions, which portions are less than the entire input feature map,
wherein the method forms the one or more cascades by sequentially, from one end of the linear sequence of operators, designating each operator that could form part of a cascade as a member of a cascade or as an operator to be processed with the entire input feature map and output feature map of the operator in the storage;
generating, by the first information processing apparatus, instructions for executing the neural network including the designations of the operators;
executing, by the processor of the second information processing apparatus, the instructions so that operators designated to be processed with the entire feature map are processed with their respective feature map loaded completely in the storage; and
sequentially executing, by the processor of the second information processing apparatus, the two or more operators that are designated as part of a cascade with a portion of the entire feature map that is less than the entire feature map loaded in the storage before processing a further portion of the entire feature map.
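The cascade-formation step of claim 1 walks the linear operator sequence from one end, designating each operator either as a cascade member or as an operator whose entire input and output feature maps must be held in storage. The following is a minimal sketch of that designation pass under stated assumptions; the Operator class, the cascade_eligible flag, and all names are hypothetical and are not taken from the patent, which does not disclose a particular implementation.

```python
# Hypothetical sketch of the cascade-forming pass described in claim 1.
# Assumption: each operator is pre-marked as eligible or ineligible for
# cascading (the patent identifies ineligible operators in a prior step).

from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    name: str
    cascade_eligible: bool  # False: process with entire IFM/OFM in storage


@dataclass
class Plan:
    cascades: List[List[Operator]] = field(default_factory=list)
    full_map_ops: List[Operator] = field(default_factory=list)


def form_cascades(sequence: List[Operator]) -> Plan:
    """Walk the linear operator sequence from one end, grouping runs of
    successive cascade-eligible operators into cascades and designating
    the remaining operators for whole-feature-map processing."""
    plan = Plan()
    run: List[Operator] = []
    for op in sequence:
        if op.cascade_eligible:
            run.append(op)
        else:
            _flush(run, plan)
            run = []
            plan.full_map_ops.append(op)
    _flush(run, plan)
    return plan


def _flush(run: List[Operator], plan: Plan) -> None:
    # A cascade requires two or more successive operators; a lone eligible
    # operator falls back to whole-feature-map processing.
    if len(run) >= 2:
        plan.cascades.append(list(run))
    else:
        plan.full_map_ops.extend(run)


if __name__ == "__main__":
    seq = [Operator("conv1", True), Operator("conv2", True),
           Operator("softmax", False), Operator("conv3", True)]
    plan = form_cascades(seq)
    print([[o.name for o in c] for c in plan.cascades])   # [['conv1', 'conv2']]
    print([o.name for o in plan.full_map_ops])            # ['softmax', 'conv3']
```

In this reading, operators inside a cascade would later be executed portion by portion (a portion of the feature map loaded in storage at a time, per the final claim step), while operators outside any cascade are executed with their feature maps loaded completely; the single sequential pass reflects the claim's requirement that designation proceed from one end of the linear sequence.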