US 11,755,603 B1
Searching compression profiles for trained neural networks
Ragav Venkatesan, Seattle, WA (US); Gurumurthy Swaminathan, Redmond, WA (US); Xiong Zhou, Bothell, WA (US); Runfei Luo, Kirkland, WA (US); and Vineet Khare, Redmond, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Mar. 26, 2020, as Appl. No. 16/831,584.
Int. Cl. G06F 16/24 (2019.01); G06F 16/2458 (2019.01); G06F 16/25 (2019.01); G06F 16/248 (2019.01); G06N 3/04 (2023.01); H03M 7/30 (2006.01); G06N 3/082 (2023.01); G06Q 10/10 (2023.01)
CPC G06F 16/2474 (2019.01) [G06F 16/248 (2019.01); G06F 16/252 (2019.01); G06N 3/04 (2013.01); G06N 3/082 (2013.01); H03M 7/30 (2013.01); G06Q 10/10 (2013.01)]
20 Claims
 
1. A system, comprising:
at least one processor; and
a memory, storing program instructions that when executed cause the at least one processor to implement a compression profile search system, the compression profile search system configured to:
receive a request for a compression profile search for one or more neural networks trained according to one or more respective data sets;
responsive to the request:
until a search criteria for the compression profile is satisfied, the compression profile search system is configured to iteratively perform:
generate a plurality of different prospective compression profiles for the one or more trained neural networks according to a search policy;
cause performance of respective versions of the one or more trained neural networks that are compressed according to the different prospective compression profiles using the respective one or more data sets;
determine respective performance metrics for the respective versions of the trained neural network that are compressed according to the different prospective compression profiles; and
update the search policy for a subsequent iteration to generate prospective compression profiles according to an evaluation of the respective performance metrics for the respective versions of the trained neural network; and
return one or more of the plurality of different prospective compression profiles generated according to the updated search policy of a last iteration of the compression profile search system in response to the request.
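For illustration only, the loop recited in claim 1 (generate prospective profiles from a search policy, run the compressed versions, determine metrics, update the policy, and return profiles once a search criterion is met) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed or any actual implementation. The claim does not specify the form of a compression profile, the search policy, the metrics, or the stopping criterion, so every name here is hypothetical: `Profile` is assumed to be one pruning fraction per layer, `SearchPolicy` is assumed to be a per-layer Gaussian updated in the style of a cross-entropy method, and `toy_evaluate` is an invented stand-in for running a compressed network on its data set.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a profile is one pruning fraction per layer, each in [0, 1].
Profile = List[float]


@dataclass
class SearchPolicy:
    """Hypothetical search policy: an independent Gaussian per layer."""
    means: List[float]
    std: float = 0.2

    def sample(self, rng: random.Random) -> Profile:
        # Draw one prospective profile, clamped to valid pruning fractions.
        return [min(1.0, max(0.0, rng.gauss(m, self.std))) for m in self.means]

    def update(self, elite: List[Profile]) -> None:
        # Move each per-layer mean to the average of the best-scoring
        # profiles and narrow the sampling spread (with a floor).
        self.means = [sum(col) / len(col) for col in zip(*elite)]
        self.std = max(0.02, self.std * 0.9)


def toy_evaluate(profile: Profile) -> Tuple[float, float]:
    """Invented stand-in for running a compressed network on its data set.

    Returns (accuracy, relative_size); both formulas are made up.
    """
    sensitivity = [0.5, 0.2, 0.1]  # assumed accuracy cost of pruning each layer
    accuracy = 1.0 - sum(s * p * p for s, p in zip(sensitivity, profile))
    relative_size = 1.0 - sum(profile) / len(profile)
    return accuracy, relative_size


def search_profiles(
    evaluate: Callable[[Profile], Tuple[float, float]],
    num_layers: int = 3,
    batch: int = 16,
    top_k: int = 4,
    max_iters: int = 30,
    target_score: float = 0.6,
    seed: int = 0,
) -> List[Profile]:
    rng = random.Random(seed)
    policy = SearchPolicy(means=[0.5] * num_layers)
    best: List[Profile] = []
    for _ in range(max_iters):
        # Generate a plurality of different prospective profiles.
        candidates = [policy.sample(rng) for _ in range(batch)]
        # Run each compressed version and determine its performance metrics.
        scored = []
        for profile in candidates:
            accuracy, size = evaluate(profile)
            scored.append((accuracy - 0.5 * size, profile))  # assumed trade-off
        scored.sort(key=lambda pair: pair[0], reverse=True)
        best = [profile for _, profile in scored[:top_k]]
        # Update the policy from the evaluation for the next iteration.
        policy.update(best)
        # Assumed search criterion: a score target (or the iteration budget).
        if scored[0][0] >= target_score:
            break
    # Return the top profiles from the last iteration performed.
    return best


if __name__ == "__main__":
    for profile in search_profiles(toy_evaluate):
        print([round(p, 3) for p in profile], toy_evaluate(profile))
```

The scalar score, the elite-averaging update, and the fixed iteration budget are design choices made only to keep the sketch short. A reinforcement-learning or Bayesian policy, or a criterion based on latency or memory, would fit the same loop structure.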