US 12,445,677 B2
Small and fast video processing networks via neural architecture search
Anthony J. Piergiovanni, Mountain View, CA (US); Anelia Angelova, Sunnyvale, CA (US); and Michael Sahngwon Ryoo, Rocky Point, NY (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/620,451
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Sep. 16, 2020, PCT No. PCT/US2020/051027
§ 371(c)(1), (2) Date Dec. 17, 2021,
PCT Pub. No. WO2021/055442, PCT Pub. Date Mar. 25, 2021.
Claims priority of provisional application 62/902,107, filed on Sep. 18, 2019.
Prior Publication US 2022/0366257 A1, Nov. 17, 2022
Int. Cl. G06N 3/04 (2023.01); G06F 18/21 (2023.01); G06N 3/063 (2023.01); G06N 3/082 (2023.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); H04N 21/44 (2011.01)
CPC H04N 21/44008 (2013.01) [G06F 18/217 (2023.01); G06N 3/04 (2013.01); G06N 3/063 (2013.01); G06N 3/082 (2013.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01)] 19 Claims
1. A computer-implemented method of neural architecture search for video understanding, the method comprising:
defining, by a computing system comprising one or more computing devices, a plurality of searchable parameters that control an architecture of a video processing neural network, wherein the video processing neural network is configured to process video data to produce inferences;
for one or more iterations:
determining, by the computing system, a new set of values for the plurality of searchable parameters to generate a new architecture for the video processing neural network;
determining, by the computing system, whether the video processing neural network with the new architecture satisfies one or more constraints on a size of the video processing neural network or a runtime of the video processing neural network;
when the video processing neural network with the new architecture does not satisfy the one or more constraints: discarding, by the computing system, the new architecture prior to completion of training of the video processing neural network with the new architecture; and
when the video processing neural network with the new architecture satisfies the one or more constraints:
training, by the computing system, the video processing neural network with the new architecture on a set of video training data; and
evaluating, by the computing system, one or more performance metrics for the video processing neural network relative to production of inferences for video evaluation data.
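The loop recited in claim 1 can be illustrated with a minimal Python sketch. Several details are assumptions and do not come from the claim. The search space (`num_frames`, `frame_size`, `num_blocks`, `width`) is hypothetical. The size and runtime figures come from a toy proxy, `estimate_cost`, rather than a profiler. New values are drawn by plain random search, although the claim does not say how they are chosen, and an evolutionary or learned controller could be used in its place. Training and evaluation are caller-supplied stubs. The step the sketch is meant to show is that a candidate which violates the constraints is discarded before any training cost is spent.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Architecture = Dict[str, int]

# Hypothetical searchable parameters and their candidate values.
SEARCH_SPACE: Dict[str, List[int]] = {
    "num_frames": [4, 8, 16],       # temporal length of the input clip
    "frame_size": [112, 160, 224],  # spatial resolution (pixels per side)
    "num_blocks": [2, 4, 6],        # network depth
    "width": [16, 32, 64],          # channels per block
}


@dataclass
class Constraints:
    max_params: int        # constraint on network size
    max_runtime_ms: float  # constraint on network runtime


def sample_architecture(rng: random.Random) -> Architecture:
    """Pick a new value for every searchable parameter (random search, an assumption)."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def estimate_cost(arch: Architecture) -> Tuple[int, float]:
    """Toy proxies for size and runtime; a real system would count or profile."""
    # Assume each block is one 3x3x3 convolution with `width` input/output channels.
    params = arch["num_blocks"] * 27 * arch["width"] ** 2
    # Assume runtime scales with voxels processed times per-block channel work.
    voxels = arch["num_frames"] * arch["frame_size"] ** 2
    runtime_ms = voxels * arch["num_blocks"] * arch["width"] * 1e-6
    return params, runtime_ms


def satisfies(arch: Architecture, c: Constraints) -> bool:
    params, runtime_ms = estimate_cost(arch)
    return params <= c.max_params and runtime_ms <= c.max_runtime_ms


def search(
    iterations: int,
    constraints: Constraints,
    train_fn: Callable[[Architecture], object],
    eval_fn: Callable[[object], float],
    seed: int = 0,
) -> Tuple[List[Tuple[Architecture, float]], int]:
    """Return (architecture, score) pairs sorted best-first, plus the discard count."""
    rng = random.Random(seed)
    evaluated: List[Tuple[Architecture, float]] = []
    discarded = 0
    for _ in range(iterations):
        arch = sample_architecture(rng)
        if not satisfies(arch, constraints):
            discarded += 1  # rejected before any training is spent on it
            continue
        model = train_fn(arch)                     # train on video training data
        evaluated.append((arch, eval_fn(model)))   # score on video evaluation data
    evaluated.sort(key=lambda pair: pair[1], reverse=True)
    return evaluated, discarded


if __name__ == "__main__":
    # Stand-in trainer/evaluator so the loop runs without real video data.
    results, n_discarded = search(
        iterations=50,
        constraints=Constraints(max_params=200_000, max_runtime_ms=50.0),
        train_fn=lambda arch: arch,
        eval_fn=lambda model: float(model["num_frames"] * model["frame_size"]),
    )
    print(f"trained {len(results)}, discarded {n_discarded}")
    if results:
        print("best:", results[0])
```

The constraint check runs before `train_fn`, so `train_fn` never sees a rejected candidate. That ordering is where the savings come from, because training a video network is usually far more expensive than estimating its size or latency.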