US 12,314,838 B2
Image processing neural networks with separable convolutional layers
Francois Chollet, Mountain View, CA (US); and Andrew Gerald Howard, Culver City, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 2, 2024, as Appl. No. 18/431,300.
Application 18/431,300 is a continuation of application No. 18/114,333, filed on Feb. 27, 2023, granted, now 11,922,288.
Application 18/114,333 is a continuation of application No. 16/338,963, granted, now 11,593,614, issued on Feb. 28, 2023, previously published as PCT/US2017/055581, filed on Oct. 6, 2017.
Claims priority of provisional application 62/405,181, filed on Oct. 6, 2016.
Prior Publication US 2024/0256833 A1, Aug. 1, 2024
Int. Cl. G06N 3/04 (2023.01); G06F 18/2413 (2023.01); G06N 3/045 (2023.01); G06N 3/0464 (2023.01); G06N 3/08 (2023.01); G06V 10/44 (2022.01); G06V 10/82 (2022.01); G06V 40/16 (2022.01)

CPC G06N 3/045 (2023.01) [G06F 18/2413 (2023.01); G06N 3/0464 (2023.01); G06N 3/08 (2013.01); G06V 10/44 (2022.01); G06V 10/454 (2022.01); G06V 10/82 (2022.01); G06V 40/169 (2022.01)]

20 Claims

1. A neural network system implemented by one or more computers, wherein the neural network system is configured to receive an input image and to generate a network output for the input image, and wherein the neural network system comprises:

an input subnetwork configured to receive the input image and to process the input image to generate an initial output;

one or more entry modules, wherein the entry modules are configured to receive the initial output and to collectively process the initial output to generate an entry output, and wherein each entry module comprises:

a respective first pass-through convolutional layer configured to process a module input for the entry module to generate a first pass-through output,

a respective first stack of separable convolutional neural network layers, wherein the layers in the first stack are configured to collectively process the module input to generate a first stack output,

a respective max pooling layer configured to perform max pooling on the first stack output to generate a max pooled output, and

a respective first concatenation layer configured to concatenate the first pass-through output and the max pooled output to generate an entry module output for the entry module;

one or more middle modules, wherein the middle modules are configured to receive the entry output and to collectively process the entry output to generate a middle output, wherein each middle module comprises:

a respective second pass-through convolutional layer configured to process a module input for the middle module to generate a second pass-through output,

a respective second stack of separable convolutional neural network layers, wherein the layers in the second stack are configured to collectively process the module input to generate a second stack output, and

a respective second concatenation layer configured to concatenate the second pass-through output and the second stack output to generate a middle module output for the middle module; and

an exit module, wherein the exit module is configured to receive the middle output and to process the middle output to generate a separable convolution output for the separable convolution subnetwork, wherein the exit module comprises:

a third pass-through convolutional layer configured to process the middle output to generate a third pass-through output;

a third stack of separable convolutional neural network layers, wherein the layers in the third stack are configured to collectively process the middle output to generate a third stack output;

a third max pooling layer configured to perform max pooling on the third stack output to generate a third max pooled output; and

a third concatenation layer configured to concatenate the third pass-through output and the third pooled output to generate a concatenated output,

wherein each separable convolutional neural network layer is configured to:

separately apply both a depthwise convolution and a pointwise convolution during processing of an input to the separable convolutional neural network layer to generate a layer output.
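
Illustrative sketch (not part of the claim). The following is a minimal, dependency-free Python sketch of the data flow that claim 1 recites for a separable convolutional layer and for one entry module. It is an illustration under stated assumptions, not the patented implementation. Assumptions not fixed by the claim text: the depthwise convolution is applied before the pointwise convolution; the depthwise convolution uses zero "same" padding with stride 1; the pass-through convolutional layer is a 1x1 convolution with stride 2; max pooling uses a 2x2 window with stride 2; concatenation is along the channel axis; activations and normalization are omitted. All function and parameter names are hypothetical. Tensors are nested lists indexed as [channel][row][column].

```python
def depthwise_conv(x, kernels):
    """Apply one k x k filter per input channel (zero 'same' padding, stride 1).

    x: [C][H][W]; kernels: [C][k][k] with odd k. Channels are never mixed.
    """
    k = len(kernels[0])
    pad = k // 2
    out = []
    for ch, kern in zip(x, kernels):
        h, w = len(ch), len(ch[0])

        def px(i, j):
            # Read a pixel, treating positions outside the image as zero.
            return ch[i][j] if 0 <= i < h and 0 <= j < w else 0.0

        out.append([
            [sum(px(i + di - pad, j + dj - pad) * kern[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w)]
            for i in range(h)
        ])
    return out


def pointwise_conv(x, weights, stride=1):
    """1x1 convolution: mix channels at each (strided) spatial position.

    x: [C_in][H][W]; weights: [C_out][C_in].
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wc * x[c][i][j] for c, wc in enumerate(row))
          for j in range(0, w, stride)]
         for i in range(0, h, stride)]
        for row in weights
    ]


def separable_conv(x, dw_kernels, pw_weights):
    """Depthwise step, then pointwise step (order is an assumption)."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


def max_pool(x, size=2):
    """Non-overlapping size x size max pooling applied per channel."""
    return [
        [[max(ch[i + di][j + dj] for di in range(size) for dj in range(size))
          for j in range(0, len(ch[0]) - size + 1, size)]
         for i in range(0, len(ch) - size + 1, size)]
        for ch in x
    ]


def entry_module(x, passthrough_weights, stack):
    """One entry module: pass-through branch plus separable stack with pooling.

    stack: list of (dw_kernels, pw_weights) pairs, applied in order.
    Assumes even H and W so both branches have matching spatial size.
    """
    passthrough = pointwise_conv(x, passthrough_weights, stride=2)
    y = x
    for dw_kernels, pw_weights in stack:
        y = separable_conv(y, dw_kernels, pw_weights)
    pooled = max_pool(y)
    # Channel-axis concatenation is list concatenation in this layout.
    return passthrough + pooled


# Small demonstration input: 2 channels, 4 x 4 pixels.
x = [
    [[float(4 * i + j) for j in range(4)] for i in range(4)],
    [[1.0] * 4 for _ in range(4)],
]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
dw = [identity, identity]                      # depthwise: one filter per channel
pw = [[1.0, 1.0], [1.0, -1.0], [0.0, 2.0]]     # pointwise: 2 -> 3 channels
module_out = entry_module(x, [[1.0, 0.0]], [(dw, pw)])
```

In this sketch a middle module would follow the same pattern without the max_pool call and with a stride-1 pass-through, and the exit module would mirror the entry module. The two-step factorization is what distinguishes a separable layer from a regular convolution: a k x k depthwise step over C_in channels followed by a 1x1 step to C_out channels uses k*k*C_in + C_in*C_out weights, compared with k*k*C_in*C_out for a regular convolution of the same shape.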