US 12,067,476 B2
Mixture of experts neural networks
Noam M. Shazeer, Palo Alto, CA (US); Azalia Mirhoseini, Mountain View, CA (US); and Krzysztof Stanislaw Maziarz, Jaslo (PL)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 8, 2023, as Appl. No. 18/244,171.
Application 18/244,171 is a continuation of application No. 16/879,187, filed on May 20, 2020, granted, now 11,790,214.
Application 16/879,187 is a continuation of application No. 16/393,063, filed on Apr. 24, 2019, granted, now 10,719,761, issued on Jul. 21, 2020.
Application 16/393,063 is a continuation of application No. PCT/US2017/059909, filed on Nov. 3, 2017.
Claims priority of provisional application 62/432,497, filed on Dec. 9, 2016.
Claims priority of provisional application 62/418,135, filed on Nov. 4, 2016.
Prior Publication US 2023/0419079 A1, Dec. 28, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/045 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/045 (2023.01) [G06N 3/08 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A system comprising:
a main neural network implemented by one or more computers, the main neural network comprising a Mixture of Experts (MoE) subnetwork that comprises:
a plurality of expert neural networks, wherein the main neural network is configured to receive an input text sequence as input and to process the input text sequence to generate a network output, wherein the input text sequence has respective text located at a plurality of corresponding positions, and
a gating subsystem configured to:
for each position in the input text sequence, select a respective combination of one or more expert neural networks from the plurality of expert neural networks to be active for the processing of the text located at the position by the main neural network, wherein the gating subsystem is configured to select the respective combination of the one or more expert neural networks based on at least one of a syntax or semantics of the text located at the position.
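The per-position expert selection that the claim describes corresponds to a sparse gating mechanism: a learned gate scores all experts for each position, activates only a small combination of them, and mixes their outputs. A minimal sketch in NumPy, with all function and variable names hypothetical (the patent's underlying application describes a noisy top-k gate; this sketch omits the noise term and load-balancing details for brevity):

```python
import numpy as np

def top_k_gating(x, w_gate, k=2):
    """Select k experts per position via a softmax gate.

    x:      (positions, d_model) token representations
    w_gate: (d_model, num_experts) learned gating weights
    Returns (indices, weights): for each position, the k selected
    expert indices and their normalized gate weights.
    """
    logits = x @ w_gate                                # (positions, num_experts)
    # Take the k highest-scoring experts for each position.
    top = np.argsort(logits, axis=-1)[:, ::-1][:, :k]
    top_logits = np.take_along_axis(logits, top, axis=-1)
    # Softmax over only the selected experts, so their weights sum to 1.
    e = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    return top, e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, w_gate, experts, k=2):
    """Run only the selected experts per position and combine their
    outputs, weighted by the gate. `experts` is a list of callables
    mapping a d_model vector to a d_model vector."""
    idx, w = top_k_gating(x, w_gate, k)
    out = np.zeros_like(x)
    for pos in range(x.shape[0]):          # one gating decision per position
        for j in range(k):
            out[pos] += w[pos, j] * experts[idx[pos, j]](x[pos])
    return out
```

Because only k of the experts run for any given position, total parameter count can grow with the number of experts while per-position compute stays roughly constant; the gate's input is the position's representation, which is how selection can depend on the syntax or semantics of the text at that position.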