US 12,079,726 B2
Probabilistic neural network architecture generation
Nicolo Fusi, Watertown, MA (US); Francesco Paolo Casale, Boston, MA (US); and Jonathan Gordon, Cambridge (GB)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Feb. 9, 2023, as Appl. No. 18/107,612.
Application 18/107,612 is a continuation of application No. 16/179,433, filed on Nov. 2, 2018, granted, now 11,604,992.
Prior Publication US 2023/0186094 A1, Jun. 15, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/082 (2023.01); G06F 18/21 (2023.01); G06F 18/214 (2023.01); G06N 3/047 (2023.01); G06N 3/08 (2023.01)

CPC G06N 3/082 (2013.01) [G06F 18/2148 (2023.01); G06F 18/217 (2023.01); G06N 3/047 (2023.01); G06N 3/08 (2013.01)]

20 Claims

1. A system comprising:

at least one processor; and

memory storing instructions that, when executed by the at least one processor, cause the system to perform a set of operations, the set of operations comprising:

iteratively tuning a probability distribution associated with a neural network architecture parameter for generating a neural network, the iteratively tuning comprising:

generating a sampled neural network architecture using the probability distribution;

evaluating training data from a training data store using the sampled neural network architecture to compute a gradient of a loss function associated with the sampled neural network architecture;

updating the probability distribution for the neural network architecture parameter based on the computed gradient of the loss function, thereby generating an updated iteration of the probability distribution for a subsequent iteration of tuning the probability distribution; and

evaluating the probability distribution based on termination criteria to determine whether the termination criteria is satisfied; and

when it is determined that the termination criteria is satisfied, generating a result neural network architecture having a value for the parameter based on the iteratively tuned probability distribution.
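
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the architecture parameter is the choice of activation function, the probability distribution is a softmax over logits, the "network" is a single-weight model fitted by least squares, the gradient is a score-function (REINFORCE-style) estimate of the expected loss with respect to the logits, and the termination criterion is a probability threshold. The claim itself does not fix any of these choices. All names (tune_distribution, architecture_loss, TRAINING_DATA) are hypothetical.

```python
import math
import random

# Hypothetical candidate values for one architecture parameter (the activation).
ACTIVATIONS = {
    "identity": lambda x: x,
    "abs": abs,
    "square": lambda x: x * x,
}
CHOICES = list(ACTIVATIONS)


def softmax(logits):
    """Turn logits into a categorical probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def architecture_loss(name, data):
    """Fit y ~ w * act(x) by least squares and return the mean squared error."""
    act = ACTIVATIONS[name]
    feats = [act(x) for x, _ in data]
    denom = sum(f * f for f in feats)
    w = sum(f * y for f, (_, y) in zip(feats, data)) / denom if denom else 0.0
    return sum((w * f - y) ** 2 for f, (_, y) in zip(feats, data)) / len(data)


def tune_distribution(data, lr=0.1, threshold=0.95, max_iters=5000, seed=0):
    """Iteratively tune the distribution, then return (result value, probabilities)."""
    rng = random.Random(seed)
    logits = [0.0] * len(CHOICES)  # start from a uniform distribution
    for _ in range(max_iters):
        probs = softmax(logits)
        # Generate a sampled architecture using the current distribution.
        k = rng.choices(range(len(CHOICES)), weights=probs)[0]
        # Evaluate the training data with the sampled architecture.
        loss = architecture_loss(CHOICES[k], data)
        # Assumed estimator: score-function gradient of the expected loss
        # with respect to the logits, loss * (one_hot(k) - probs).
        grad = [loss * ((1.0 if j == k else 0.0) - p) for j, p in enumerate(probs)]
        # Update the distribution by a gradient-descent step on the logits.
        logits = [v - lr * g for v, g in zip(logits, grad)]
        # Assumed termination criterion: one value dominates the distribution.
        if max(softmax(logits)) >= threshold:
            break
    probs = softmax(logits)
    best = max(range(len(CHOICES)), key=probs.__getitem__)
    return CHOICES[best], probs


# Toy training data generated by y = 3 * x^2, so "square" fits it exactly.
TRAINING_DATA = [(x, 3.0 * x * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

if __name__ == "__main__":
    result, probs = tune_distribution(TRAINING_DATA)
    print(result, [round(p, 3) for p in probs])
```

In this toy setting the losses are non-negative, so every sampled value with a non-zero loss has its logit pushed down, and probability mass moves toward the value that fits the data. A practical system would more likely subtract a baseline from the loss to reduce variance and would train a real network per sample; both are omitted here for brevity.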