US 12,001,957 B2
Methods and systems for neural architecture search
Yassine Benyahia, Bern (CH); Kamil Bennani-Smires, Bern (CH); Michael Baeriswyl, Bern (CH); and Claudiu Musat, Bern (CH)
Assigned to SWISSCOM AG, Bern (CH)
Filed by Swisscom AG, Bern (CH)
Filed on Sep. 27, 2019, as Appl. No. 16/586,363.
Claims priority of application No. 18197366 (EP), filed on Sep. 27, 2018; and application No. 19154841 (EP), filed on Jan. 31, 2019.
Prior Publication US 2020/0104688 A1, Apr. 2, 2020
Int. Cl. G06N 3/086 (2023.01); G06F 18/24 (2023.01); G06F 40/20 (2020.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01)

CPC G06N 3/086 (2013.01) [G06F 18/24 (2023.01); G06F 40/20 (2020.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01)]

14 Claims

1. A method comprising:

in a system comprising one or more processing circuits:

determining, by the one or more processing circuits, a preferred model for performing a selected task, wherein the determining comprises:

obtaining a computational graph comprising: a plurality of nodes connected by a plurality of edges, and a plurality of weightings configured to scale input data provided to nodes along edges, wherein each node is configured to:

receive at least one item of input data from a preceding node connected to the node via an edge;

perform an operation on the input data to provide output data, wherein each item of input data is scaled according to a weighting associated with the node and/or edge; and

provide the output data to a subsequent node via an edge in the graph;

wherein the computational graph defines a first model and a second model, each model being a subgraph in the computational graph that has a selection of the plurality of nodes, edges and associated weightings, wherein some of the selected nodes, edges and weightings are shared between both the first model and the second model;

updating the weightings of the first model based on training the first model to perform the selected task;

training the second model using a training loss function which includes a component indicative of a measure for weightings in the second model, wherein the training loss function comprises a scaling factor configured to inhibit an exploding gradient associated with any of the weightings in the second model;

updating the weightings of the second model based on training the second model to perform the same selected task as the first model, wherein:

updating the weightings of the second model comprises updating some of the weightings that were updated in the step of updating the weightings of the first model and that are shared between the first and second models, and

updating the shared weighting is controlled based on an indication of the importance, for the trained first model, of a node and/or edge associated with the weighting; and

identifying the preferred model, the preferred model comprising a selection of nodes, edges and associated weightings from the computational graph, wherein the preferred model is identified based on an analysis of the first and second trained models;

configuring, by the one or more processing circuits, a neural network to perform the selected task based on the preferred model; and

performing, by the one or more processing circuits, the selected task using the neural network, wherein the selected task comprises one or more of natural language processing, image recognition, classification and/or modelling of physical systems, data processing, and generation of search results.
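Claim 1 leaves several things open: the importance measure, the "measure for weightings" in the second model's loss, the exploding-gradient scaling factor, and the "analysis" used to pick the preferred model. The following is a minimal, non-authoritative Python sketch of the claimed flow on a toy linear graph. It makes the following assumptions, none of which come from the claim:

* Importance is an EWC-style empirical diagonal Fisher value.
* The loss component is a quadratic penalty on shared weightings.
* The scaling factor divides importances by their maximum.
* The preferred model is the one with the lower loss.

All names (`MODEL_A`, `MODEL_B`, `search`, `lam`) and the data are hypothetical.

```python
# Toy sketch (hypothetical names and data): two subgraphs share a weighting,
# and training the second model penalizes moving the shared weighting in
# proportion to its (assumed) importance for the first model.
EDGES = [0, 1, 2]                      # each edge feeds input x[e] to one output node
MODEL_A = [0, 1]                       # first model: edges 0 and 1
MODEL_B = [1, 2]                       # second model: edges 1 and 2
SHARED = sorted(set(MODEL_A) & set(MODEL_B))  # edge 1 is shared

# Toy regression data (inputs x0..x2, target); both models learn this same task.
DATA = [((1, 1, 0), 1.0), ((0, 1, 1), 1.0), ((1, 0, 0), 1.0), ((0, 0, 1), -2.0)]


def predict(w, edges, x):
    # The output node scales each input by its edge weighting and sums them.
    return sum(w[e] * x[e] for e in edges)


def loss(w, edges):
    # Mean squared error of one model (subgraph) on DATA.
    return sum((predict(w, edges, x) - y) ** 2 for x, y in DATA) / len(DATA)


def sample_grads(w, edges, x, y):
    # Per-sample gradient of the squared error w.r.t. each edge weighting.
    err = predict(w, edges, x) - y
    return {e: 2 * err * x[e] for e in edges}


def importance(w, edges):
    # Assumed importance: empirical diagonal Fisher (mean squared per-sample gradient).
    fisher = {e: 0.0 for e in edges}
    for x, y in DATA:
        for e, g in sample_grads(w, edges, x, y).items():
            fisher[e] += g * g / len(DATA)
    # Hypothetical scaling factor: divide by the largest value so each penalty
    # coefficient is at most lam, which bounds the penalty's gradient.
    scale = 1.0 / max(max(fisher.values()), 1e-12)
    return {e: scale * f for e, f in fisher.items()}


def train(w, edges, lam=0.0, anchor=None, imp=None, lr=0.1, epochs=500):
    # Full-batch gradient descent on
    #   MSE + (lam / 2) * sum over shared s of imp[s] * (w[s] - anchor[s]) ** 2
    for _ in range(epochs):
        grad = {e: 0.0 for e in edges}
        for x, y in DATA:
            for e, g in sample_grads(w, edges, x, y).items():
                grad[e] += g / len(DATA)
        if anchor is not None:
            for e in SHARED:
                grad[e] += lam * imp[e] * (w[e] - anchor[e])
        for e in edges:
            w[e] -= lr * grad[e]  # in place: shared weightings move for both models


def search(lam):
    w = {e: 0.0 for e in EDGES}            # one weighting per edge of the graph
    train(w, MODEL_A)                      # train the first model
    anchor = {e: w[e] for e in SHARED}     # shared weightings after the first model
    imp = importance(w, MODEL_A)
    train(w, MODEL_B, lam, anchor, imp)    # train the second model on the same task
    scores = {"A": loss(w, MODEL_A), "B": loss(w, MODEL_B)}
    # Assumed "analysis": prefer the model with the lower loss.
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    for lam in (0.0, 10.0):
        preferred, scores = search(lam)
        print(lam, preferred, scores)
```

In this toy setup, the first model's loss after training the second model is lower with `lam=10.0` than with `lam=0.0`. The penalty keeps the shared weighting near the value the first model learned. Without it, training the second model overwrites that value. For brevity the sketch scores both models on the training data, whereas a practical search would more plausibly use held-out validation data.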