US 12,223,424 B2
Method for differentiable architecture search based on a hierarchical grouping mechanism
Jiancheng Lv, Chengdu (CN); Chuan Liu, Chengdu (CN); Qing Ye, Chengdu (CN); Yanan Sun, Chengdu (CN); and Zhenan He, Chengdu (CN)
Assigned to Sichuan University, Chengdu (CN)
Filed by Sichuan University, Chengdu (CN)
Filed on Jan. 18, 2021, as Appl. No. 17/151,222.
Claims priority of application No. 202010055469.0 (CN), filed on Jan. 17, 2020.
Prior Publication US 2021/0224650 A1, Jul. 22, 2021
Int. Cl. G06N 3/08 (2023.01); G06F 16/901 (2019.01); G06F 16/906 (2019.01)

CPC G06N 3/08 (2013.01) [G06F 16/9024 (2019.01); G06F 16/9027 (2019.01); G06F 16/906 (2019.01)]

8 Claims

1. A method for a differentiable architecture search based on a hierarchical grouping mechanism, comprising:

S1: obtaining a target dataset to be subjected to a network architecture search;

S2: selecting a set number of normal cells and two reduction cells, wherein operations of each cell of the set number of normal cells and the two reduction cells form a directed acyclic graph; enabling the two reduction cells to be located at positions numbered by rounding down ⅓ and ⅔ of the set number of normal cells and the two reduction cells, respectively, and then concatenating the set number of normal cells and the two reduction cells to form an initial search network; wherein

edges of the directed acyclic graph of each cell of the normal cells and the two reduction cells are formed by mixing a plurality of inter-group operations, and each inter-group operation of the plurality of inter-group operations is formed by mixing a plurality of intra-group operations;

the plurality of inter-group operations comprise zero operations, a separable convolution group, a dilated convolution group, skip-connect, and a pooling group; and

the plurality of intra-group operations comprise convolutions and pooling;
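As an illustration of step S2, the following minimal sketch lays out the cells and evaluates one hierarchically mixed edge. It is not the claimed implementation. The 0-based cell positions, the softmax normalization of the control weights, the names `cell_layout`, `mixed_edge`, and `TOY_GROUPS`, and the scalar stand-in functions are all assumptions introduced here for illustration.

```python
import math


def cell_layout(num_normal):
    """Label each position of the search network as 'normal' or 'reduction'."""
    total = num_normal + 2  # set number of normal cells plus two reduction cells
    # Assumption: positions are 0-based; the claim only says "rounding down".
    reduction_at = {total // 3, (2 * total) // 3}
    return ["reduction" if i in reduction_at else "normal" for i in range(total)]


def softmax(weights):
    """Normalize control weights (softmax is an assumption, not stated in the claim)."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    norm = sum(exps)
    return [e / norm for e in exps]


def mix(x, ops, weights):
    """Weighted sum of op(x) over the candidate operations."""
    return sum(p * op(x) for p, op in zip(softmax(weights), ops))


def mixed_edge(x, groups, alpha, beta):
    """Mix intra-group operations inside each group, then mix the groups."""
    names = list(groups)
    group_outputs = [mix(x, groups[n], beta[n]) for n in names]
    probs = softmax([alpha[n] for n in names])
    return sum(p * out for p, out in zip(probs, group_outputs))


# Hypothetical scalar stand-ins for the five inter-group operations.
TOY_GROUPS = {
    "zero": [lambda v: 0.0],
    "sep_conv_group": [lambda v: 2 * v, lambda v: 3 * v],
    "dil_conv_group": [lambda v: 4 * v, lambda v: 5 * v],
    "skip_connect": [lambda v: v],
    "pool_group": [lambda v: v - 1, lambda v: v + 1],
}
```

Here `alpha` holds one control weight per inter-group operation and `beta` holds one list of control weights per group, so a single edge is a mixture of mixtures.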

S3: using training samples in the target dataset as an input of the initial search network, training the initial search network to optimize a cost function to complete a one-level search, wherein, control weight parameters are shared by the plurality of inter-group operations and are shared by the plurality of intra-group operations among the set number of normal cells and the two reduction cells; and obtaining normal cells and reduction cells based on the one-level search;

step S3 further comprising: after the initial search network of the one-level search is trained, sorting the control weight parameters of the plurality of inter-group operations in the each cell in descending order; for each node in the directed acyclic graph, retaining top two inter-group operations from different nodes among non-zero inter-group operations connected to previous nodes; sorting the control weight parameters of intra-group operations of the top two inter-group operations retained for the each cell in descending order, and retaining an intra-group operation with a largest control weight parameter among the plurality of inter-group operations;
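The selection rule in this step can be sketched as follows for a single node. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `prune_node`, the dictionary layout of the control weights, keying the intra-group weights by group name only, and operation names such as `sep_conv_3x3` are hypothetical.

```python
def prune_node(incoming_alpha, beta, keep=2):
    """Pick the retained operations on the edges entering one node.

    incoming_alpha: {previous_node: {inter_group_op: control_weight}}
    beta:           {inter_group_op: {intra_group_op: control_weight}}
    Returns a list of (previous_node, inter_group_op, intra_group_op).
    """
    candidates = []
    for prev, group_weights in incoming_alpha.items():
        # Only non-zero inter-group operations are eligible.
        nonzero = {g: w for g, w in group_weights.items() if g != "zero"}
        best_group = max(nonzero, key=nonzero.get)
        candidates.append((nonzero[best_group], prev, best_group))

    # Descending order of control weight; one candidate per previous node,
    # so the retained operations always come from different nodes.
    candidates.sort(key=lambda c: c[0], reverse=True)

    retained = []
    for _, prev, group in candidates[:keep]:
        intra = beta[group]
        best_op = max(intra, key=intra.get)  # largest intra-group control weight
        retained.append((prev, group, best_op))
    return retained
```

A group that contains a single operation, such as skip-connect, is represented here by a one-entry dictionary in `beta`, which is a choice made for this sketch.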

S4: constructing a target network using the normal cells and the reduction cells obtained from the one-level search by the following steps:

S42: constructing the target network with a number of the normal cells and the reduction cells of the target network using the normal cells and the reduction cells obtained from the one-level search;

S3: using training samples in the target dataset as an input of the initial search network, training the initial search network to optimize a cost function to complete a two-level search, wherein, the control weight parameters are shared by the plurality of inter-group operations and are not shared by the plurality of intra-group operations among the set number of normal cells and the two reduction cells, obtaining normal cells and reduction cells based on the two-level search; and
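One possible reading of the difference between the one-level and two-level searches is sketched below as a parameter layout. This is an interpretation for illustration only: the function name `make_control_weights`, the zero initialization, and the use of a single inter-group table for all cells are assumptions, not details taken from the claim.

```python
def make_control_weights(num_cells, groups, share_intra):
    """Allocate control weights for a search network.

    groups:      {inter_group_op: [intra_group_op, ...]}
    share_intra: True for the one-level search (intra-group weights shared
                 among cells), False for the two-level search (one table per cell).
    """
    # Inter-group control weights: one table reused by every cell in both modes.
    alpha = {group: 0.0 for group in groups}

    def new_beta():
        return {group: {op: 0.0 for op in ops} for group, ops in groups.items()}

    shared = new_beta()
    # Sharing means every cell points at the same table object.
    betas = [shared if share_intra else new_beta() for _ in range(num_cells)]
    return alpha, betas
```

With `share_intra=True`, updating the intra-group weights through one cell changes them for every cell; with `share_intra=False`, each cell learns its own ordering of intra-group operations.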

S4: constructing a target network using the normal cells and the reduction cells obtained from the two-level search by the following steps:

S41: constructing a training network in a form of the target network by using the normal cells and the reduction cells obtained from the two-level search according to the target dataset, iteratively training the initial search network until a preset number of iterations is reached to obtain an order of the control weight parameters of the plurality of intra-group operations of the each cell, and deleting an intra-group operation corresponding to a smallest control weight parameter to obtain the target network.
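The final deletion in step S41 can be sketched as follows. This is a minimal sketch, not the claimed implementation: the names `drop_weakest` and `prune_cell`, applying the deletion once per group within a cell, and leaving single-operation groups untouched are assumptions made here.

```python
def drop_weakest(intra_weights):
    """Remove the intra-group operation with the smallest control weight."""
    weakest = min(intra_weights, key=intra_weights.get)
    return {op: w for op, w in intra_weights.items() if op != weakest}


def prune_cell(cell_beta):
    """Apply drop_weakest to every group of one cell.

    cell_beta: {inter_group_op: {intra_group_op: control_weight}}
    Assumption: a group with a single operation is kept as it is.
    """
    return {
        group: drop_weakest(weights) if len(weights) > 1 else dict(weights)
        for group, weights in cell_beta.items()
    }
```

In this sketch the control weights passed to `prune_cell` stand for the values reached after the preset number of training iterations.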