US 12,450,524 B2
Machine learning system, machine learning method, and program
Kenta Niwa, Tokyo (JP); and Willem Bastiaan Kleijn, Wellington (NZ)
Assigned to NTT, Inc., Tokyo (JP); and VICTORIA UNIVERSITY OF WELLINGTON, Wellington (NZ)
Appl. No. 17/047,028
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP); and VICTORIA UNIVERSITY OF WELLINGTON, Wellington (NZ)
PCT Filed Apr. 12, 2019, PCT No. PCT/JP2019/015973 § 371(c)(1), (2) Date Oct. 12, 2020, PCT Pub. No. WO2019/198815, PCT Pub. Date Oct. 17, 2019.
Claims priority of application No. 2018-076814 (JP), filed on Apr. 12, 2018; application No. 2018-076815 (JP), filed on Apr. 12, 2018; application No. 2018-076816 (JP), filed on Apr. 12, 2018; application No. 2018-076817 (JP), filed on Apr. 12, 2018; and application No. 2018-202397 (JP), filed on Oct. 29, 2018.
Prior Publication US 2021/0158226 A1, May 27, 2021
Int. Cl. G06N 20/20 (2019.01); G06F 17/18 (2006.01)

CPC G06N 20/20 (2019.01) [G06F 17/18 (2013.01)]

1 Claim

1. A machine learning method comprising:

learning a deep neural network by performing:

a step in which a plurality of node portions learn a mapping that uses one common primal variable by machine learning based on their respective input data while sending and receiving information to and from each other, wherein the plurality of node portions is a plurality of distributed servers,

the plurality of node portions perform the machine learning so as to minimize, instead of a cost function that is a non-convex function originally corresponding to the machine learning, a proxy convex function serving as an upper bound on the cost function,

the proxy convex function is represented by a formula of a first-order gradient of the cost function with respect to a primal variable,

V is a predetermined positive integer equal to or greater than 2; the plurality of node portions are node portions 1, . . . , V, and a set of node portions is ˜V={1, . . . , V}; B is a predetermined positive integer, with b=1, . . . , B, and a set of positive integers equal to or less than B is ˜B={1, . . . , B}; a set of node portions connected with a node portion i is ˜N(i); t is an integer, with t=0, . . . , T−1, where T is a positive integer; the bth vector constituting a dual variable λ_i|j at the node portion i with respect to a node portion j is λ_i|j,b; the dual variable λ_i|j,b after the t+1th update is λ_i|j,b^(t+1); the bth vector constituting a dual auxiliary variable z_i|j of the dual variable λ_i|j is z_i|j,b; the dual auxiliary variable z_i|j,b after the t+1th update is z_i|j,b^(t+1); the bth vector constituting a dual auxiliary variable y_i|j of the dual variable λ_i|j is y_i|j,b; the dual auxiliary variable y_i|j,b after the t+1th update is y_i|j,b^(t+1); the bth element of a primal variable w_i of the node portion i is w_i,b; the primal variable w_i,b after the t+1th update is w_i,b^(t+1);

a cost function corresponding to the node portion i used in the machine learning is f_i; a first-order gradient of the cost function f_i with respect to w_i,b^(t) is ∇f_i(w_i,b^(t)); I is an identity matrix;

O is a zero matrix; σ₁ is a predetermined positive number; η is a positive number; and a matrix A_i|j at the node portion i with respect to a node portion j is defined by the formula below,

and

for t=0, . . . , T−1,

(a) a step in which the node portion i performs an update of the dual variable according to the formula below:

where A^T denotes the transpose of a matrix A, and

(b) a step in which the node portion i performs an update of the primal variable according to the formula below:

for i∈˜V, b∈˜B,

$$w_{i,b}^{(t+1)} = w_{i,b}^{(t)} - \eta\Big(\nabla f_i\big(w_{i,b}^{(t)}\big) + \sum_{j\in\tilde{N}(i)} A_{i|j}^{T}\,\lambda_{i|j,b}^{(t+1)}\Big),$$

and

for some or all of t=0, . . . , T−1, the following are performed in addition to the step (a) and the step (b):

(c) a step in which, with i∈˜V, j∈˜N(i), and b∈˜B, at least one node portion i sends the dual auxiliary variable y_i|j,b^(t+1) to at least one node portion j, and

(d) a step in which, with i∈˜V, j∈˜N(i), and b∈˜B, the node portion i that has received a dual auxiliary variable y_j|i,b^(t+1) sets z_i|j,b^(t+1)=y_j|i,b^(t+1), and

performing machine learning wherein the machine learning is carried out even when the cost function is not a convex function, wherein

the primal variable is updated independently of the plurality of the node portions.
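The claim states only that the proxy convex function upper-bounds the non-convex cost and is built from the first-order gradient. One familiar construction with those properties, assumed here for illustration and not necessarily the claimed formula, is the quadratic majorizer q(w; w_t) = f(w_t) + ∇f(w_t)(w − w_t) + (w − w_t)²/(2η). This majorizer bounds f from above whenever ∇f is L-Lipschitz and η ≤ 1/L, and its minimizer is the gradient step w_t − η∇f(w_t). The sketch below checks the bound numerically for the hypothetical non-convex cost f(w) = sin(w), for which L = 1.

```python
import math

# Hypothetical non-convex cost f(w) = sin(w); its gradient cos(w) is 1-Lipschitz (L = 1).
def f(w: float) -> float:
    return math.sin(w)

def grad_f(w: float) -> float:
    return math.cos(w)

def proxy(w: float, w_t: float, eta: float) -> float:
    """Assumed quadratic majorizer built from the first-order gradient at w_t."""
    return f(w_t) + grad_f(w_t) * (w - w_t) + (w - w_t) ** 2 / (2.0 * eta)

def proxy_minimizer(w_t: float, eta: float) -> float:
    """Setting d/dw proxy(w) = 0 yields a plain gradient step."""
    return w_t - eta * grad_f(w_t)

def bound_holds(w_t: float, eta: float, span: float = 5.0, n: int = 1001) -> bool:
    """Check proxy(w) >= f(w) on a uniform grid over [w_t - span, w_t + span]."""
    for k in range(n):
        w = w_t - span + 2.0 * span * k / (n - 1)
        if proxy(w, w_t, eta) < f(w) - 1e-12:  # small tolerance for rounding
            return False
    return True

if __name__ == "__main__":
    eta = 0.9  # eta <= 1/L = 1, so the proxy should upper-bound the cost everywhere
    for w_t in (-2.0, 0.0, 1.3, 3.0):
        print(w_t, bound_holds(w_t, eta), proxy_minimizer(w_t, eta))
```

With a step size larger than 1/L (for example η = 5 at w_t = 0) the quadratic term is too weak and the bound fails, which is why the step size and the smoothness of the cost are tied together in this kind of surrogate.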
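The formula defining the matrix A_i|j is not reproduced in the claim text above; only the identity matrix I and the zero matrix O are named. The sketch below assumes a sign convention of the kind often used in primal-dual consensus methods: A_i|j = I when i < j, A_i|j = −I when i > j, and A_i|j = O when j is not in ˜N(i). This convention is an illustrative assumption, not the claimed formula. Under it, A_i|j = −A_j|i for every edge, so the constraint A_i|j w_i + A_j|i w_j = 0 holds exactly when neighboring node portions share the same primal variable. The names `A_matrix` and `consensus_residual` are hypothetical.

```python
from typing import Dict, Set
import numpy as np

Neighbors = Dict[int, Set[int]]  # node i -> set ˜N(i) of connected node portions

def A_matrix(i: int, j: int, neighbors: Neighbors, dim: int) -> np.ndarray:
    """Hypothetical A_{i|j}: +I if i < j, -I if i > j, zero matrix O if j is not in N(i)."""
    if j not in neighbors[i]:
        return np.zeros((dim, dim))
    return np.eye(dim) if i < j else -np.eye(dim)

def consensus_residual(w: Dict[int, np.ndarray], neighbors: Neighbors, dim: int) -> float:
    """Largest ||A_{i|j} w_i + A_{j|i} w_j|| over all edges; zero iff every pair of neighbors agrees."""
    worst = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            r = A_matrix(i, j, neighbors, dim) @ w[i] + A_matrix(j, i, neighbors, dim) @ w[j]
            worst = max(worst, float(np.linalg.norm(r)))
    return worst

if __name__ == "__main__":
    ring = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}  # three fully connected servers
    agree = {k: np.array([1.0, -2.0]) for k in ring}
    print(consensus_residual(agree, ring, dim=2))
```

Any convention with A_i|j = −A_j|i and invertible blocks on each edge gives the same consensus property; only the bookkeeping of signs changes.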
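The primal update of step (b) is local: node portion i needs only its own gradient ∇f_i(w_i,b^(t)), its matrices A_i|j, and the dual variables λ_i|j,b^(t+1) for its neighbors. The minimal sketch below implements that formula for one node and one block b. It assumes the dual variables have already been produced by step (a), whose formula is not reproduced in the claim text above; the function name `primal_update` and the quadratic local cost in the usage example are hypothetical.

```python
from typing import Dict
import numpy as np

def primal_update(
    w_ib: np.ndarray,
    grad_ib: np.ndarray,
    A_i: Dict[int, np.ndarray],
    lam_ib: Dict[int, np.ndarray],
    eta: float,
) -> np.ndarray:
    """Step (b) for node i, block b:
    w^(t+1) = w^(t) - eta * (grad f_i(w^(t)) + sum_{j in N(i)} A_{i|j}^T lambda_{i|j,b}^(t+1)).

    A_i    maps each neighbor j in N(i) to A_{i|j}.
    lam_ib maps each neighbor j to lambda_{i|j,b}^(t+1) (assumed output of step (a)).
    """
    coupling = np.zeros_like(w_ib)
    for j, A_ij in A_i.items():
        coupling += A_ij.T @ lam_ib[j]
    return w_ib - eta * (grad_ib + coupling)

if __name__ == "__main__":
    # Hypothetical local cost f_i(w) = 0.5 * ||w - c_i||^2, so grad f_i(w) = w - c_i.
    c_i = np.zeros(2)
    w = np.array([1.0, 2.0])
    A_i = {2: np.eye(2), 3: -np.eye(2)}
    lam = {2: np.array([0.5, 0.0]), 3: np.array([0.0, 0.5])}
    print(primal_update(w, w - c_i, A_i, lam, eta=0.1))  # approx. [0.85 1.85]
```

When every dual variable is zero, the update reduces to an ordinary gradient step on f_i; the dual terms are what pull neighboring node portions toward a common primal variable.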
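Steps (c) and (d) of the claim are a message exchange: node portion i sends y_i|j,b^(t+1) to a neighbor j, and the receiving node stores it as its own dual auxiliary variable, so that z_j|i,b^(t+1) = y_i|j,b^(t+1). The minimal sketch below models one exchange round for a single block b, with variables keyed by the directed pair (i, j). The claim allows the exchange on only some iterations and over only some links; what z holds for a link that is not used in a given round is not specified in the claim, so leaving it unchanged is an assumption here. The names `exchange` and `Edge` are hypothetical.

```python
from typing import Dict, Set, Tuple
import numpy as np

Edge = Tuple[int, int]  # (i, j) indexes the variable held at node i for neighbor j

def exchange(
    y: Dict[Edge, np.ndarray],
    z: Dict[Edge, np.ndarray],
    links: Set[Edge],
) -> Dict[Edge, np.ndarray]:
    """Steps (c)-(d) for one block b.

    For every directed link (i, j) in `links`, node i sends y_{i|j}^(t+1) to node j,
    and node j stores it as z_{j|i}^(t+1). Entries of z whose link is not used this
    round are left unchanged (an assumption; the claim does not specify this case).
    """
    z_new = dict(z)  # shallow copy, so the caller's dict is not modified
    for i, j in links:
        z_new[(j, i)] = y[(i, j)].copy()
    return z_new

if __name__ == "__main__":
    y = {(1, 2): np.array([1.0, 2.0]), (2, 1): np.array([3.0, 4.0])}
    z = {k: np.zeros(2) for k in y}
    print(exchange(y, z, {(1, 2)}))          # only z_{2|1} changes
    print(exchange(y, z, {(1, 2), (2, 1)}))  # both directions exchanged
```

Passing a subset of links, or calling the function only on selected iterations t, models the claim's provision that the exchange is performed for some or all of t = 0, . . . , T−1.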