US 12,450,524 B2
Machine learning system, machine learning method, and program
Kenta Niwa, Tokyo (JP); and Willem Bastiaan Kleijn, Wellington (NZ)
Assigned to NTT, Inc., Tokyo (JP); and VICTORIA UNIVERSITY OF WELLINGTON, Wellington (NZ)
Appl. No. 17/047,028
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP); and VICTORIA UNIVERSITY OF WELLINGTON, Wellington (NZ)
PCT Filed Apr. 12, 2019, PCT No. PCT/JP2019/015973
§ 371(c)(1), (2) Date Oct. 12, 2020,
PCT Pub. No. WO2019/198815, PCT Pub. Date Oct. 17, 2019.
Claims priority of application No. 2018-076814 (JP), filed on Apr. 12, 2018; application No. 2018-076815 (JP), filed on Apr. 12, 2018; application No. 2018-076816 (JP), filed on Apr. 12, 2018; application No. 2018-076817 (JP), filed on Apr. 12, 2018; and application No. 2018-202397 (JP), filed on Oct. 29, 2018.
Prior Publication US 2021/0158226 A1, May 27, 2021
Int. Cl. G06N 20/20 (2019.01); G06F 17/18 (2006.01)
CPC G06N 20/20 (2019.01) [G06F 17/18 (2013.01)]
1 Claim
 
1. A machine learning method comprising:
learning a deep neural network by performing:
a step in which a plurality of node portions learn mapping that uses one common primal variable by machine learning based on their respective input data while sending and receiving information to and from each other, wherein the plurality of node portions is a plurality of distributed servers,
the plurality of node portions perform the machine learning so as to minimize, instead of a cost function of a non-convex function originally corresponding to the machine learning, a proxy convex function serving as an upper bound on the cost function,
the proxy convex function is represented by a formula of a first-order gradient of the cost function with respect to a primal variable,
V is a predetermined positive integer equal to or greater than 2; the plurality of node portions are node portions 1, . . . , V, and a set of node portions is ˜V={1, . . . , V}; B is a predetermined positive integer, with b=1, . . . , B, and a set of positive integers equal to or less than B is ˜B={1, . . . , B}; a set of node portions connected with a node portion i is ˜N(i); t is an integer, with t=0, . . . , T−1, where T is a positive integer; the bth vector constituting a dual variable λi|j at the node portion i with respect to a node portion j is λi|j,b; the dual variable λi|j,b after the t+1th update is λi|j,b(t+1); the bth vector constituting a dual auxiliary variable zi|j of the dual variable λi|j is zi|j,b; the dual auxiliary variable zi|j,b after the t+1th update is zi|j,b(t+1); the bth vector constituting a dual auxiliary variable yi|j of the dual variable λi|j is yi|j,b; the dual auxiliary variable yi|j,b after the t+1th update is yi|j,b(t+1); the bth element of a primal variable wi of the node portion i is wi,b; the primal variable wi,b after the t+1th update is wi,b(t+1);
a cost function corresponding to the node portion i used in the machine learning is fi; a first-order gradient of the cost function fi with respect to wi,b(t) is ∇fi(wi,b(t)); I is an identity matrix;
O is a zero matrix; σ1 is a predetermined positive number; η is a positive number; and a matrix Ai|j at the node portion i with respect to a node portion j is defined by the formula below,

[formula defining the matrix Ai|j; not reproduced in this text]
and
for t=0, . . . , T−1,
(a) a step in which the node portion i performs an update of the dual variable according to the formula below:

[formula for the update of the dual variable; not reproduced in this text]
where A^T denotes the transpose of a matrix A, and
(b) a step in which the node portion i performs an update of the primal variable according to the formula below:
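for i∈˜V, b∈˜B, $w_{i,b}^{(t+1)} = w_{i,b}^{(t)} - \eta\left(\nabla f_i\left(w_{i,b}^{(t)}\right) + \sum_{j\in\tilde{N}(i)} A_{i|j}^{\mathsf{T}}\,\lambda_{i|j,b}^{(t+1)}\right)$,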
and
for some or all of t=0, . . . , T−1, the following are performed in addition to the step (a) and the step (b):
(c) a step in which, with i∈˜V, j∈˜N(i), and b∈˜B, at least one node portion i sends the dual auxiliary variable yi|j,b(t+1) to at least one node portion j, and
(d) a step in which, with i∈˜V, j∈˜N(i), and b∈˜B, the node portion i that has received a dual auxiliary variable yj|i,b(t+1) sets zi|j,b(t+1)=yj|i,b(t+1), and
performing machine learning wherein the machine learning is carried out even when the cost function is not a convex function, wherein
the primal variable is updated independently of the plurality of the node portions.
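
The iteration recited in steps (a) to (d) can be illustrated with a minimal NumPy sketch for a single block b. This is only an illustration under stated assumptions, not the claimed implementation. The formulas defining Ai|j and the dual update are not reproduced in this text, so the sketch takes both from the caller: `A` is a dictionary of matrices, and `dual_update` is a hypothetical callable that returns the pair (λi|j, yi|j). The function name `run_round`, the stand-in rule `demo_dual_update`, the choice Ai|j = ±I, and the quadratic costs in the usage lines are all assumptions made for the example. Only the primal update of step (b) and the exchange of steps (c) and (d) follow the text of the claim.

```python
import numpy as np


def run_round(w, z, grad, A, neighbors, dual_update, eta):
    """One iteration t -> t+1 of steps (a)-(d) for a single block b.

    w[i]         : primal variable held at node i (1-D array)
    z[(i, j)]    : dual auxiliary variable z_{i|j} held at node i
    grad[i]      : callable returning the gradient of f_i at a point
    A[(i, j)]    : matrix A_{i|j} (supplied by the caller; assumption)
    neighbors[i] : nodes connected to node i (assumed symmetric)
    dual_update  : hypothetical rule for step (a); returns (lam, y)
    eta          : positive step size
    """
    lam, y, w_next = {}, {}, {}
    for i in w:
        g = grad[i](w[i])
        # Step (a): dual update, delegated to the caller-supplied rule.
        for j in neighbors[i]:
            lam[(i, j)], y[(i, j)] = dual_update(w[i], g, z[(i, j)], A[(i, j)])
        # Step (b): w <- w - eta * (grad f_i(w) + sum_j A_{i|j}^T lam_{i|j}).
        coupling = sum(A[(i, j)].T @ lam[(i, j)] for j in neighbors[i])
        w_next[i] = w[i] - eta * (g + coupling)
    # Steps (c) and (d): node j sends y_{j|i}; node i stores it as z_{i|j}.
    z_next = {(i, j): y[(j, i)] for (i, j) in z}
    return w_next, z_next, lam


def demo_dual_update(w_i, g_i, z_ij, A_ij):
    """Stand-in rule for illustration only; NOT the claimed formula."""
    lam = z_ij + A_ij @ w_i
    return lam, 2.0 * lam - z_ij


# Usage on two connected nodes with scalar variables.
I1 = np.eye(1)
A = {(1, 2): I1, (2, 1): -I1}                      # assumed sign convention
neighbors = {1: [2], 2: [1]}
grad = {1: lambda v: v - 1.0, 2: lambda v: v - 3.0}  # assumed quadratic costs
w0 = {1: np.array([2.0]), 2: np.array([0.0])}
z0 = {(1, 2): np.array([0.5]), (2, 1): np.array([-0.5])}

w1, z1, lam1 = run_round(w0, z0, grad, A, neighbors, demo_dual_update, eta=0.1)
```

Passing the dual rule as a callable keeps the parts taken from the claim (the primal step and the y-to-z exchange) separate from the parts assumed for the example.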