US 12,014,257 B2
Domain specific language for generation of recurrent neural network architectures
Stephen Joseph Merity, San Francisco, CA (US); Richard Socher, Menlo Park, CA (US); James Bradbury, San Francisco, CA (US); and Caiming Xiong, Palo Alto, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on Apr. 13, 2018, as Appl. No. 15/953,265.
Claims priority of provisional application 62/578,371, filed on Oct. 27, 2017.
Claims priority of provisional application 62/508,984, filed on May 19, 2017.
Prior Publication US 2018/0336453 A1, Nov. 22, 2018
Int. Cl. G06N 3/045 (2023.01); G06N 3/044 (2023.01); G06N 3/048 (2023.01); G06N 3/082 (2023.01); G06N 5/01 (2023.01)
CPC G06N 3/045 (2023.01) [G06N 3/044 (2023.01); G06N 3/082 (2023.01); G06N 5/01 (2023.01); G06N 3/048 (2023.01)] 26 Claims
OG exemplary drawing
 
1. A method comprising:
generating a candidate recurrent neural network (RNN) architecture having a representation of a domain specific language (DSL), wherein the representation of the candidate RNN architecture comprises one or more operators of the DSL, the generating of the candidate RNN architecture comprising:
starting from a hidden state node of a current time step of the candidate RNN architecture and working toward a hidden state node of a previous time step of the candidate RNN architecture, adding an empty node to the hidden state node of the current time step as a placeholder node;
initializing a partial RNN architecture including the hidden state node of the current time step and the empty node;
replacing, by an architecture generator neural network, the placeholder node with an operator node corresponding to one of the one or more operators of the DSL;
adding another empty node to the operator node as another placeholder; and
ranking the candidate RNN architecture, the ranking comprising:
providing an encoding of the candidate RNN architecture as input to an architecture ranking neural network configured to determine a score for the candidate RNN architecture, the score representing a performance of the candidate RNN architecture for a particular type of task, the performance representing an aggregate measure of an accuracy of the candidate RNN architecture, and wherein the architecture ranking neural network is trained based on a training data set including previous RNN architectures and known performance scores for the previous RNN architectures, and
executing the architecture ranking neural network to generate the score of the candidate RNN architecture;
comparing the score of the candidate RNN architecture with a score of another candidate RNN architecture;
selecting one of the candidate RNN architecture and the other candidate RNN architecture based on their respective scores; and
compiling the selected candidate RNN architecture to generate a target RNN.
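The generation steps recited in claim 1 can be illustrated with a minimal Python sketch. The operator vocabulary (MM, Tanh, Sigmoid, Add, Mult), the Node class, and the random choice standing in for the architecture generator neural network are hypothetical names chosen only for illustration; the specification does not prescribe them.

    import random

    # Hypothetical DSL operator vocabulary: operator name -> number of child slots (arity).
    OPERATORS = {"MM": 1, "Tanh": 1, "Sigmoid": 1, "Add": 2, "Mult": 2}
    # Source nodes available to a candidate: the current input and the previous hidden state.
    SOURCES = ["x_t", "h_prev"]

    class Node:
        """A node in the DSL representation of a candidate RNN architecture."""
        def __init__(self, label=None):
            self.label = label        # operator name, source name, or None for a placeholder
            self.children = []

        def is_placeholder(self):
            return self.label is None

    def generate_candidate(max_depth=4):
        """Grow a candidate architecture from h_t by repeatedly replacing placeholder nodes."""
        # Initialize the partial architecture: the hidden state node of the current
        # time step with a single empty (placeholder) child node.
        h_t = Node("h_t")
        h_t.children.append(Node())        # empty placeholder node
        frontier = [(h_t.children[0], 1)]  # placeholders awaiting replacement, with depth

        while frontier:
            placeholder, depth = frontier.pop(0)
            if depth >= max_depth:
                # Close off the branch with a source node, e.g. the previous hidden state.
                placeholder.label = random.choice(SOURCES)
                continue
            # Stand-in for the architecture generator neural network: choose an operator
            # or a source node to replace this placeholder.
            choice = random.choice(list(OPERATORS) + SOURCES)
            placeholder.label = choice
            if choice in OPERATORS:
                # Add new empty nodes to the operator node as further placeholders.
                for _ in range(OPERATORS[choice]):
                    child = Node()
                    placeholder.children.append(child)
                    frontier.append((child, depth + 1))
        return h_t

In this sketch a candidate is produced by repeatedly replacing the oldest placeholder until every branch terminates in a source node or the depth limit is reached.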
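The ranking and selection steps can be sketched in the same spirit. The bag-of-operators encoding and the small feed-forward regressor below are illustrative assumptions only; the patent's architecture ranking neural network and its encoding may differ. The sketch reuses the Node class and labels from the previous example and assumes a history of (architecture, known performance score) pairs from previously evaluated RNN architectures.

    import torch
    import torch.nn as nn

    OP_VOCAB = ["MM", "Tanh", "Sigmoid", "Add", "Mult", "x_t", "h_prev"]

    def encode(root):
        """Encode a candidate architecture as a bag-of-operators vector (illustrative encoding)."""
        counts = torch.zeros(len(OP_VOCAB))
        stack = [root]
        while stack:
            node = stack.pop()
            if node.label in OP_VOCAB:
                counts[OP_VOCAB.index(node.label)] += 1
            stack.extend(node.children)
        return counts

    # Architecture ranking network: maps an encoded architecture to a predicted
    # performance score (e.g. an aggregate accuracy measure on the target task).
    ranker = nn.Sequential(nn.Linear(len(OP_VOCAB), 32), nn.ReLU(), nn.Linear(32, 1))

    def train_ranker(history, epochs=200):
        """Train on previous RNN architectures and their known performance scores."""
        xs = torch.stack([encode(arch) for arch, _ in history])
        ys = torch.tensor([[float(score)] for _, score in history])
        opt = torch.optim.Adam(ranker.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(ranker(xs), ys)
            loss.backward()
            opt.step()

    def score(arch):
        """Execute the ranking network to obtain a score for a candidate architecture."""
        with torch.no_grad():
            return ranker(encode(arch).unsqueeze(0)).item()

    def select(cand_a, cand_b):
        """Compare two candidates' scores and keep the higher-scoring one."""
        return cand_a if score(cand_a) >= score(cand_b) else cand_b

Here select() stands in for the claimed comparison of two candidates' scores; in practice the highest-scoring candidate would then be passed to the compilation step.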
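Finally, compiling the selected candidate into a target RNN can be sketched as a recursive translation of the DSL tree into an executable recurrence cell. The assumption that inputs and hidden states share a single width, and the one learned Linear projection per MM node, are simplifications for illustration and are not drawn from the specification.

    import torch
    import torch.nn as nn

    def compile_architecture(root, hidden_size):
        """Compile a selected DSL tree into an executable cell for a target RNN.

        Illustrative sketch only: input and hidden vectors are assumed to share
        hidden_size so every node produces a tensor of the same width.
        """
        modules = nn.ModuleList()  # holds the learned weights of MM nodes

        def build(node):
            if node.label == "x_t":
                return lambda x_t, h_prev: x_t
            if node.label == "h_prev":
                return lambda x_t, h_prev: h_prev
            kids = [build(c) for c in node.children]
            if node.label == "MM":
                linear = nn.Linear(hidden_size, hidden_size)
                modules.append(linear)
                return lambda x_t, h_prev: linear(kids[0](x_t, h_prev))
            if node.label == "Tanh":
                return lambda x_t, h_prev: torch.tanh(kids[0](x_t, h_prev))
            if node.label == "Sigmoid":
                return lambda x_t, h_prev: torch.sigmoid(kids[0](x_t, h_prev))
            if node.label == "Add":
                return lambda x_t, h_prev: kids[0](x_t, h_prev) + kids[1](x_t, h_prev)
            if node.label == "Mult":
                return lambda x_t, h_prev: kids[0](x_t, h_prev) * kids[1](x_t, h_prev)
            raise ValueError(f"unknown DSL operator: {node.label}")

        # h_t has a single child: the root of the generated expression.
        cell = build(root.children[0])

        def step(x_t, h_prev):
            """One recurrence step: h_t = cell(x_t, h_prev)."""
            return cell(x_t, h_prev)

        return step, modules

The returned step function realizes the recurrence of the target RNN, and the ModuleList exposes the compiled cell's learnable parameters so they can be trained on the task of interest.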