| CPC G06N 3/044 (2023.01) [G06N 3/08 (2013.01)] | 21 Claims | 

| 
               1. A method implemented by one or more processors, the method comprising: 
            jointly generating a first output stream sequence and a second output stream sequence, using a multi-stream recurrent neural network transducer (MS RNN-T), wherein the MS RNN-T comprises an input stream encoder, a first output stream encoder, a second output stream encoder, and a joint network, wherein jointly generating the first output stream sequence and the second output stream sequence, using the MS RNN-T comprises: 
              initializing an input stream sequence using an initial segment in a sequence of segments, wherein the input stream sequence is based on user interface input of at least one user of a computing device; 
                  initializing the first output stream sequence as empty; 
                  initializing the second output stream sequence as empty; 
                  for each of the segments, in the sequence, and until one or more conditions are satisfied: 
                  generating an encoded representation of the input stream sequence by processing the input stream sequence using the input stream encoder; 
                    generating an encoded representation of the first output stream sequence by processing the first output stream sequence using the first output stream encoder; 
                    generating an encoded representation of the second output stream sequence by processing the second output stream sequence using the second output stream encoder; 
                    generating predicted output by processing (1) the encoded representation of the input stream sequence, (2) the encoded representation of the first output stream sequence, and (3) the encoded representation of the second output stream sequence, using the joint network; 
                    determining whether the predicted output corresponds to the first output stream sequence or the second output stream sequence; 
                    if the predicted output corresponds to the first output stream sequence, updating the first output stream sequence based on the predicted output; 
                    if the predicted output corresponds to the second output stream sequence, updating the second output stream sequence based on the predicted output; and 
                    updating the input stream sequence based on the next segment in the sequence of the segments;
                  generating a response to the user interface input based on the first output stream and/or the second output stream; and 
                  causing the computing device to render the response to the at least one user. 
                 |
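The loop recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function name `ms_rnnt_decode`, the `should_stop` callback, the toy encoders, and `toy_joint` are all hypothetical stand-ins. The sketch assumes that the joint network returns a (stream identifier, token) pair, and that "updating" a sequence means appending to it. The claim does not specify either detail.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases; the claim does not define these interfaces.
Encoder = Callable[[Sequence[str]], int]
# Assumed joint-network contract: returns (stream_id, token), stream_id in {1, 2}.
JointNetwork = Callable[[int, int, int], Tuple[int, str]]
StopCondition = Callable[[List[str], List[str]], bool]


def ms_rnnt_decode(
    segments: Sequence[str],
    encode_input: Encoder,
    encode_first: Encoder,
    encode_second: Encoder,
    joint: JointNetwork,
    should_stop: StopCondition,
) -> Tuple[List[str], List[str]]:
    """Jointly build two output streams from one segmented input stream."""
    if not segments:
        return [], []

    # Initialize the input stream with the initial segment; outputs start empty.
    input_stream: List[str] = [segments[0]]
    first_out: List[str] = []
    second_out: List[str] = []

    for index in range(len(segments)):
        # "until one or more conditions are satisfied" (condition is assumed).
        if should_stop(first_out, second_out):
            break

        # Encode the input stream and both output streams as they stand now.
        enc_in = encode_input(input_stream)
        enc_first = encode_first(first_out)
        enc_second = encode_second(second_out)

        # The joint network consumes all three encoded representations.
        stream_id, token = joint(enc_in, enc_first, enc_second)

        # Route the predicted output to the stream it corresponds to.
        if stream_id == 1:
            first_out.append(token)
        elif stream_id == 2:
            second_out.append(token)

        # Extend the input stream with the next segment, if any remains.
        if index + 1 < len(segments):
            input_stream.append(segments[index + 1])

    return first_out, second_out


def toy_joint(enc_in: int, enc_first: int, enc_second: int) -> Tuple[int, str]:
    """Toy stand-in: odd input lengths feed stream 1, even lengths stream 2."""
    if enc_in % 2 == 1:
        return 1, f"a{enc_in}"
    return 2, f"b{enc_in}"


# Usage with toy encoders (sequence length stands in for an encoded vector).
first, second = ms_rnnt_decode(
    ["x", "y", "z"], len, len, len, toy_joint, lambda a, b: False
)
```

In a trained system the encoders and joint network would be neural components producing vectors and a distribution over output symbols. The integers and strings here only make the control flow easy to follow.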