US 12,437,215 B2
Device, method, and computer program product for executing inference using input signal
Tenta Sasaya, Ota (JP); Takashi Watanabe, Yokohama (JP); and Toshiyuki Ono, Kawasaki (JP)
Assigned to KABUSHIKI KAISHA TOSHIBA, Minato-ku (JP); and TOSHIBA INFRASTRUCTURE SYSTEMS & SOLUTIONS CORPORATION, Kawasaki (JP)
Filed by KABUSHIKI KAISHA TOSHIBA, Minato-ku (JP); and TOSHIBA INFRASTRUCTURE SYSTEMS & SOLUTIONS CORPORATION, Kawasaki (JP)
Filed on Jul. 30, 2020, as Appl. No. 16/942,906.
Claims priority of application No. 2020-011348 (JP), filed on Jan. 28, 2020.
Prior Publication US 2021/0232947 A1, Jul. 29, 2021
Int. Cl. G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06N 5/04 (2013.01) [G06N 20/00 (2019.01)]
10 Claims
 
1. A signal processing device, comprising:
one or more processors configured to:
acquire, from a plurality of outputs, each of which is an output of a plurality of first learning models for learning, a feature of a first signal, that is an audio signal of an utterance of a speaker used for inference, by applying an input signal to each of the first learning models for learning such that the acquired feature is outputted upon inputting the input signal, the input signal being a third signal or a fourth signal, the third signal including the first signal and a second signal that is a signal unnecessary for the inference, the fourth signal being obtained by converting the third signal, wherein the acquired feature is frequency information representing respective frequencies of a plurality of signals contained in the first signal;
display, on a display, the acquired feature by using the plurality of first learning models and the identity of the word spoken by the speaker, wherein the display comprises an interactive slide bar for designating one or more weighting factors that are used to add together, based on the one or more weighting factors, a plurality of respective features obtained based on each of the plurality of first learning models and output the addition result as the acquired feature of the first signal, and wherein designating the one or more weighting factors updates the acquired feature displayed on the display;
execute inference by using a second learning model for learning such that an inference result is outputted upon inputting the acquired feature, the inference result being an indication of an identity of a word spoken by the speaker;
calculate both a first error value and a second error value, the first error value constituting an error value between a first correct answer signal representing a correct answer of the acquired feature and the acquired feature, the second error value constituting an error value between a second correct answer signal representing a correct answer of inference based on the acquired feature and the outputted inference result; and
execute a training process by executing both (1) first learning processing to update a parameter of each of the plurality of first learning models based on both the first error value and the second error value, and (2) second learning processing to update a parameter of the second learning model based on the second error value, wherein the first learning processing to update the parameter of each of the plurality of first learning models includes:
performing a multiplying process that multiplies the first error value by a first adjustment factor and multiplies the second error value by a second adjustment factor;
updating the parameter of the one or more first learning models based on a sum of the first error value and the second error value after the multiplying process; and
modifying a value of the first adjustment factor and a value of the second adjustment factor such that, as a number of times of updating the parameter of the one or more first learning models increases, the first error value after being multiplied by the first adjustment factor is reduced and the second error value after being multiplied by the second adjustment factor is increased with the number of times of updating.
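
The weighted addition of per-model features and the shifting error weights recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the linear schedule, and the `total_steps` parameter are hypothetical and do not appear in the claim, which only requires that the weighted first error is reduced and the weighted second error is increased as the number of updates grows.

```python
from typing import List, Sequence, Tuple


def combine_features(features: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Add per-model feature vectors together using designated weights.

    `features` holds one vector per first learning model; `weights` holds
    one weighting factor per model (e.g. values set with a slide bar).
    """
    if len(features) != len(weights):
        raise ValueError("one weighting factor per first learning model is required")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [sum(w * f[i] for f, w in zip(features, weights))
            for i in range(length)]


def adjustment_factors(step: int, total_steps: int) -> Tuple[float, float]:
    """Return (first_factor, second_factor) for a given update count.

    Assumption: a linear schedule. The first factor falls from 1 to 0 and
    the second rises from 0 to 1 as `step` approaches `total_steps`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress, progress


def first_model_loss(first_error: float, second_error: float,
                     step: int, total_steps: int) -> float:
    """Sum of the two error values after the multiplying process."""
    first_factor, second_factor = adjustment_factors(step, total_steps)
    return first_factor * first_error + second_factor * second_error


if __name__ == "__main__":
    # Two hypothetical first learning models, equal weighting factors.
    print(combine_features([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))
    # Halfway through training, both errors contribute equally.
    print(first_model_loss(2.0, 4.0, step=5, total_steps=10))
```

Under this assumed schedule, early updates of the first learning models are driven mainly by the feature error, and later updates mainly by the inference error; the second learning model would be updated from the second error value alone.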