CPC G10L 15/1815 (2013.01) [G06F 40/166 (2020.01); G06F 40/30 (2020.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01)] | 16 Claims |
1. A method, performed by an electronic device, of correcting a speech input, the method comprising:
receiving a first speech signal;
obtaining first text by converting the first speech signal to text;
obtaining an intent of the first speech signal and a confidence score of the intent, by inputting the first text to a natural language understanding model;
identifying a plurality of correction candidate semantic elements capable of being correction targets in the first text, by inputting the first text, the intent, and the confidence score of the intent to an artificial intelligence model;
comparing the confidence score of the intent with a first threshold value;
determining a correction priority of the plurality of correction candidate semantic elements, based on a result of the comparing the confidence score with the first threshold value;
receiving a second speech signal;
obtaining second text by converting the second speech signal to text;
identifying whether the second speech signal is a speech signal for correcting the first text, by analyzing the second text;
based on identifying that the second speech signal is the speech signal for correcting the first text, comparing the plurality of correction candidate semantic elements in the first text with a semantic element in the second text, based on the confidence score; and
correcting at least one of the plurality of correction candidate semantic elements in the first text, based on a result of the comparing the plurality of correction candidate semantic elements in the first text with the semantic element in the second text,
wherein the first threshold value is determined based on another confidence score of a speech signal input to the electronic device before the first speech signal corresponding to the first text is received, and
wherein an intent of the speech signal input to the electronic device before the first speech signal is received is of a same domain as the intent of the first speech signal.
|
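
For illustration only, the threshold, priority, and correction steps recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Element`, `first_threshold`, `prioritize`, and `correct` are hypothetical, the use of a mean over earlier same-domain confidence scores is an assumed way of deriving the first threshold value, and the rule that intent-bearing elements are tried first when the confidence score falls below the threshold is an assumed priority policy. Speech-to-text conversion, the natural language understanding model, and the artificial intelligence model are outside the sketch and are represented only by their outputs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Element:
    """A semantic element; both field names are hypothetical."""
    kind: str   # e.g. "intent", "contact", "body"
    value: str  # surface text of the element


def first_threshold(prior_scores, default=0.5):
    """Derive the threshold from earlier same-domain confidence scores.

    Using the mean (and a default when no history exists) is an assumption.
    """
    return mean(prior_scores) if prior_scores else default


def prioritize(candidates, confidence, threshold):
    """Order correction candidates based on the threshold comparison.

    Assumed policy: when confidence is below the threshold, intent-bearing
    elements come first; otherwise they come last. sorted() is stable, so
    elements within each group keep their original order.
    """
    low = confidence < threshold
    return sorted(candidates, key=lambda e: (e.kind == "intent") != low)


def correct(first_text, ordered, replacement):
    """Replace the highest-priority candidate whose kind matches the
    semantic element taken from the second (correcting) utterance."""
    for cand in ordered:
        if cand.kind == replacement.kind:
            return first_text.replace(cand.value, replacement.value, 1)
    return first_text  # no matching candidate: leave the text unchanged


first_text = "send a message to John saying hello"
candidates = [
    Element("contact", "John"),
    Element("intent", "send a message"),
    Element("body", "hello"),
]
threshold = first_threshold([0.5, 1.0])           # earlier same-domain scores
ordered = prioritize(candidates, 0.6, threshold)  # 0.6 is below the threshold
corrected = correct(first_text, ordered, Element("contact", "Joan"))
print(corrected)
```

Matching candidates by element kind is only one possible comparison; a phonetic or embedding-based similarity between the candidates and the element in the second text would fit the same structure.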