CPC G10L 15/04 (2013.01) [G10L 15/08 (2013.01); G10L 25/51 (2013.01)] | 10 Claims |
1. A speech processing device comprising:
at least one memory storing instructions; and
at least one processor configured to execute the instructions stored in the memory to:
divide predetermined first speech into a plurality of first speech segments;
divide second speech in which a plurality of types of speech of multiple speakers are mixed into a plurality of second speech segments;
calculate first scores indicating similarities among the plurality of first speech segments, second scores indicating similarities among the plurality of second speech segments, and third scores indicating similarities between the plurality of first speech segments and the plurality of second speech segments;
calculate a threshold value based on the first scores indicating the similarities among the plurality of first speech segments;
classify the plurality of second speech segments into one or more clusters respectively having one or more similarities higher than a similarity indicated by the threshold value;
calculate whether speech corresponding to the first speech is contained in each of the one or more clusters; and
calculate a similarity between each of the one or more clusters and the first speech, and determine, based on a result of the calculation, whether the speech corresponding to the first speech is contained in each of the one or more clusters.
|
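The processing flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every concrete choice in it is an assumption made for illustration: each speech segment is taken to be already reduced to a fixed-length feature vector, similarity is cosine similarity, the threshold is the minimum of the first scores, clustering is single-linkage grouping of second segments whose pairwise second score exceeds the threshold, and a cluster is judged to contain the first speech when its mean third score exceeds the same threshold. The names `cosine`, `cluster_segments`, and `detect_first_speech` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed score function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_segments(segments, threshold):
    """Group segments whose pairwise similarity exceeds the threshold.

    Single-linkage grouping via union-find (an assumed clustering rule).
    Returns a list of clusters, each a list of segment indices.
    """
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        # Second scores: similarities among the second speech segments.
        if cosine(segments[i], segments[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def detect_first_speech(first, second):
    """Return (threshold, [(cluster, similarity, contains_first_speech), ...])."""
    # First scores: similarities among the first speech segments.
    first_scores = [cosine(a, b) for a, b in combinations(first, 2)]
    # Threshold derived from the first scores (assumed here: their minimum).
    threshold = min(first_scores)

    results = []
    for members in cluster_segments(second, threshold):
        # Third scores: similarities between first and second segments.
        third = [cosine(second[i], f) for i in members for f in first]
        similarity = sum(third) / len(third)
        results.append((members, similarity, similarity > threshold))
    return threshold, results


# Toy 2-D vectors standing in for segment features (illustrative data only).
FIRST = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
SECOND = [[0.92, 0.08], [0.0, 1.0], [0.05, 0.95], [0.97, 0.03]]

if __name__ == "__main__":
    threshold, results = detect_first_speech(FIRST, SECOND)
    print("threshold:", round(threshold, 4))
    for members, similarity, contains in results:
        print(members, round(similarity, 4), contains)
```

Deriving the threshold from the first speech alone means that no threshold has to be tuned on the mixed recording: the spread of similarities within a single known speaker sets the bar for what counts as the same speaker in the second speech. Other statistics of the first scores, such as a mean or a percentile, would fit the claim language equally well.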