US 12,009,005 B2
Method for rating the speech quality of a speech signal by way of a hearing device
Jana Thiemt, Erlangen (DE); and Marko Lugger, Weilersbach (DE)
Assigned to Sivantos Pte. Ltd., Singapore (SG)
Filed by SIVANTOS PTE. LTD., Singapore (SG)
Filed on Aug. 30, 2021, as Appl. No. 17/460,555.
Claims priority of application No. 10 2020 210 919.2 (DE), filed on Aug. 28, 2020.
Prior Publication US 2022/0068294 A1, Mar. 3, 2022
Int. Cl. G10L 21/0272 (2013.01); G10L 21/0364 (2013.01); G10L 25/60 (2013.01); G10L 25/84 (2013.01); H04R 25/00 (2006.01)

CPC G10L 21/0364 (2013.01) [G10L 25/60 (2013.01); G10L 25/84 (2013.01); H04R 25/405 (2013.01); H04R 25/407 (2013.01); H04R 25/43 (2013.01); H04R 25/505 (2013.01); H04R 2225/43 (2013.01)]

14 Claims

1. A method for rating a speech quality of a speech signal by a hearing device, the method comprising:

recording a sound with an acousto-electric input transducer of the hearing device, the sound containing the speech signal from surroundings of the hearing device, and converting the sound into an input audio signal;

quantitatively acquiring at least one articulatory property and/or prosodic feature of the speech signal through analysis of the input audio signal by a signal processing operation, and

deriving a quantitative measure of the speech quality based on the at least one articulatory property and/or prosodic feature; and

acquiring, as articulatory property of the speech signal, at least one of:

a characteristic variable correlated with the precision of predefined formants of vowels in the speech signal by,

ascertaining a signal component of the speech signal in at least one formant range in a frequency space,

ascertaining a signal variable correlated with a level for the signal component of the speech signal in the at least one formant range, and

ascertaining the characteristic variable based on a maximum value and/or based on a temporal stability of the signal variable correlated with the level;

a characteristic variable correlated with the dominance of consonants and/or fricatives in the speech signal by,

calculating a first energy contained in a low frequency range,

calculating a second energy contained in a frequency range higher than the low frequency range, and

forming the characteristic variable based on a ratio, and/or a ratio weighted over the respective bandwidths of the frequency ranges, of the first energy and the second energy; or

a characteristic variable correlated with the precision of transitions from voiced and unvoiced sounds by,

making a distinction between voiced temporal sequences and unvoiced temporal sequences based on a correlation measurement and/or based on a zero crossing rate,

ascertaining a transition from a voiced temporal sequence to an unvoiced temporal sequence or from an unvoiced temporal sequence to a voiced temporal sequence,

ascertaining the energy contained in the voiced or unvoiced temporal sequence prior to the transition for at least one frequency range, and ascertaining the energy contained in the unvoiced or voiced temporal sequence following the transition for the at least one frequency range, and

ascertaining the characteristic variable based on the energy prior to the transition and based on the energy following the transition.
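
The first alternative of claim 1 (formant precision) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the 300 to 1000 Hz formant range, the frame-wise dB level, and the use of a population standard deviation as the "temporal stability" figure are all hypothetical choices that the claim does not specify.

```python
import cmath
import math
import statistics


def band_level_db(frame, fs, f_lo, f_hi):
    """Level (dB) of the frame's signal component between f_lo and f_hi.

    Uses a naive DFT restricted to the bins inside the band, which is
    adequate for short frames.
    """
    n = len(frame)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = min(math.floor(f_hi * n / fs), n // 2)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        energy += abs(acc) ** 2
    # Small offset avoids log10(0) for silent frames.
    return 10.0 * math.log10(energy / n + 1e-12)


def formant_precision(signal, fs, frame_len, formant_range=(300.0, 1000.0)):
    """Return (maximum level, level spread over time) for one formant range.

    ASSUMPTION: the formant range and the spread measure (population
    standard deviation of frame levels) are illustrative only.
    A smaller spread means a more stable level over time.
    """
    levels = [
        band_level_db(signal[s:s + frame_len], fs, *formant_range)
        for s in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    return max(levels), statistics.pstdev(levels)
```

In this sketch a steadily held vowel-like tone yields a spread close to zero, while a tone whose level fluctuates from frame to frame yields a larger spread.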
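
The second alternative of claim 1 (dominance of consonants and/or fricatives) can be illustrated with a band-energy ratio. The sketch below is a hedged example, not the claimed method itself: the band limits, the function names, and the direction of the ratio (second, higher-band energy divided by first, low-band energy) are assumptions, since the claim only requires a ratio of the two energies, optionally weighted over the bandwidths.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a naive DFT (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energy(spectrum, fs, n, f_lo, f_hi):
    """Sum the power of all bins whose frequency lies in [f_lo, f_hi)."""
    bin_hz = fs / n
    return sum(p for k, p in enumerate(spectrum) if f_lo <= k * bin_hz < f_hi)


def consonant_dominance(frame, fs,
                        low_band=(0.0, 2500.0),
                        high_band=(3500.0, 8000.0),
                        weight_by_bandwidth=False):
    """Ratio of high-band to low-band energy for one frame.

    ASSUMPTION: band limits and ratio direction are illustrative only.
    With weight_by_bandwidth=True each energy is divided by the width
    of its band before the ratio is formed.
    """
    n = len(frame)
    spectrum = power_spectrum(frame)
    e_low = band_energy(spectrum, fs, n, *low_band)    # first energy
    e_high = band_energy(spectrum, fs, n, *high_band)  # second energy
    if weight_by_bandwidth:
        e_low /= low_band[1] - low_band[0]
        e_high /= high_band[1] - high_band[0]
    # Small offset guards against division by zero.
    return e_high / (e_low + 1e-12)
```

With this convention a frame dominated by low-frequency, vowel-like content gives a ratio near zero, and a frame dominated by high-frequency, fricative-like content gives a large ratio.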
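
The third alternative of claim 1 (precision of voiced/unvoiced transitions) might be sketched as below. Several simplifications are assumed and are not taken from the patent: the voiced/unvoiced decision uses only a zero crossing rate with a hypothetical threshold of 0.25, the "at least one frequency range" is taken to be the full band, the energies are measured over the single frame on each side of the transition, and the absolute level jump in dB is just one possible way to combine the two energies into a characteristic variable.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0.0) != (b >= 0.0)
                    for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def frame_energy(frame):
    """Full-band energy of one frame (sum of squared samples)."""
    return sum(x * x for x in frame)


def transition_energies(signal, frame_len, zcr_threshold=0.25):
    """Find voiced/unvoiced transitions and the energies around them.

    ASSUMPTION: a frame counts as voiced when its zero crossing rate is
    below zcr_threshold. Returns a list of tuples
    (frame_index, was_voiced_before, energy_before, energy_after).
    """
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, frame_len)]
    voiced = [zero_crossing_rate(f) < zcr_threshold for f in frames]
    results = []
    for i in range(1, len(frames)):
        if voiced[i] != voiced[i - 1]:
            results.append((i, voiced[i - 1],
                            frame_energy(frames[i - 1]),
                            frame_energy(frames[i])))
    return results


def transition_sharpness_db(energy_before, energy_after):
    """Illustrative characteristic variable: absolute level jump in dB."""
    ratio = (energy_after + 1e-12) / (energy_before + 1e-12)
    return abs(10.0 * math.log10(ratio))
```

A correlation measurement, which the claim names as an alternative or additional criterion, could replace or supplement the zero crossing rate in `transition_energies` without changing the rest of the sketch.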