US 11,727,949 B2
Methods and apparatus for reducing stuttering
Rebecca Kleinberger, Cambridge, MA (US); Michael Erkkinen, Cambridge, MA (US); George Stefanakis, Jamaica, NY (US); Akito van Troyer, Cambridge, MA (US); Satrajit Ghosh, Wayland, MA (US); Janet Baker, West Newton, MA (US); and Tod Machover, Waltham, MA (US)
Assigned to Massachusetts Institute of Technology, Cambridge, MA (US)
Filed by Massachusetts Institute of Technology, Cambridge, MA (US); and The Brigham and Women's Hospital, Inc., Boston, MA (US)
Filed on Aug. 3, 2020, as Appl. No. 16/983,974.
Claims priority of provisional application 62/885,316, filed on Aug. 12, 2019.
Prior Publication US 2021/0050029 A1, Feb. 18, 2021
Int. Cl. G06F 15/00 (2006.01); G10L 25/00 (2013.01); G10L 21/013 (2013.01); G10L 21/034 (2013.01); G06F 3/16 (2006.01); H04R 3/04 (2006.01); G10L 17/00 (2013.01); G10L 21/003 (2013.01); G10L 21/02 (2013.01)
CPC G10L 21/013 (2013.01) [G06F 3/16 (2013.01); G10L 17/00 (2013.01); G10L 21/003 (2013.01); G10L 21/02 (2013.01); G10L 21/034 (2013.01); H04R 3/04 (2013.01)] 24 Claims
OG exemplary drawing
 
22. A method comprising:
converting a sound of a voice of a user into an electrical audio signal;
transforming the electrical audio signal to produce a transformed electrical audio signal;
converting, with one or more speakers, the transformed electrical audio signal into a transformed sound of the voice of the user in real time while the user is making the sound,
wherein the transformed sound is outputted in real time by the one or more speakers in such a way as to be audible to only the user to reduce stuttering by the user;
performing a speaker identification algorithm to determine whether a voice is the voice of the user; and
repeatedly sampling fundamental frequency of the voice of the user during the transforming, which fundamental frequency changes over time during the transforming,
wherein the transforming is performed only for time intervals in which the user is speaking, and the transforming is performed in each of a set of modes, wherein the set of modes includes:
a first mode, in which the transforming causes the transformed sound of the voice of the user to have a whispered sound effect,
a second mode, in which the transforming causes the transformed sound of the voice of the user to have a reverberation sound effect,
a third mode, in which the transforming causes the transformed sound to comprise a superposition of the voice of the user and one or more pitch-shifted versions of the voice of the user that are sounded simultaneously with the voice of the user, each of the one or more pitch-shifted versions being shifted in pitch, relative to the voice of the user, by a frequency interval that occurs between notes of a chord in a chromatic musical scale,
a fourth mode, in which the transforming causes the transformed sound to comprise a superposition of the voice of the user and one or more pitch-shifted versions of the voice of the user that are sounded simultaneously with the voice of the user, each of the one or more pitch-shifted versions being shifted in pitch, relative to the voice of the user, by a frequency interval that occurs between notes of a chord in a chromatic musical scale, and
a fifth mode, in which the transforming causes the transformed sound to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the voice of the user, which pitch-shifted versions are sounded simultaneously with each other, in such a way that:
the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the voice of the user,
the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the voice of the user as most recently sampled has changed,
the chord remains constant between each temporally adjoining pair of pseudobeats, and
each pseudobeat in the set, except an initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the voice of the user occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set;
wherein each mode in the set occurs during a time period in which no other mode in the set occurs;
taking measurements of the stuttering by the user during each of multiple time windows in which the transforming occurs;
detecting, based on the measurements, an increase in the stuttering; and
in response to the detecting of the increase in the stuttering, changing which mode of transforming is occurring, by changing from one mode in the set to another mode in the set to reduce the stuttering by the user.
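
The harmony modes recited above (the third and fourth modes) superimpose the user's voice on pitch-shifted copies of that voice, each copy offset by a chord interval of the chromatic scale. The following is a minimal offline sketch of that kind of superposition, not the patented implementation: the chord choice (a major triad), the names CHORD_INTERVALS and harmonize, and the use of the librosa library for pitch shifting are all illustrative assumptions.

# Sketch (not the patented implementation): mix the voice with pitch-shifted
# copies of itself, each offset by a chord interval of the chromatic
# (12-tone equal-tempered) scale.
import numpy as np
import librosa  # assumed available; used here only for pitch shifting

CHORD_INTERVALS = [4, 7]  # semitones above the voice: major third, perfect fifth

def harmonize(voice: np.ndarray, sr: int) -> np.ndarray:
    """Superimpose the voice and its pitch-shifted copies to form a chord."""
    layers = [voice]
    for semitones in CHORD_INTERVALS:
        layers.append(librosa.effects.pitch_shift(voice, sr=sr, n_steps=semitones))
    return np.sum(layers, axis=0) / len(layers)  # simple average to limit clipping

In the claimed method the same idea would run in real time on short buffers of the microphone signal rather than on a whole recording.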
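The fifth mode ties chord changes to pseudobeats. The sketch below illustrates one plausible reading of that scheduling logic under stated assumptions: a pseudobeat is declared at the earliest amplitude build-up after a specified minimum interval since the previous pseudobeat, the chord root is re-derived only at pseudobeats from the most recently sampled fundamental frequency (quantized to the nearest chromatic note), and the chord therefore remains constant between adjoining pseudobeats. All identifiers here (PseudobeatScheduler, min_interval_s, buildup_ratio) are assumptions, not terms from the patent.

import numpy as np

def nearest_chromatic_note_hz(f0_hz, a4_hz=440.0):
    """Nearest note of the 12-tone equal-tempered scale to a given frequency."""
    k = round(12.0 * np.log2(f0_hz / a4_hz))
    return a4_hz * 2.0 ** (k / 12.0)

class PseudobeatScheduler:
    """Holds the current chord root and updates it only at pseudobeats."""

    def __init__(self, min_interval_s=0.5, buildup_ratio=1.5):
        self.min_interval_s = min_interval_s  # the "specified temporal interval"
        self.buildup_ratio = buildup_ratio    # RMS rise treated as a build-up
        self.last_beat_t = None
        self.prev_rms = 0.0
        self.chord_root_hz = None

    def step(self, t, frame_rms, sampled_f0_hz):
        """Process one analysis frame; return the chord root (Hz) to use now."""
        buildup = self.prev_rms > 0.0 and frame_rms > self.buildup_ratio * self.prev_rms
        waited = self.last_beat_t is None or t - self.last_beat_t >= self.min_interval_s
        voiced = sampled_f0_hz is not None and sampled_f0_hz > 0.0
        if voiced and (self.chord_root_hz is None or (buildup and waited)):
            # Pseudobeat: re-derive the root from the most recent f0 sample, so the
            # chord changes only if the sampled fundamental frequency has changed.
            self.chord_root_hz = nearest_chromatic_note_hz(sampled_f0_hz)
            self.last_beat_t = t
        self.prev_rms = frame_rms
        return self.chord_root_hz  # constant between adjoining pseudobeats

Frame-level RMS amplitude and fundamental-frequency estimates are taken as inputs here; how they are computed from the electrical audio signal is left open.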
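The closing limitations describe measuring stuttering over successive time windows and changing the active mode when an increase is detected. The sketch below shows a minimal version of that selection loop; the mode names, the per-window stutter_score input, and the simple cycle-to-the-next-mode policy are assumptions for illustration, not details taken from the patent.

from itertools import cycle

MODES = ["whisper", "reverberation", "harmony_a", "harmony_b", "pseudobeat_chords"]

class ModeSelector:
    """Keeps one mode active at a time and moves on when stuttering increases."""

    def __init__(self, modes=MODES):
        self._order = cycle(modes)
        self.current = next(self._order)
        self._prev_score = None

    def update(self, stutter_score):
        """Feed one per-window stuttering measurement; return the mode to use next."""
        if self._prev_score is not None and stutter_score > self._prev_score:
            self.current = next(self._order)  # increase detected: change modes
        self._prev_score = stutter_score
        return self.current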