US 12,469,510 B2
Transforming speech signals to attenuate speech of competing individuals and other noise
Kamil Krzysztof Wojcicki, Kangaroo Point (AU); Xuehong Mao, San Jose, CA (US); David Guoqing Zhang, Fremont, CA (US); Samer Hijazi, San Jose, CA (US); and Raul Alejandro Casas, Doylestown, PA (US)
Assigned to CISCO TECHNOLOGY, INC., San Jose, CA (US)
Filed by Cisco Technology, Inc., San Jose, CA (US)
Filed on Nov. 16, 2022, as Appl. No. 17/988,376.
Prior Publication US 2024/0161765 A1, May 16, 2024
Int. Cl. G10L 21/0208 (2013.01); G06N 3/0464 (2023.01); G06N 3/0499 (2023.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01); G10L 25/78 (2013.01)
CPC G10L 21/0208 (2013.01) [G06N 3/084 (2013.01); G06N 20/00 (2019.01); G10L 25/78 (2013.01); G06N 3/0464 (2023.01); G06N 3/0499 (2023.01)]
20 Claims
 
1. A method comprising:
receiving speech signals from a user during a communication session, wherein the speech signals contain noise including speech of other individuals;
transforming the speech signals by a machine learning model to produce transformed speech signals corresponding to the speech signals with a reduced amount of the noise;
determining a difference between the transformed speech signals and the speech signals, wherein the difference corresponds to a signal to noise ratio for the speech signals;
collecting the speech signals as clean speech of the user based on the difference satisfying a noise threshold for the signal to noise ratio, wherein the clean speech includes the speech signals with the difference corresponding to the signal to noise ratio of at least a predetermined decibel level;
generating an updated version of the machine learning model by training the machine learning model based on the clean speech; and
replacing the machine learning model with the updated version of the machine learning model when performance of the updated version of the machine learning model exceeds performance of the machine learning model.
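
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it only shows one plausible reading under stated assumptions. Specifically, it assumes the "difference" is the residual between the input and the model output, that the model output serves as the speech estimate and the residual as the noise estimate, and that signal to noise ratio is computed as a power ratio in decibels. The names `estimate_snr_db`, `collect_clean_speech`, `maybe_replace_model`, and the 30 dB default threshold are hypothetical and do not appear in the claim.

```python
import math
from typing import Callable, List, Sequence, TypeVar

Signal = Sequence[float]
Model = TypeVar("Model")


def estimate_snr_db(noisy: Signal, transformed: Signal) -> float:
    """Estimate SNR in dB from the input and the model output.

    Assumption: the model output approximates the speech, and the
    residual (noisy - transformed) approximates the removed noise.
    """
    if len(noisy) != len(transformed):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in transformed)
    noise_power = sum((n - s) ** 2 for n, s in zip(noisy, transformed))
    if noise_power == 0.0:
        return math.inf  # model removed nothing: input treated as clean
    if signal_power == 0.0:
        return -math.inf  # model removed everything: input treated as noise
    return 10.0 * math.log10(signal_power / noise_power)


def collect_clean_speech(
    segments: Sequence[Signal],
    model: Callable[[Signal], Signal],
    threshold_db: float = 30.0,  # hypothetical "predetermined decibel level"
) -> List[List[float]]:
    """Keep the original segments whose estimated SNR meets the threshold."""
    clean = []
    for segment in segments:
        if estimate_snr_db(segment, model(segment)) >= threshold_db:
            clean.append(list(segment))
    return clean


def maybe_replace_model(
    current: Model, candidate: Model, score: Callable[[Model], float]
) -> Model:
    """Return the candidate only if it strictly outperforms the current model."""
    return candidate if score(candidate) > score(current) else current


if __name__ == "__main__":
    # Toy stand-in for the machine learning model: treats samples with
    # magnitude above 1.0 as noise and zeroes them out.
    def toy_model(segment: Signal) -> List[float]:
        return [x if abs(x) <= 1.0 else 0.0 for x in segment]

    segments = [[0.5, -0.5, 1.0], [0.5, 2.0, -0.5]]
    print(collect_clean_speech(segments, toy_model))
```

In this sketch the collected segments are the untransformed inputs, matching the claim language that the speech signals themselves are collected as clean speech. The retraining step is omitted because the claim does not specify a training procedure; `maybe_replace_model` only shows the final comparison, with any performance metric supplied by the caller as `score`.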