US 11,875,813 B2
Systems and methods for brain-informed speech separation
Nima Mesgarani, New York, NY (US); Enea Ceolini, New York, NY (US); and Cong Han, New York, NY (US)
Assigned to The Trustees of Columbia University in the City of New York, New York, NY (US)
Filed by The Trustees of Columbia University in the City of New York, New York, NY (US)
Filed on Mar. 31, 2023, as Appl. No. 18/129,469.
Application 18/129,469 is a continuation of application No. PCT/US2021/053560, filed on Oct. 5, 2021.
Claims priority of provisional application 63/087,636, filed on Oct. 5, 2020.
Prior Publication US 2023/0377595 A1, Nov. 23, 2023
Int. Cl. G10L 21/028 (2013.01); G10L 21/0232 (2013.01); G10L 21/0208 (2013.01)
CPC G10L 21/028 (2013.01) [G10L 21/0232 (2013.01); G10L 2021/02087 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A method for speech separation comprising:
obtaining, by a device, a combined sound signal for signals combined from multiple sound sources in an area in which a person is located;
obtaining, by the device, neural signals for the person, the neural signals being indicative of one or more target sound sources, from the multiple sound sources, that the person is attentive to;
determining a separation filter based, at least in part, on the neural signals obtained for the person; and
applying, by the device, the separation filter to a representation of the combined sound signal to derive a resultant separated signal representation associated with sound from the one or more target sound sources that the person is attentive to;
wherein determining the separation filter comprises deriving, using a trained learning model, a time-frequency mask that is applied to a time-frequency representation of the combined sound signal, including deriving the time-frequency mask based on a representation of an estimated target envelope for the one or more target sound sources that the person is attentive to, the estimated target envelope being determined based on the neural signals obtained for the person, and based on a representation of the combined sound signal.
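
The following is a minimal sketch of the pipeline recited in claim 1, written in Python. The claim does not specify a model architecture or decoding method, so the `envelope_decoder` and `mask_model` callables below are hypothetical stand-ins: the former represents any trained decoder that estimates the attended talker's envelope from neural signals, and the latter any trained learning model that maps a mixture spectrogram and that envelope to a time-frequency mask.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_attended_speech(mixture, neural_signals, envelope_decoder,
                             mask_model, fs=16000):
    """Illustrative brain-informed speech separation, per claim 1:
    1. Estimate the attended talker's envelope from neural signals.
    2. Compute a time-frequency representation of the combined signal.
    3. Derive a time-frequency mask (the separation filter) from both.
    4. Apply the mask and invert to obtain the separated signal.
    `envelope_decoder` and `mask_model` are assumed trained components;
    the patent does not prescribe their internals.
    """
    # Step 1: estimated target envelope, determined from the neural signals
    # (e.g., EEG/ECoG recordings indicative of the attended source).
    target_envelope = envelope_decoder(neural_signals)

    # Step 2: time-frequency representation of the combined sound signal.
    _, _, mix_tf = stft(mixture, fs=fs, nperseg=512)

    # Step 3: trained learning model derives the separation filter
    # (a time-frequency mask, values assumed in [0, 1]).
    mask = mask_model(np.abs(mix_tf), target_envelope)

    # Step 4: apply the mask and reconstruct the attended talker's waveform.
    _, separated = istft(mask * mix_tf, fs=fs, nperseg=512)
    return separated
```

In this sketch, the mask is applied multiplicatively to the complex mixture spectrogram before inversion, which is one common way to realize "applying the separation filter to a representation of the combined sound signal"; the claim itself covers the general arrangement rather than this particular choice of STFT parameters or masking scheme.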