US 12,469,516 B2
Detection and enhancement of speech in binaural recordings
Giulio Cengarle, Barcelona (ES); and Yuanxing Ma, Beijing (CN)
Assigned to DOLBY LABORATORIES LICENSING CORPORATION, San Francisco, CA (US); and DOLBY INTERNATIONAL AB, Dublin (IE)
Appl. No. 18/327,671
Filed by DOLBY LABORATORIES LICENSING CORPORATION, San Francisco, CA (US); and DOLBY INTERNATIONAL AB, Dublin (IE)
PCT Filed Jan. 12, 2022; PCT No. PCT/US2022/012128; § 371(c)(1), (2) Date Jun. 1, 2023; PCT Pub. No. WO2022/155205; PCT Pub. Date Jul. 21, 2022.
Claims priority of provisional application 63/245,548, filed on Sep. 17, 2021.
Claims priority of provisional application 63/162,289, filed on Mar. 17, 2021.
Claims priority of application No. ES202130013 (ES), filed on Jan. 12, 2021.
Prior Publication US 2025/0078858 A1, Mar. 6, 2025
Int. Cl. G10L 21/0364 (2013.01); G10L 21/034 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01)

CPC G10L 21/0364 (2013.01) [G10L 21/034 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01)]

15 Claims

1. A method comprising:

dividing a binaural speech signal into frames;

applying a time-frequency transform to each frame;

computing features of the frames based on a time-frequency representation;

classifying, by a classifier, each frame as self speech or external speech, based at least in part on a subset of features;

computing a dissimilarity function based on a subset of features;

segmenting the signal at peaks of the dissimilarity function;

for each segment, determining a respective overall class among self speech or external speech by aggregating classifier data of the frames belonging to the segment; and

processing each segment with a speech enhancement chain whose settings are based on determined overall class for such segment.
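
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify the features, the classifier, the dissimilarity function, or the enhancement settings, so everything below is an assumption chosen for illustration. The sketch assumes non-overlapping frames, a naive DFT as the time-frequency transform, two hypothetical features (overall level and inter-channel level difference, both in dB), a threshold rule standing in for the classifier, a majority vote as the aggregation, and a single per-class gain standing in for the speech enhancement chain. All function names, thresholds, and gain values are hypothetical.

```python
import cmath
import math
from collections import Counter

def split_frames(left, right, size):
    """Divide a binaural (left, right) signal into non-overlapping frame pairs."""
    count = len(left) // size
    return [(left[i * size:(i + 1) * size], right[i * size:(i + 1) * size])
            for i in range(count)]

def band_energy(frame):
    """Energy of a naive DFT over the lower half of the bins (time-frequency step)."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2):
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
        total += abs(bin_value) ** 2
    return total + 1e-12  # small floor avoids log(0)

def frame_features(left, right):
    """Assumed features: overall level (dB) and inter-channel level difference (dB)."""
    e_left, e_right = band_energy(left), band_energy(right)
    return (10 * math.log10(e_left + e_right), 10 * math.log10(e_left / e_right))

def classify_frame(features, level_db=15.0, max_ild_db=3.0):
    """Placeholder classifier: loud, centered frames are labeled self speech."""
    level, ild = features
    return "self" if level > level_db and abs(ild) < max_ild_db else "external"

def dissimilarity(features, k=2):
    """Distance between the mean features of the k frames before and after index i."""
    curve = [0.0] * len(features)
    for i in range(k, len(features) - k + 1):
        before = [sum(f[j] for f in features[i - k:i]) / k for j in range(2)]
        after = [sum(f[j] for f in features[i:i + k]) / k for j in range(2)]
        curve[i] = math.dist(before, after)
    return curve

def find_peaks(curve, min_height=5.0):
    """Indices of local maxima of the dissimilarity curve above min_height."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > min_height and curve[i - 1] < curve[i] >= curve[i + 1]]

def process(left, right, size=32, settings=None):
    """Return (segments, out_left, out_right); segments are (start, end, class)."""
    settings = settings or {"self": -6.0, "external": 6.0}  # assumed gains in dB
    feats = [frame_features(l, r) for l, r in split_frames(left, right, size)]
    labels = [classify_frame(f) for f in feats]
    bounds = [0] + find_peaks(dissimilarity(feats)) + [len(feats)]
    segments, out_left, out_right = [], list(left), list(right)
    for start, end in zip(bounds, bounds[1:]):
        if start == end:
            continue  # nothing to label in an empty segment
        overall = Counter(labels[start:end]).most_common(1)[0][0]  # majority vote
        gain = 10 ** (settings[overall] / 20)  # per-class setting of the "chain"
        for n in range(start * size, end * size):
            out_left[n] *= gain
            out_right[n] *= gain
        segments.append((start, end, overall))
    return segments, out_left, out_right
```

Calling `process(left, right)` on two equal-length lists of samples returns the labeled segments, expressed in frame indices, together with the gain-adjusted channels. Aggregating frame labels per segment, rather than applying settings frame by frame, keeps the enhancement settings constant between detected change points.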