US 12,469,516 B2
Detection and enhancement of speech in binaural recordings
Giulio Cengarle, Barcelona (ES); and Yuanxing Ma, Beijing (CN)
Assigned to DOLBY LABORATORIES LICENSING CORPORATION, San Francisco, CA (US); and DOLBY INTERNATIONAL AB, Dublin (IE)
Appl. No. 18/327,671
Filed by DOLBY LABORATORIES LICENSING CORPORATION, San Francisco, CA (US); and DOLBY INTERNATIONAL AB, Dublin (IE)
PCT Filed Jan. 12, 2022, PCT No. PCT/US2022/012128
§ 371(c)(1), (2) Date Jun. 1, 2023,
PCT Pub. No. WO2022/155205, PCT Pub. Date Jul. 21, 2022.
Claims priority of provisional application 63/245,548, filed on Sep. 17, 2021.
Claims priority of provisional application 63/162,289, filed on Mar. 17, 2021.
Claims priority of application No. ES202130013 (ES), filed on Jan. 12, 2021.
Prior Publication US 2025/0078858 A1, Mar. 6, 2025
Int. Cl. G10L 21/0364 (2013.01); G10L 21/034 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01)
CPC G10L 21/0364 (2013.01) [G10L 21/034 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01)]
15 Claims
 
1. A method comprising:
dividing a binaural speech signal into frames;
applying a time-frequency transform to each frame;
computing features of the frames based on a time-frequency representation;
classifying, by a classifier, each frame as self speech or external speech, based at least in part on a subset of features;
computing a dissimilarity function based on a subset of features;
segmenting the signal at peaks of the dissimilarity function;
for each segment, determining a respective overall class among self speech or external speech by aggregating classifier data of the frames belonging to the segment; and
processing each segment with a speech enhancement chain whose settings are based on determined overall class for such segment.
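
The steps of claim 1 can be illustrated with a minimal sketch in Python with NumPy. It is not the patented implementation. Everything below is an assumption made for illustration: the frame and hop sizes, the two features (overall level and interaural level difference), the level-threshold rule standing in for the classifier, the before/after windowed distance standing in for the dissimilarity function, the majority vote used for aggregation, and the gain-only `SETTINGS` standing in for the speech enhancement chain. All function names are hypothetical.

```python
import numpy as np

FRAME, HOP = 1024, 512  # assumed frame and hop sizes in samples

# Hypothetical per-class settings for the enhancement chain (here only a gain).
SETTINGS = {"self": {"gain_db": -3.0}, "external": {"gain_db": 6.0}}

def features(x):
    """Frame a (2, n) binaural signal, transform each frame, return (n_frames, 2) features."""
    n_frames = 1 + (x.shape[1] - FRAME) // HOP
    frames = np.stack([x[:, i * HOP:i * HOP + FRAME] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(FRAME), axis=-1)) ** 2
    energy = power.sum(axis=-1) + 1e-12                  # (n_frames, 2): left, right
    level_db = 10 * np.log10(energy.sum(axis=1))          # overall level
    ild_db = 10 * np.log10(energy[:, 0] / energy[:, 1])   # interaural level difference
    return np.column_stack([level_db, ild_db])

def classify(feats, threshold_db):
    """Toy classifier: True (self speech) when the frame level exceeds a threshold."""
    return feats[:, 0] > threshold_db

def dissimilarity(feats, half=4):
    """Distance between mean features of the `half` frames after and before each frame."""
    d = np.zeros(len(feats))
    for t in range(half, len(feats) - half + 1):
        d[t] = np.linalg.norm(feats[t:t + half].mean(axis=0) - feats[t - half:t].mean(axis=0))
    return d

def peaks(d, min_height):
    """Local maxima of d that reach min_height."""
    return [t for t in range(1, len(d) - 1)
            if d[t] >= min_height and d[t] > d[t - 1] and d[t] >= d[t + 1]]

def enhance_by_segment(x, min_height=6.0):
    feats = features(x)
    is_self = classify(feats, threshold_db=feats[:, 0].mean())
    bounds = [0] + peaks(dissimilarity(feats), min_height) + [len(feats)]
    out, segments = x.astype(float), []  # astype returns a copy, so x is untouched
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Aggregate frame decisions by majority vote to get the segment class.
        label = "self" if is_self[start:end].mean() > 0.5 else "external"
        lo = start * HOP
        hi = x.shape[1] if end == len(feats) else end * HOP
        out[:, lo:hi] *= 10 ** (SETTINGS[label]["gain_db"] / 20)
        segments.append((start, end, label))
    return out, segments
```

Calling `enhance_by_segment(x)` on a NumPy array of shape `(2, n_samples)` returns the processed signal and a list of `(start_frame, end_frame, class)` tuples. Because the class is decided per segment rather than per frame, the settings stay constant within a segment and only change at the detected boundaries.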