US 12,315,527 B2
Method and system for speech recognition
Shiliang Zhang, Hangzhou (CN); and Ming Lei, Hangzhou (CN)
Assigned to ALIBABA GROUP HOLDING LIMITED, Grand Cayman (KY)
Appl. No. 17/428,015
Filed by ALIBABA GROUP HOLDING LIMITED, Grand Cayman (KY)
PCT Filed Feb. 3, 2020; PCT No. PCT/CN2020/074178; § 371(c)(1), (2) Date Aug. 3, 2021; PCT Pub. No. WO2020/164397; PCT Pub. Date Aug. 20, 2020.
Claims priority of application No. 201910111593.1 (CN), filed on Feb. 12, 2019.
Prior Publication US 2022/0028404 A1, Jan. 27, 2022
Int. Cl. G10L 21/0216 (2013.01); G10L 15/06 (2013.01); H04R 3/00 (2006.01); H04R 3/04 (2006.01)

CPC G10L 21/0216 (2013.01) [G10L 15/063 (2013.01); H04R 3/005 (2013.01); H04R 3/04 (2013.01); G10L 2021/02166 (2013.01)]

16 Claims

1. A method comprising:

allocating a signal source based on different directions of arrival (DOAs) by dividing a physical space into a plurality of non-overlapping regions to allocate the signal source into the plurality of regions, the allocation performed prior to beamforming, the plurality of non-overlapping regions based on preset DOA angles comprising at least two of: an angle of 30 degrees, an angle of 60 degrees, an angle of 90 degrees, an angle of 120 degrees, and an angle of 150 degrees;

enhancing signals of the signal source for each of the regions to obtain enhanced signals corresponding to the regions, the enhancing performed independently for each region;

performing speech recognition on the enhanced signals corresponding to the regions to obtain recognition results corresponding to the regions;

providing the recognition results corresponding to the regions to respective acoustic models, each acoustic model trained for its corresponding region based on enhanced signal samples from that region; and

fusing outputs of the acoustic models to obtain a recognition result, wherein fusing analyzes outputs from all preset regions regardless of estimated signal source direction.
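
For illustration only, the steps recited in claim 1 can be sketched roughly as follows. This is a minimal sketch under stated assumptions and not the patented implementation: the claim does not say how region boundaries are drawn, which enhancement technique is used, or how the fusion is computed. Here the boundaries are assumed to lie midway between neighbouring preset DOA angles, the per-region enhancement is assumed to be delay-and-sum beamforming on a linear microphone array, and the fusion is assumed to be an equal-weight average of per-region posterior vectors. All function names and parameters (region_bounds, steering_delays, delay_and_sum, fuse_posteriors, mic_x, fs, c) are hypothetical.

```python
import math


def region_bounds(preset_doas, lo=0.0, hi=180.0):
    """Split [lo, hi] into non-overlapping regions, one per preset DOA.

    Assumption: each boundary sits midway between neighbouring preset angles.
    """
    angles = sorted(preset_doas)
    mids = [(a + b) / 2.0 for a, b in zip(angles, angles[1:])]
    edges = [lo] + mids + [hi]
    return list(zip(edges[:-1], edges[1:]))


def steering_delays(doa_deg, mic_x, fs, c=343.0):
    """Integer sample delays to apply per channel so that a far-field plane
    wave from doa_deg lines up across a linear array.

    mic_x: microphone positions along the array axis, in metres.
    fs: sample rate in Hz; c: assumed speed of sound in m/s.
    """
    cos_t = math.cos(math.radians(doa_deg))
    raw = [x * cos_t / c * fs for x in mic_x]
    base = min(raw)  # shift so that every delay is non-negative
    return [int(round(r - base)) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += ch[t - d]
    return [v / len(channels) for v in out]


def fuse_posteriors(per_region):
    """Equal-weight average of posterior vectors from ALL preset regions,
    without first estimating which region holds the source."""
    n_regions = len(per_region)
    n_classes = len(per_region[0])
    return [sum(p[k] for p in per_region) / n_regions
            for k in range(n_classes)]


if __name__ == "__main__":
    # The five preset DOA angles listed in claim 1.
    for bounds in region_bounds([30, 60, 90, 120, 150]):
        print(bounds)
```

In this sketch each preset angle yields one set of steering delays and therefore one independently enhanced signal. Each enhanced signal would then go to the acoustic model trained for that region, and fuse_posteriors combines every model's output regardless of where the source is estimated to be.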