US 11,854,571 B2
Method, device and electronic apparatus for transmitting and receiving speech signal
Liang Wen, Beijing (CN); Lizhong Wang, Beijing (CN); Anxi Yi, Beijing (CN); Chao Min, Beijing (CN); and Liangxi Yao, Beijing (CN)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Nov. 27, 2020, as Appl. No. 17/105,729.
Claims priority of application No. 201911196782.X (CN), filed on Nov. 29, 2019.
Prior Publication US 2021/0166717 A1, Jun. 3, 2021
Int. Cl. G10L 25/18 (2013.01); G10L 25/30 (2013.01); G10L 25/90 (2013.01)

CPC G10L 25/18 (2013.01) [G10L 25/30 (2013.01); G10L 25/90 (2013.01)]

16 Claims

1. A method of transmitting speech signal, the method comprising:

extracting low frequency feature information from an input speech signal by using a first feature extracting network,
wherein the first feature extracting network comprises at least one first feature extracting module and at least one second feature extracting module arranged in a cascading manner,
wherein each of the at least one first feature extracting module and the at least one second feature extracting module performs a multiscale convolution operation based on at least two convolution processing parameters,
wherein the low frequency feature information extracted by the first feature extracting network comprises relevant information between high frequency features and low frequency features, and
wherein each of the at least one first feature extracting module and the at least one second feature extracting module performs the multiscale convolution operation using one or more neural network models acquired by machine learning using high frequency features and low frequency features of speech, the one or more neural network being learned at high frequency sampling rates through convolution operations with different convolution kernel sizes covering different frequency resolutions;

fusing one or more low frequency feature information output by the first feature extracting network by using a first feature fusing network, to obtain a speech signal corresponding to the low frequency feature information, wherein the multiscale convolution operations are performed by the first feature extracting network to extract each of the one or more low frequency feature information using distinct parameters of the at least two convolution processing parameters; and

transmitting the speech signal corresponding to the low frequency feature information to a receiving end.
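
The three steps of claim 1 (multiscale extraction by cascaded modules, fusion, transmission) can be illustrated with a minimal Python sketch. It is not the patented implementation. All names (`conv1d_same`, `multiscale_module`, `extract_features`, `fuse`, `transmit`, `send`) are hypothetical. Fixed averaging kernels stand in for the learned neural network weights, kernel size is assumed to be the "convolution processing parameter", and element-wise averaging is assumed as the fusion rule.

```python
from typing import Callable, List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1-D convolution that keeps the input length (odd kernel sizes)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def multiscale_module(signal: Sequence[float], kernel_sizes: Sequence[int]) -> List[List[float]]:
    """One feature map per kernel size. Averaging kernels are a stand-in
    (assumption) for weights that a trained network would provide."""
    return [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]


def extract_features(signal: Sequence[float], kernel_sizes: Sequence[int] = (3, 5)) -> List[List[float]]:
    """Cascade of two modules: the second module filters each output of the
    first with the same kernel size, so every feature map is produced with a
    distinct parameter."""
    first = multiscale_module(signal, kernel_sizes)
    return [conv1d_same(fmap, [1.0 / k] * k) for fmap, k in zip(first, kernel_sizes)]


def fuse(feature_maps: Sequence[Sequence[float]]) -> List[float]:
    """Fuse the per-scale feature maps by element-wise averaging (assumed rule)."""
    n = len(feature_maps)
    return [sum(vals) / n for vals in zip(*feature_maps)]


def transmit(signal: Sequence[float],
             send: Callable[[List[float]], None],
             kernel_sizes: Sequence[int] = (3, 5)) -> List[float]:
    """Extract, fuse, and hand the fused signal to a caller-supplied `send`
    callback that stands in for the link to the receiving end."""
    fused = fuse(extract_features(signal, kernel_sizes))
    send(fused)
    return fused
```

For example, `transmit(samples, channel.append)` with a plain list `channel` appends one fused signal of the same length as `samples`. Using two kernel sizes mirrors the claim's requirement of at least two convolution processing parameters; a real system would replace the averaging kernels with trained weights.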