US 12,322,403 B2
Speech coding method and apparatus, computer device, and storage medium
Junbin Liang, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on May 9, 2022, as Appl. No. 17/740,309.
Application 17/740,309 is a continuation of application No. PCT/CN2021/095714, filed on May 25, 2021.
Claims priority of application No. 202010585545.9 (CN), filed on Jun. 24, 2020.
Prior Publication US 2022/0270622 A1, Aug. 25, 2022
Int. Cl. G10L 19/02 (2013.01); G10L 25/78 (2013.01); G10L 25/90 (2013.01); G10L 25/93 (2013.01)

CPC G10L 19/02 (2013.01) [G10L 25/78 (2013.01); G10L 25/90 (2013.01); G10L 25/93 (2013.01)]

20 Claims

1. A speech coding method, executed by an electronic device, the method comprising:

obtaining a first to-be-encoded speech frame and a subsequent speech frame from an audio signal;

extracting a first speech frame feature corresponding to the first to-be-encoded speech frame, and calculating a first speech frame criticality level corresponding to the first to-be-encoded speech frame based on the first speech frame feature, wherein the first speech frame criticality level represents a level of contribution made by sound quality of the first speech frame to overall speech quality within a period that includes one or more speech frames before the first speech frame and one or more speech frames after the first speech frame;

extracting a second speech frame feature corresponding to the subsequent speech frame, and calculating a second speech frame criticality level corresponding to the subsequent speech frame based on the second speech frame feature, wherein the second speech frame criticality level represents a level of contribution made by sound quality of the second speech frame to the overall speech quality within a period that includes one or more speech frames before the second speech frame and one or more speech frames after the second speech frame;

obtaining a criticality trend feature based on the first speech frame criticality level and the second speech frame criticality level, and determining, using the criticality trend feature, an encoding bit rate corresponding to the first to-be-encoded speech frame, the encoding bit rate corresponding to each to-be-encoded speech frame being controlled adaptively based on criticality trend strength represented by the criticality trend feature; and

encoding the first to-be-encoded speech frame based on the encoding bit rate to obtain an encoding result of the audio signal.
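
The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify which frame features are extracted, how the criticality level is computed, or how the trend maps to a bit rate, so everything concrete below is an assumption made for illustration: the 320-sample frame length, the use of mean energy as the only feature, the `ENERGY_REF` constant, the trend defined as the difference between the two criticality levels, the choice that a rising trend raises the bit rate, and all function names and bit-rate values. It is not the patented implementation.

```python
from typing import List, Sequence

FRAME_LEN = 320      # assumption: 20 ms frames at 16 kHz
ENERGY_REF = 0.01    # assumption: reference energy used to normalise criticality
BASE_RATE = 16000    # assumption: nominal bit rate in bits per second
RATE_STEP = 8000     # assumption: maximum adjustment driven by the trend
MIN_RATE, MAX_RATE = 8000, 32000  # assumption: allowed bit-rate range


def split_frames(signal: Sequence[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    """Cut the audio signal into consecutive, non-overlapping frames."""
    last_start = len(signal) - frame_len
    return [list(signal[i:i + frame_len]) for i in range(0, last_start + 1, frame_len)]


def frame_feature(frame: Sequence[float]) -> float:
    """Hypothetical speech frame feature: mean energy of the frame."""
    return sum(x * x for x in frame) / len(frame)


def criticality_level(feature: float) -> float:
    """Hypothetical criticality level in [0, 1), increasing with frame energy."""
    return feature / (feature + ENERGY_REF)


def criticality_trend(first_level: float, second_level: float) -> float:
    """Hypothetical trend feature: change from the current to the subsequent frame."""
    return second_level - first_level


def select_bit_rate(trend: float) -> int:
    """Map the trend to a bit rate; a rising trend raises the rate (assumption)."""
    rate = BASE_RATE + round(RATE_STEP * trend)
    return max(MIN_RATE, min(MAX_RATE, rate))


def plan_bit_rates(signal: Sequence[float]) -> List[int]:
    """Return one encoding bit rate per frame of the signal."""
    frames = split_frames(signal)
    levels = [criticality_level(frame_feature(f)) for f in frames]
    rates = []
    for i, level in enumerate(levels):
        # The last frame has no subsequent frame; treat its trend as flat (assumption).
        next_level = levels[i + 1] if i + 1 < len(levels) else level
        rates.append(select_bit_rate(criticality_trend(level, next_level)))
    return rates
```

In this sketch, a quiet frame followed by a loud frame receives a bit rate above `BASE_RATE`, and a loud frame followed by a quiet one receives a bit rate below it. The actual encoding step is omitted; each returned rate would be passed to whatever speech encoder is in use.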