US 11,887,616 B2
Audio processing
Sujeet Shyamsundar Mate, Tampere (FI); Jussi Artturi Leppänen, Tampere (FI); Miikka Tapani Vilermo, Siuro (FI); and Arto Lehtiniemi, Lempäälä (FI)
Assigned to Nokia Technologies Oy, Espoo (FI)
Appl. No. 17/418,652
Filed by Nokia Technologies Oy, Espoo (FI)
PCT Filed Jan. 7, 2020, PCT No. PCT/EP2020/050182 § 371(c)(1), (2) Date Jun. 25, 2021, PCT Pub. No. WO2020/148109, PCT Pub. Date Jul. 23, 2020.
Claims priority of application No. 19151807 (EP), filed on Jan. 15, 2019.
Prior Publication US 2022/0068290 A1, Mar. 3, 2022
Int. Cl. G10L 21/0208 (2013.01); H04N 21/485 (2011.01); H04N 21/81 (2011.01); G10L 19/012 (2013.01)

CPC G10L 21/0208 (2013.01) [H04N 21/4852 (2013.01); H04N 21/8106 (2013.01)]

17 Claims

1. An apparatus comprising:

at least one processor; and

at least one non-transitory memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:

receive multimedia data representing a scene, the multimedia data comprising at least audio data representing an audio component of the scene;

determine at least one location of unwanted sound in the scene, wherein the at least one determined location comprises one or more spatial locations in the scene where the unwanted sound is present;

perform first audio processing to remove at least part of the unwanted sound from the at least one determined location;

perform second audio processing to add artificial sound associated to the unwanted sound at or proximate the at least one determined location;

identify one or more regions of interest based on object classification; and

determine whether the at least one determined location of unwanted sound corresponds with the one or more regions of interest, wherein a correspondence between the at least one determined location of unwanted sound and the one or more regions of interest affects:

an amount of the unwanted sound removed via the first audio processing, and

an amount of the artificial sound added via the second audio processing.
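
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything it names is an assumption made for illustration: the scene is modelled as a dictionary of named locations holding sample lists, the unwanted sound at each determined location is assumed to have been estimated already, the regions of interest are assumed to come from some object classifier, and the numeric amounts in INSIDE_ROI and OUTSIDE_ROI are arbitrary. The claim states only that the correspondence affects both amounts, not in which direction or by how much.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amounts:
    removal: float     # fraction of the unwanted sound to remove, 0.0 to 1.0
    artificial: float  # gain applied to the artificial sound that is added


# Assumed values: the claim says only that the correspondence affects both amounts.
INSIDE_ROI = Amounts(removal=1.0, artificial=0.2)
OUTSIDE_ROI = Amounts(removal=0.5, artificial=0.6)


def amounts_for(location, regions_of_interest):
    """Pick both processing amounts from the ROI correspondence."""
    return INSIDE_ROI if location in regions_of_interest else OUTSIDE_ROI


def process_scene(scene, unwanted, artificial, regions_of_interest):
    """Return a new scene with unwanted sound reduced and artificial sound added.

    scene:      dict of location -> list of samples (the captured audio)
    unwanted:   dict of location -> list of samples (estimated unwanted sound);
                its keys play the role of the determined locations
    artificial: list of samples of the artificial sound
    All sample lists are assumed to have the same length.
    """
    # Locations without unwanted sound are copied through unchanged.
    processed = {loc: list(samples) for loc, samples in scene.items()}
    for loc, noise in unwanted.items():
        amounts = amounts_for(loc, regions_of_interest)
        processed[loc] = [
            # First processing subtracts; second processing adds.
            s - amounts.removal * n + amounts.artificial * a
            for s, n, a in zip(scene[loc], noise, artificial)
        ]
    return processed


# Hypothetical usage: "stage" is a region of interest, "crowd" is not.
example = process_scene(
    scene={"stage": [1.0, 2.0], "crowd": [3.0, 4.0]},
    unwanted={"stage": [0.5, 0.5], "crowd": [1.0, 1.0]},
    artificial=[0.1, 0.1],
    regions_of_interest={"stage"},
)
```

Keeping both amounts in a single Amounts value mirrors the claim language, in which one correspondence determination drives the removal and the addition together.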