US 12,256,169 B2
Apparatus and method for video-audio processing, and program for separating an object sound corresponding to a selected video object
Hiroyuki Honma, Chiba (JP); and Yuki Yamamoto, Tokyo (JP)
Assigned to Sony Group Corporation, Tokyo (JP)
Filed by Sony Group Corporation, Tokyo (JP)
Filed on Jan. 9, 2024, as Appl. No. 18/407,825.
Application 18/407,825 is a continuation of application No. 17/527,578, filed on Nov. 16, 2021, granted, now 11,902,704.
Application 17/527,578 is a continuation of application No. 16/303,331, granted, now 11,184,579, previously published as PCT/JP2017/018499, filed on May 17, 2017.
Claims priority of application No. 2016-107042 (JP), filed on May 30, 2016.
Prior Publication US 2024/0146867 A1, May 2, 2024
Int. Cl. H04N 5/92 (2006.01); G06V 20/40 (2022.01); G06V 40/16 (2022.01); G10L 19/00 (2013.01); G10L 19/008 (2013.01); G10L 21/0272 (2013.01); G11B 27/30 (2006.01); H04N 9/802 (2006.01); H04N 19/46 (2014.01); H04R 1/40 (2006.01); H04R 3/00 (2006.01)
CPC H04N 5/9202 (2013.01) [G06V 20/46 (2022.01); G06V 40/16 (2022.01); G06V 40/161 (2022.01); G10L 19/00 (2013.01); G10L 19/008 (2013.01); G10L 21/0272 (2013.01); G11B 27/3081 (2013.01); H04N 9/802 (2013.01); H04N 19/46 (2014.11); H04R 1/40 (2013.01); H04R 3/00 (2013.01); G06F 2218/22 (2023.01)] 15 Claims
OG exemplary drawing
 
1. A video-audio processing apparatus, comprising:
processing circuitry and a memory containing instructions that, when executed by the processing circuitry, cause the processing circuitry to:
cause one or more video objects, based on a video signal, to be displayed in an image;
select a video object from the one or more video objects;
extract an audio object signal of the selected video object from an audio signal; and
produce metadata of the selected video object, the metadata including spread information indicating a spatial spread of an area of the selected video object, wherein the audio object signal of the selected video object is reproduced based on the spatial spread of the selected video object,
wherein the spread information is produced based on a frame image surrounding the selected video object, and
wherein the spread information is produced based on an angle between a first vector from an origin to a center of the frame image and a second vector from the origin to a side of the frame image.
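The final wherein clause describes the spread angle as the angle between a vector from an origin to the center of the frame image (the rectangle surrounding the selected video object) and a vector from the same origin to a side of that rectangle. A minimal sketch of that geometric computation, assuming the frame center and a side point are given as 3-D coordinates relative to a viewer/listener origin (the function name `spread_angle` and the point-based interface are hypothetical, not taken from the patent):

```python
import math

def spread_angle(center, side_point, origin=(0.0, 0.0, 0.0)):
    """Return the angle (radians) between the origin->center and
    origin->side vectors of a video object's bounding rectangle.

    Hypothetical illustration of the claim's spread computation:
    `center` is the center of the frame image surrounding the selected
    video object, `side_point` is a point on one of its sides, and
    both are 3-D points relative to `origin`.
    """
    v1 = [c - o for c, o in zip(center, origin)]
    v2 = [s - o for s, o in zip(side_point, origin)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))
```

For example, with the frame image on a plane at unit depth, a center at (0, 0, 1) and a side midpoint at (1, 0, 1) yield a 45-degree spread half-angle; a wider rectangle yields a larger angle, so a larger on-screen object is rendered with a correspondingly larger spatial spread.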