US 12,190,851 B2
Audio generation methods and systems
Adrian Barahona Rios, London (GB)
Assigned to Sony Interactive Entertainment Europe Limited, London (GB)
Filed by Sony Interactive Entertainment Europe Limited, London (GB)
Filed on Jul. 15, 2022, as Appl. No. 17/865,869.
Claims priority of application No. 2110280 (GB), filed on Jul. 16, 2021.
Prior Publication US 2023/0018661 A1, Jan. 19, 2023
Int. Cl. G10H 1/00 (2006.01); A63F 13/54 (2014.01)
CPC G10H 1/0008 (2013.01) [A63F 13/54 (2014.09); G10H 2220/135 (2013.01); G10H 2250/235 (2013.01); G10H 2250/311 (2013.01)]
13 Claims
 
1. A method of generating audio assets, comprising the steps of:
receiving an input audio asset having a first duration,
generating an input image representative of the input audio asset,
training a generative model on the input image and implementing the trained generative model to generate an output image representative of an output audio asset having a second duration different to the first duration, and
generating the output audio asset based on the output image,
wherein the input image and output image each comprise an axis representative of time duration, and the step of generating an output image comprises retargeting the input image along the axis representative of time duration, and
wherein the output image has a larger dimension along the axis representative of time duration than the input image.
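
The steps of claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. It assumes the image is a short-time Fourier transform magnitude spectrogram whose columns form the axis representative of time duration. The trained generative model is replaced here by a plainly named stand-in, random patch resampling along the time axis, and the audio is reconstructed with random phase and overlap-add. All function names and parameters (audio_to_image, retarget_time_axis, image_to_audio, n_fft, hop, patch) are hypothetical.

```python
import numpy as np

def audio_to_image(audio, n_fft=256, hop=64):
    """Input image: STFT magnitudes, rows = frequency bins, columns = time frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def retarget_time_axis(image, out_frames, patch=8, rng=None):
    """ASSUMPTION: stand-in for the trained generative model.

    Builds an image with more time columns by concatenating randomly
    chosen time patches of the input image. A real system would train
    a generative model on the input image instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    pieces, total = [], 0
    while total < out_frames:
        start = rng.integers(0, image.shape[1] - patch + 1)
        pieces.append(image[:, start : start + patch])
        total += patch
    return np.concatenate(pieces, axis=1)[:, :out_frames]

def image_to_audio(image, n_fft=256, hop=64, rng=None):
    """Output audio: crude inversion using random phase and overlap-add."""
    rng = np.random.default_rng() if rng is None else rng
    phase = rng.uniform(0.0, 2.0 * np.pi, size=image.shape)
    frames = np.fft.irfft(image * np.exp(1j * phase), n=n_fft, axis=0)
    window = np.hanning(n_fft)
    out = np.zeros(n_fft + hop * (image.shape[1] - 1))
    for t in range(image.shape[1]):
        out[t * hop : t * hop + n_fft] += frames[:, t] * window
    return out

# Usage: a one-second 440 Hz tone at an assumed 8 kHz sample rate.
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
in_img = audio_to_image(audio)                        # first duration
out_img = retarget_time_axis(in_img, 2 * in_img.shape[1], rng=rng)
out_audio = image_to_audio(out_img, rng=rng)          # second, longer duration
```

The output image has twice as many time columns as the input image, so the reconstructed audio is longer than the input. Random-phase inversion is chosen only to keep the sketch short; an iterative phase-estimation method would normally sound better.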
 
9. A non-transitory computer-readable medium having stored thereon a computer program comprising computer-implemented instructions that, when run on a computer, cause the computer to implement a method of generating audio assets, comprising the steps of:
receiving an input audio asset having a first duration,
generating an input image representative of the input audio asset,
training a generative model on the input image and implementing the trained generative model to generate an output image representative of an output audio asset having a second duration different to the first duration, and
generating the output audio asset based on the output image,
wherein the input image and output image each comprise an axis representative of time duration, and the step of generating an output image comprises retargeting the input image along the axis representative of time duration, and
wherein the output image has a larger dimension along the axis representative of time duration than the input image.
 
10. A system for generating audio assets, the system comprising:
an asset input unit configured to receive an input audio asset having a first duration, and to convert the input audio asset into an input image,
an image generation unit configured to implement a generative model to generate one or more output images based on the input image, the output image representing an output audio asset having a second duration different to the first duration, and
an asset output unit configured to generate an output audio asset based on the output image,
wherein the input image and output image each comprise an axis representative of time duration, and the step of generating an output image comprises retargeting the input image along the axis representative of time duration, and
wherein the output image has a larger dimension along the axis representative of time duration than the input image.
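
The three-unit arrangement of claim 10 could be modelled, under stated assumptions, as a container of three callables. The class name AudioGenerationSystem, its field names, and the toy units below are hypothetical and only illustrate the data flow; the toy image generation unit simply tiles the image along the time axis and is not a generative model.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class AudioGenerationSystem:
    asset_input_unit: Callable[[np.ndarray], np.ndarray]       # audio -> input image
    image_generation_unit: Callable[[np.ndarray], np.ndarray]  # input image -> output image
    asset_output_unit: Callable[[np.ndarray], np.ndarray]      # output image -> audio

    def generate(self, audio: np.ndarray) -> np.ndarray:
        in_img = self.asset_input_unit(audio)
        out_img = self.image_generation_unit(in_img)
        # Axis 1 is assumed to be the axis representative of time duration.
        if out_img.shape[1] <= in_img.shape[1]:
            raise ValueError("output image must be longer along the time axis")
        return self.asset_output_unit(out_img)

# Toy units (ASSUMPTION: stand-ins for illustration only).
def to_image(audio, frame=64):
    """Cut the audio into frames; each column is one time frame."""
    return audio.reshape(-1, frame).T

def tile_twice(image):
    """Double the image along the time axis by repetition."""
    return np.tile(image, (1, 2))

def to_audio(image):
    """Concatenate the time frames back into a one-dimensional signal."""
    return image.T.reshape(-1)

system = AudioGenerationSystem(to_image, tile_twice, to_audio)
longer = system.generate(np.arange(1024, dtype=float))
```

Keeping the units as interchangeable callables means a spectrogram converter and a trained model could be substituted without changing the surrounding flow.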