US 12,126,977 B1
Systems and methods for dynamically modifying audio content using variable field of view
Eric Steven Penrod, Brentwood, CA (US); Erich Tisch, San Francisco, CA (US); and Timothy Dick, San Francisco, CA (US)
Assigned to GoPro, Inc., San Mateo, CA (US)
Filed by GoPro, Inc., San Mateo, CA (US)
Filed on Aug. 24, 2022, as Appl. No. 17/894,805.
Claims priority of provisional application 63/239,068, filed on Aug. 31, 2021.
Int. Cl. H04R 5/04 (2006.01); G06V 20/50 (2022.01); H04R 5/027 (2006.01)

CPC H04R 5/04 (2013.01) [G06V 20/50 (2022.01); H04R 5/027 (2013.01)]

20 Claims

1. A system for dynamically modifying audio content using variable field of view, the system comprising:

one or more physical processors configured by machine-readable instructions to:

obtain visual information, the visual information defining visual content captured by an image sensor of an image capture device during a capture duration, the visual content having a progress length, the visual content having a field of view based on capture through an optical element of the image capture device, wherein information on the capture of the visual content through the optical element of the image capture device is stored as metadata of the visual content, further wherein edit information for the visual content is stored as additional metadata of the visual content, the edit information including information on changes in size and/or location of a punchout of the visual content during the progress length of the visual content, the punchout of the visual content including an extent of the visual content for viewing or extraction, the punchout of the visual content being smaller than the field of view of the visual content;

obtain audio information, the audio information defining multiple audio content captured by multiple sound sensors of the image capture device during the capture duration, the multiple audio content including first audio content captured by a first sound sensor of the image capture device, second audio content captured by a second sound sensor of the image capture device, and third audio content captured by a third sound sensor of the image capture device; and

generate modified audio content from the multiple audio content based on the information on the changes in the size and/or the location of the punchout of the visual content stored as the additional metadata of the visual content to match the changes in the size and/or the location of the punchout of the visual content, wherein the modified audio content provides sound for playback of the punchout of the visual content.
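
The generating step above can be illustrated with a minimal sketch. The claim does not specify how the multiple audio content is combined, so everything below is an assumption made for illustration and is not the patented method: the punchout edit information is modeled as keyframes with strictly increasing times, the three sound sensors are assumed to sit at the left, center, and right of the field of view, and a Gaussian weighting mixes them so that the sensor nearest the punchout dominates and a smaller punchout narrows the focus. All names (`PunchoutKeyframe`, `punchout_at`, `mic_gains`, `modify_audio`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PunchoutKeyframe:
    """Hypothetical record of punchout edit metadata at one moment."""
    t: float         # position within the progress length, in seconds
    center_x: float  # horizontal center, 0.0 (left edge) to 1.0 (right edge)
    width: float     # width as a fraction of the full field of view


def punchout_at(keyframes, t):
    """Linearly interpolate the punchout at time t, clamping at both ends.

    Assumes keyframes are sorted with strictly increasing times.
    """
    if t <= keyframes[0].t:
        return keyframes[0]
    if t >= keyframes[-1].t:
        return keyframes[-1]
    for a, b in zip(keyframes, keyframes[1:]):
        if a.t <= t <= b.t:
            u = (t - a.t) / (b.t - a.t)
            return PunchoutKeyframe(
                t,
                a.center_x + u * (b.center_x - a.center_x),
                a.width + u * (b.width - a.width),
            )


def mic_gains(punchout, mic_positions=(0.0, 0.5, 1.0)):
    """Return normalized mixing weights, one per sound sensor.

    Assumed model: a Gaussian centered on the punchout whose spread
    follows the punchout width, so a smaller punchout focuses the mix.
    """
    sigma = max(punchout.width, 0.1) / 2.0
    raw = [
        math.exp(-((p - punchout.center_x) ** 2) / (2.0 * sigma ** 2))
        for p in mic_positions
    ]
    total = sum(raw)
    if total == 0.0:
        # Fall back to an even mix if every weight underflows to zero.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def modify_audio(channels, sample_rate, keyframes):
    """Mix the per-sensor sample lists into one track that follows the punchout."""
    n = min(len(c) for c in channels)
    out = []
    for i in range(n):
        gains = mic_gains(punchout_at(keyframes, i / sample_rate))
        out.append(sum(g * c[i] for g, c in zip(gains, channels)))
    return out
```

For example, with keyframes that pan a punchout of width 0.2 from the left edge to the right edge, the first sound sensor dominates the mix at the start and contributes almost nothing by the end. A real implementation would likely use beamforming or spatial filtering based on the actual sensor geometry rather than this simple per-sample weighting.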