US 11,657,833 B2
Classifying audio scene using synthetic image features
Eric Chris Wolfgang Sommerlade, Oxford (GB); Yang Liu, Reading (GB); Alexandros Neofytou, London (GB); and Sunando Sengupta, Reading (GB)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Oct. 26, 2021, as Appl. No. 17/452,306.
Application 17/452,306 is a continuation of application No. 16/844,930, filed on Apr. 9, 2020, granted, now 11,164,042.
Claims priority of provisional application 62/961,049, filed on Jan. 14, 2020.
Prior Publication US 2022/0044071 A1, Feb. 10, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. H04N 5/272 (2006.01); H04N 7/14 (2006.01); G06F 18/214 (2023.01); G06F 18/241 (2023.01); G06V 10/764 (2022.01); G10L 25/51 (2013.01); G06V 10/82 (2022.01); G06V 10/44 (2022.01); G06V 20/00 (2022.01)
CPC G10L 25/51 (2013.01) [G06F 18/214 (2023.01); G06F 18/241 (2023.01); G06V 10/454 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/00 (2022.01); H04N 5/272 (2013.01); H04N 7/141 (2013.01)]
18 Claims
 
1. A computing system comprising:
a processor having associated memory storing:
a discriminator configured to determine whether a target feature is real or synthetic;
a generator having been trained on an audio-visual pair of image data and first audio data with the discriminator;
a classifier having been trained on second audio data that is not paired with an image, the first audio data and the second audio data being recordings generated at substantially different geographical locations; and
instructions that cause the processor to execute, at runtime:
the generator configured to generate synthetic image features from third audio data; and
the classifier configured to classify a scene of the third audio data based on the synthetic image features.
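
The runtime portion of claim 1 (a generator producing synthetic image features from audio, followed by a classifier that labels the scene from those features) can be illustrated with a minimal sketch. This is not the patented implementation: the class names, feature dimensions, scene labels, and the use of random linear weights in place of trained models are all assumptions made for illustration, and the training of the generator against the discriminator is omitted entirely.

```python
import random

# Hypothetical sizes and labels, chosen only for illustration.
AUDIO_DIM = 4
IMAGE_FEATURE_DIM = 3
SCENES = ["street", "park", "office"]


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of random weights (a stand-in for trained weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Generator:
    """Maps audio features to synthetic image features (assumed linear here)."""

    def __init__(self, rng):
        self.weights = make_matrix(IMAGE_FEATURE_DIM, AUDIO_DIM, rng)

    def generate(self, audio_features):
        return matvec(self.weights, audio_features)


class Classifier:
    """Scores each scene from image features and returns the best-scoring label."""

    def __init__(self, rng):
        self.weights = make_matrix(len(SCENES), IMAGE_FEATURE_DIM, rng)

    def classify(self, image_features):
        scores = matvec(self.weights, image_features)
        return SCENES[scores.index(max(scores))]


def classify_scene(generator, classifier, audio_features):
    """Runtime path: audio -> synthetic image features -> scene label."""
    synthetic_image_features = generator.generate(audio_features)
    return classifier.classify(synthetic_image_features)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same weights
    generator = Generator(rng)
    classifier = Classifier(rng)
    third_audio = [0.1, 0.2, 0.3, 0.4]  # hypothetical audio feature vector
    print(classify_scene(generator, classifier, third_audio))
```

In this sketch the classifier never sees an image at runtime; it operates only on the features the generator synthesizes from the audio input, which mirrors the data flow recited in the claim.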