US 12,242,567 B2
Identifying salient features for generative networks
Willem Bastiaan Kleijn, Eastborne Wellington (NZ); Sze Chie Lim, San Francisco, CA (US); Michael Chinen, El Cerrito, CA (US); and Jan Skoglund, San Francisco, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Appl. No. 17/250,506
Filed by Google LLC, Mountain View, CA (US)
PCT Filed May 16, 2019, PCT No. PCT/US2019/032665
§ 371(c)(1), (2) Date Jan. 29, 2021,
PCT Pub. No. WO2020/231437, PCT Pub. Date Nov. 19, 2020.
Prior Publication US 2021/0287038 A1, Sep. 16, 2021
Int. Cl. G06N 3/045 (2023.01); G06F 18/2113 (2023.01); G06F 18/213 (2023.01); G06N 3/08 (2023.01); G06N 3/088 (2023.01)
CPC G06F 18/213 (2023.01) [G06F 18/2113 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 3/088 (2013.01)]
21 Claims
1. A method for identifying features for a generative network, the method comprising:
obtaining a set of inputs for each clean input in a batch of inputs, the set of inputs including at least one modified input, each modified input being a different modified version of the clean input;
training an encoder having weights to provide features for an input by, for each set of inputs in the batch of inputs:
providing the set of inputs to one or more cloned encoders, each of the one or more cloned encoders sharing the weights and receiving a different respective input of the set of inputs, the encoder being one of the one or more cloned encoders, and
modifying the weights to minimize a global loss function, the global loss function having a first term that maximizes similarity between features for the set of inputs, a second term that maximizes independence and unit-variance within the features generated by the encoder, and a third term that minimizes a reconstruction loss with a target input, derived from the clean input, by mapping, via a decoder, the features generated by the encoder to an output representing a reconstruction of the input;
using the encoder to extract features for a new input as extracted features;
providing the extracted features to the generative network; and
generating audio data or image data using the generative network.
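
The three-term global loss recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything below is an assumption made for illustration: the linear `encode` and `decode` functions, the convention that each set of inputs is ordered as `[clean, modified, ...]`, the use of mean squared error for the similarity and reconstruction terms, the use of the clean input itself as the reconstruction target, and a covariance-to-identity penalty as a simple stand-in for "independence and unit-variance" (it penalizes correlation only, which is weaker than independence). The sketch only evaluates the loss; modifying the weights to minimize it would require an optimizer, which is omitted.

```python
import random

def encode(weights, x):
    """Hypothetical linear encoder: one feature per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def decode(weights, z):
    """Hypothetical linear decoder mapping features back to input space."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def similarity_term(feature_sets):
    """First term: features of modified inputs should match the clean features.

    Assumes feats[0] comes from the clean input and feats[1:] from modified ones.
    """
    total, count = 0.0, 0
    for feats in feature_sets:
        for f in feats[1:]:
            total += mse(feats[0], f)
            count += 1
    return total / max(count, 1)

def decorrelation_term(features):
    """Second term (proxy): push the batch feature covariance toward identity."""
    n, d = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(d)]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            cov = sum((f[i] - means[i]) * (f[j] - means[j]) for f in features) / n
            target = 1.0 if i == j else 0.0
            penalty += (cov - target) ** 2
    return penalty / (d * d)

def reconstruction_term(dec_w, batch, feature_sets):
    """Third term: decoded features should reconstruct the clean target."""
    total, count = 0.0, 0
    for inputs, feats in zip(batch, feature_sets):
        target = inputs[0]  # assumption: the target is the clean input itself
        for f in feats:
            total += mse(decode(dec_w, f), target)
            count += 1
    return total / count

def global_loss(enc_w, dec_w, batch, term_weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms over a batch of input sets."""
    # Every input passes through the same weights, mirroring the cloned encoders.
    feature_sets = [[encode(enc_w, x) for x in inputs] for inputs in batch]
    sim = similarity_term(feature_sets)
    dec = decorrelation_term([feats[0] for feats in feature_sets])
    rec = reconstruction_term(dec_w, batch, feature_sets)
    a, b, c = term_weights
    return a * sim + b * dec + c * rec

if __name__ == "__main__":
    random.seed(0)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    clean_inputs = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    # Each set: the clean input followed by one noisy (modified) version of it.
    batch = [[x, [xi + random.gauss(0.0, 0.1) for xi in x]] for x in clean_inputs]
    print("global loss:", global_loss(identity, identity, batch))
```

Because all inputs in a set are encoded with the same weight matrix, the sketch mirrors the weight sharing between cloned encoders; the relative weighting of the three terms (`term_weights`) is a hypothetical parameter that the claim does not specify.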