US 12,293,288 B2
Systems and methods of data preprocessing and augmentation for neural network climate forecasting models
Carlos Felipe Gaitan Ospina, San Diego, CA (US); and Maximilian Cody Evans, Oakland, CA (US)
Assigned to ClimateAI, Inc., San Francisco, CA (US)
Filed by ClimateAI, Inc., San Francisco, CA (US)
Filed on Dec. 21, 2022, as Appl. No. 18/085,803.
Application 18/085,803 is a continuation of application No. 16/878,552, filed on May 19, 2020, granted, now 11,537,889.
Claims priority of provisional application 62/850,454, filed on May 20, 2019.
Claims priority of provisional application 62/850,460, filed on May 20, 2019.
Prior Publication US 2023/0128989 A1, Apr. 27, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06N 3/044 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/044 (2023.01)] 18 Claims
 
1. A method for training a neural network (NN)-based climate forecasting model on a pre-processed multi-model ensemble of global climate simulation data from a plurality of global climate simulation models (GCMs), comprising:
determining a common spatial scale and a common temporal scale for the multi-model ensemble of global climate simulation data, wherein the multi-model ensemble of global climate simulation data comprises simulation data generated from at least two GCMs;
spatially re-gridding the multi-model ensemble of global climate simulation data to the common spatial scale;
temporally homogenizing the multi-model ensemble of global climate simulation data to the common temporal scale;
separating the spatially re-gridded, temporally homogenized multi-model ensemble of global climate simulation data into long-term signals or trends, global warming signals, seasonal signals, and other signal components caused by similar contributing climate factors;
generating synthetic simulation data through a climatology augmentation process by altering the long-term signal or trends within a feasible range;
augmenting the spatially re-gridded, temporally homogenized multi-model ensemble of global climate simulation data with the synthetic simulation data, to generate a spatially re-gridded, temporally homogenized, and augmented multi-model ensemble of global climate simulation data;
training, during a first training phase, the NN-based climate forecasting model using a first member time series of the spatially re-gridded, temporally homogenized, and augmented multi-model ensemble of global climate simulation data, wherein each input and corresponding desired output at a target lead time used in the first training phase are selected from the first member time series of the spatially re-gridded, temporally homogenized, and augmented multi-model ensemble of global climate simulation data; and
training, during a second training phase, the NN-based climate forecasting model using a second member time series of the spatially re-gridded, temporally homogenized, and augmented multi-model ensemble of global climate simulation data, wherein each input and corresponding desired output at the target lead time used in the second training phase are selected from the second member time series of the spatially re-gridded, temporally homogenized, and augmented multi-model ensemble of global climate simulation data;
wherein at least one training phase uses a member time series containing the synthetic simulation data.
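
For illustration only, the steps recited in claim 1 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the two "GCMs" are synthetic toy fields produced by a hypothetical `fake_gcm` helper, spatial re-gridding and temporal homogenization are done by simple block averaging, the signal separation collapses the long-term and global warming signals into a single per-cell linear trend, the "feasible range" is assumed to be a trend scale factor near 1, and a single linear layer trained by gradient descent stands in for the neural network. All function names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_gcm(n_t, ny, nx, steps_per_year, warming):
    """Hypothetical stand-in for one GCM run: (time, lat, lon) temperatures."""
    t = np.arange(n_t)[:, None, None] / steps_per_year  # time in years
    lat = np.linspace(-1, 1, ny)[None, :, None]
    noise = 0.3 * rng.standard_normal((n_t, ny, nx))
    return 15 - 10 * lat**2 + warming * t + 3 * np.sin(2 * np.pi * t) * lat + noise

def coarsen_space(x, f):
    """Spatial re-gridding by averaging f x f blocks of grid cells."""
    t, ny, nx = x.shape
    return x.reshape(t, ny // f, f, nx // f, f).mean(axis=(2, 4))

def coarsen_time(x, f):
    """Temporal homogenization by averaging every f consecutive steps."""
    t = x.shape[0] // f
    return x[: t * f].reshape(t, f, *x.shape[1:]).mean(axis=1)

def decompose(x, period=12):
    """Split x into a linear trend, a mean seasonal cycle, and a residual."""
    t = np.arange(x.shape[0])
    flat = x.reshape(x.shape[0], -1)
    slope, intercept = np.polyfit(t, flat, 1)  # one linear fit per grid cell
    trend = (np.outer(t, slope) + intercept).reshape(x.shape)
    detrended = x - trend
    clim = np.stack([detrended[m::period].mean(axis=0) for m in range(period)])
    seasonal = clim[t % period]
    return trend, seasonal, detrended - seasonal

def augment_trend(x, scale):
    """Climatology augmentation: rescale the trend's change from its start."""
    trend, seasonal, resid = decompose(x)
    base = trend[0]
    return base + scale * (trend - base) + seasonal + resid

def make_pairs(member, lead):
    """Inputs at time t and targets at t + lead, both from the same member."""
    flat = member.reshape(member.shape[0], -1)
    return flat[:-lead], flat[lead:]

def train_phase(W, member, lead, lr=1e-2, epochs=200):
    """One training phase: gradient descent on mean squared error."""
    X, Y = make_pairs(member, lead)
    for _ in range(epochs):
        W = W - lr * 2 * X.T @ (X @ W - Y) / len(X)
    return W, float(np.mean((X @ W - Y) ** 2))

# Two toy GCMs on different grids and time steps (assumed values).
gcm_a = fake_gcm(480, 8, 16, steps_per_year=24, warming=0.03)
gcm_b = fake_gcm(240, 4, 8, steps_per_year=12, warming=0.02)

# Bring both to the common scale: a 4 x 8 grid with monthly steps.
member_a = coarsen_time(coarsen_space(gcm_a, 2), 2)
member_b = gcm_b

# Synthetic member: trend scaled by 1.2, assumed to be within a feasible range.
synthetic = augment_trend(member_b, scale=1.2)

# Normalize the augmented ensemble with shared statistics.
ensemble = [member_a, member_b, synthetic]
mu, sd = np.mean(ensemble), np.std(ensemble)
ensemble = [(m - mu) / sd for m in ensemble]

lead = 3                      # target lead time in months (assumed)
W = np.zeros((32, 32))        # linear stand-in for the NN weights
_, Y0 = make_pairs(ensemble[0], lead)
loss_before = float(np.mean(Y0 ** 2))

W, loss_phase1 = train_phase(W, ensemble[0], lead)  # first member time series
W, loss_phase2 = train_phase(W, ensemble[2], lead)  # synthetic member
print(loss_before, loss_phase1, loss_phase2)
```

Each phase draws every input and its target from a single member time series, so pairs never straddle two different simulations, and the second phase here uses the member that contains the synthetic data. A real pipeline would likely use conservative or bilinear re-gridding and a proper neural network; the block averaging and linear layer above were chosen only to keep the sketch self-contained.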