| CPC G06N 3/092 (2023.01) [G06V 10/774 (2022.01)] | 12 Claims |

|
7. A reinforcement learning apparatus using a task decomposition inference model in a time-variant environment, comprising:
a transition model unit which selects, based on the cycle GAN, a plurality of paired transitions having a time-invariant common characteristic and a time-variant different environmental characteristic from the dataset including a plurality of transition data;
an embedding unit which trains an auto encoder to embed each of the time-variant part and the time-invariant part with respect to the plurality of paired transitions into a latent space; and
a reinforcement learning unit which performs reinforcement learning on a transition corresponding to data collected in the time-variant environment, using the trained auto encoder.
|
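
The data flow among the three claimed units can be illustrated with a toy sketch. This is not the claimed implementation: the cycle GAN and the auto encoder are replaced by hand-written stand-ins, and every name here (`Transition`, `pair_transitions`, `encode`, `q_update`) is hypothetical. It assumes each toy state already exposes a task-relevant part and an environment part, which a real system would have to learn.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transition:
    state: tuple       # toy observation: (common_part, environment_part)
    action: int
    reward: float
    next_state: tuple


def pair_transitions(dataset):
    """Stand-in for the cycle-GAN-based selection (assumption): pair
    transitions that share the common part and action but differ in
    the environment part."""
    pairs = []
    for a, b in combinations(dataset, 2):
        same_common = a.state[0] == b.state[0] and a.action == b.action
        different_env = a.state[1] != b.state[1]
        if same_common and different_env:
            pairs.append((a, b))
    return pairs


def encode(state):
    """Stand-in for the trained encoder (assumption): returns
    (time_invariant_latent, time_variant_latent)."""
    common, env = state
    return common, env


def q_update(q, t, alpha=0.5, gamma=0.9, n_actions=2):
    """One tabular Q-learning step keyed on the time-invariant latent only."""
    z, _ = encode(t.state)
    z_next, _ = encode(t.next_state)
    best_next = max(q[(z_next, a)] for a in range(n_actions))
    q[(z, t.action)] += alpha * (t.reward + gamma * best_next - q[(z, t.action)])
    return q


# Toy dataset: the same task step observed under two environment conditions.
dataset = [
    Transition((0, "day"), 1, 1.0, (1, "day")),
    Transition((0, "night"), 1, 1.0, (1, "night")),
    Transition((1, "day"), 0, 0.0, (0, "day")),
]

pairs = pair_transitions(dataset)   # transition model unit (stand-in)
q = defaultdict(float)
for t in dataset:                   # reinforcement learning unit
    q_update(q, t)
```

Because the Q-table is keyed on the time-invariant latent, the "day" and "night" transitions update the same entry, which is the intuition behind separating the two parts before reinforcement learning.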