| CPC G06N 5/043 (2013.01) [G06N 20/00 (2019.01)] | 16 Claims |

|
1. A method for training an artificial intelligent agent to recognize a goal configuration, comprising:
placing the agent in the goal configuration and identifying a resulting state as a positive example;
providing negative examples to the agent that demonstrate the agent in a state failing to achieve the goal configuration;
extracting key state features when the agent is in the goal configuration, the key state features including at least one of a room feature, object positioning, ambient lighting, and ambient sounds;
determining what feature categories are important in the goal configuration during receipt of positive examples to the agent;
learning and recognizing, by the agent, the goal configuration based on the extracted key state features and the determined important feature categories;
creating policies, by the agent, based on the learned goal configuration;
converting state features into a distance function to determine how far the agent is from the goal configuration;
using goal detection as a final reward; and
using a goal distance as an intermediate reward.
|
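The reward structure recited in claim 1 (a distance function over state features, goal detection as the final reward, and goal distance as the intermediate reward) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claim: the feature names, the inverse-variance weighting used to decide which feature categories are important, the weighted Euclidean distance, the midpoint detection threshold, and the hypothetical helper names `learn_goal`, `goal_distance`, and `reward`.

```python
from math import sqrt
from statistics import mean, pvariance


def learn_goal(positives, negatives):
    """Build a goal model (prototype, weights, threshold) from example states.

    Each example is a dict mapping a feature name to a numeric value.
    Assumption: a feature is "important" when it varies little across
    the positive examples, so it gets a larger weight.
    """
    keys = sorted(positives[0])
    prototype = {k: mean(p[k] for p in positives) for k in keys}
    weights = {k: 1.0 / (1.0 + pvariance([p[k] for p in positives])) for k in keys}
    # Assumption: the detection threshold sits halfway between the farthest
    # positive example and the nearest negative example.
    far_pos = max(goal_distance(p, prototype, weights) for p in positives)
    near_neg = min(goal_distance(n, prototype, weights) for n in negatives)
    return prototype, weights, (far_pos + near_neg) / 2.0


def goal_distance(state, prototype, weights):
    """Weighted Euclidean distance from a state to the goal prototype."""
    return sqrt(sum(weights[k] * (state[k] - prototype[k]) ** 2 for k in prototype))


def reward(state, model, final_reward=1.0):
    """Final reward when the goal is detected, negative distance otherwise."""
    prototype, weights, threshold = model
    d = goal_distance(state, prototype, weights)
    if d <= threshold:
        return final_reward  # goal detection as the final reward
    return -d  # goal distance as the intermediate reward
```

With this shaping, a state closer to the goal always earns a higher intermediate reward than a state farther away, and any state inside the detection threshold earns the fixed final reward. Other distance functions or learned classifiers could fill the same roles.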