CPC G08B 21/0453 (2013.01) [G06F 3/011 (2013.01); G06F 3/017 (2013.01); G06N 5/045 (2013.01); G06N 20/00 (2019.01); G08B 21/0469 (2013.01); G08B 21/0476 (2013.01)] | 19 Claims |
1. A system for environmental state estimation and decision-making, comprising:
a plurality of agent devices, each with microprocessors with agent programs, said plurality of agent devices comprising a first agent device;
said first agent device comprising a plurality of internal sensors and a first agent microprocessor with a first agent program, wherein the first agent device is configured to send and receive electronic communications to and from a plurality of external sensors and said plurality of agent devices;
wherein the first agent device agent program is configured to:
predict the state of its environment at a time and location according to a first agent device internal model of the environment, wherein said environment comprises the first agent device and at least one other agent device from said plurality of agent devices;
selectively sample the state of its environment at the time and location, wherein said sampling comprises receiving observations about objects, actions, and events in said environment;
determine any environmental state prediction errors;
determine the source or sources for any environmental state prediction errors; and
update the first agent device internal model based upon the environmental state prediction errors and the determined source or sources of prediction errors;
further wherein the first agent device agent program selectively samples by receiving observations from multiple sensory modalities comprising at least its own internal sensors, one or more external sensors in said plurality of external sensors, and one or more other agent devices in said plurality of agent devices;
further wherein each sensory modality from said multiple sensory modalities comprises a perceptual pathway and a proprioceptive pathway implementing a perception-action loop and an action-perception loop;
further wherein perceptual patterns and proprioceptive patterns are completed after each observation;
further wherein prediction errors related to said perceptual patterns provide the observation for proprioception, whereby the first agent device agent program learns a policy without any reward-based or punishment-based reinforcement signal;
further wherein each agent device in said plurality of agent devices is an autonomous agent device able to independently initiate communications;
further wherein the first agent device is not in communication with a central controller, entity, or decision-maker;
further wherein there is no central bottleneck or decision-maker for processing data generated by the plurality of agent devices; and
further wherein each agent in said plurality of agent devices has knowledge of its local environment, the sum of each agent's knowledge collectively forming an environmental knowledge base, said environmental knowledge base distributed among the agents in said plurality of agent devices, wherein no agent stores the entire environmental knowledge base.
|
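The predict, sample, error-determination, source-determination, and update steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not specify a state representation, an update rule, or a method for attributing errors. The sketch assumes a scalar environment state, a trust-weighted averaging update, and a "largest absolute error" heuristic for identifying the error source. All names (`SketchAgent`, `step`, `trust`, `learning_rate`, `tolerance`) are hypothetical. No reward or punishment signal is used; only prediction errors drive the update.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchAgent:
    """Hypothetical agent with a scalar internal model of its environment."""
    estimate: float                     # internal model: predicted state
    learning_rate: float = 0.5          # assumed step size for model updates
    tolerance: float = 1e-6             # errors at or below this are ignored
    trust: Dict[str, float] = field(default_factory=dict)  # per-source weight

    def predict(self) -> float:
        # Predict the environment state from the internal model.
        return self.estimate

    def step(self, observations: Dict[str, float]) -> Optional[str]:
        """Run one predict/sample/compare/attribute/update cycle.

        `observations` maps a source label (e.g. "internal", "external",
        "peer") to the value that source reported. Returns the label of
        the source blamed for the largest prediction error, or None.
        """
        prediction = self.predict()

        # Determine the prediction error for each sampled source.
        errors = {src: obs - prediction for src, obs in observations.items()}

        # Determine the source of error (assumed heuristic: largest
        # absolute error) and halve the trust placed in that source.
        blamed = None
        if errors:
            worst = max(errors, key=lambda s: abs(errors[s]))
            if abs(errors[worst]) > self.tolerance:
                blamed = worst
                self.trust[worst] = self.trust.get(worst, 1.0) * 0.5

        # Update the internal model with the trust-weighted mean error.
        weights = {src: self.trust.get(src, 1.0) for src in errors}
        total = sum(weights.values())
        if total > 0:
            mean_error = sum(weights[s] * errors[s] for s in errors) / total
            self.estimate += self.learning_rate * mean_error
        return blamed


if __name__ == "__main__":
    agent = SketchAgent(estimate=20.0)
    blamed = agent.step({"internal": 22.0, "external": 22.0, "peer": 30.0})
    print(blamed, round(agent.estimate, 3))
```

In this sketch the three source labels stand in for the claim's internal sensors, external sensors, and other agent devices. Because each agent keeps only its own `estimate` and `trust` table, no central controller is involved.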