US 12,422,859 B2
Brain-like memory-based environment perception and decision-making method and system for unmanned surface vehicle
Shaorong Xie, Shanghai (CN); Hang Yu, Shanghai (CN); and Xiangfeng Luo, Shanghai (CN)
Assigned to SHANGHAI UNIVERSITY, Shanghai (CN)
Filed by SHANGHAI UNIVERSITY, Shanghai (CN)
Filed on Sep. 6, 2023, as Appl. No. 18/242,847.
Prior Publication US 2024/0402705 A1, Dec. 5, 2024
Int. Cl. G05D 1/243 (2024.01); G05D 101/15 (2024.01); G05D 109/30 (2024.01); G06N 3/092 (2023.01); G06V 10/40 (2022.01); G06V 20/70 (2022.01)

CPC G05D 1/243 (2024.01) [G06N 3/092 (2023.01); G06V 10/40 (2022.01); G06V 20/70 (2022.01); G05D 2101/15 (2024.01); G05D 2109/34 (2024.01)]

8 Claims

1. A brain-like memory-based environment perception and decision-making method for an unmanned surface vehicle, comprising:

obtaining an image of an environment in front of an unmanned surface vehicle;

inputting the image of the environment into an environment perception and decision-making model of the unmanned surface vehicle, and outputting an action instruction, wherein the environment perception and decision-making model of the unmanned surface vehicle comprises an image feature extractor, a Bidirectional Encoder Representations from Transformers (BERT) model, a fully connected layer, a short-term scene memory module, and a long-term memory module that are connected in turn; and

using the action instruction to control the unmanned surface vehicle to perform an action; wherein

the image feature extractor is configured to extract an image feature from the image of the environment;

the BERT model is configured to extract an image feature representation containing a text feature from the image feature;

the fully connected layer is configured to map the image feature representation onto an image query suitable for recognition by a large language model;

the short-term scene memory module is configured to preset a plurality of questions, and use a short-term scene memory of the large language model to answer the plurality of questions in a specified order to obtain a plurality of answers;

the long-term memory module is configured to use a long-term memory and in-context learning of the large language model to output the action instruction based on the plurality of answers; and

the large language model is a large language model obtained after fine tuning based on reinforcement learning.
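
For illustration only, the data flow recited in claim 1 can be sketched as a chain of five stages. This is a minimal sketch under stated assumptions, not the patented implementation: every function name, the preset questions, the in-context examples, the action labels, and the `toy_llm` stand-in are hypothetical. The feature extractor, BERT model, and fully connected layer are replaced by trivial numeric placeholders so that the example runs with the Python standard library alone, and no reinforcement-learning fine-tuning is performed.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical LLM interface: (image query, text prompt) -> text reply.
LLM = Callable[[Vector, str], str]

# Hypothetical preset questions, answered in this fixed order.
QUESTIONS = ["Is there an obstacle ahead?", "Is the port side blocked?"]
# Hypothetical in-context examples held in long-term memory: (answers, action).
EXAMPLES: List[Tuple[List[str], str]] = [
    (["yes", "no"], "turn_starboard"),
    (["no", "no"], "keep_course"),
]


def extract_image_feature(image: Sequence[Sequence[float]]) -> Vector:
    """Placeholder for the image feature extractor: mean intensity per row."""
    return [sum(row) / len(row) for row in image]


def bert_encode(feature: Vector) -> Vector:
    """Placeholder for the BERT model: clamps values to [0, 1]."""
    return [min(1.0, max(0.0, x)) for x in feature]


def fully_connected(rep: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Affine map from the feature representation to the image query."""
    return [sum(w * x for w, x in zip(row, rep)) + b
            for row, b in zip(weights, bias)]


def short_term_scene_memory(llm: LLM, query: Vector,
                            questions: List[str]) -> List[str]:
    """Ask the preset questions in order; earlier Q/A pairs stay in the prompt."""
    history: List[str] = []
    answers: List[str] = []
    for question in questions:
        prompt = "\n".join(history + [f"Q: {question}"])
        answer = llm(query, prompt)
        history.append(f"Q: {question} A: {answer}")
        answers.append(answer)
    return answers


def long_term_memory(llm: LLM, query: Vector, answers: List[str],
                     examples: List[Tuple[List[str], str]]) -> str:
    """Build a few-shot (in-context learning) prompt and request an action."""
    shots = [f"Example: {' | '.join(ans)} -> {act}" for ans, act in examples]
    prompt = "\n".join(shots + [f"Scene: {' | '.join(answers)}", "Action:"])
    return llm(query, prompt)


def toy_llm(query: Vector, prompt: str) -> str:
    """Rule-based stand-in for the fine-tuned large language model."""
    lines = prompt.splitlines()
    if lines[-1] == "Action:":
        first_answer = lines[-2][len("Scene: "):].split(" | ")[0]
        return "turn_starboard" if first_answer == "yes" else "keep_course"
    # The n-th question is answered from the n-th component of the image query.
    index = sum(line.startswith("Q:") for line in lines) - 1
    return "yes" if query[index % len(query)] > 0.5 else "no"


def run_pipeline(image: Sequence[Sequence[float]], llm: LLM = toy_llm) -> str:
    """Image -> feature -> representation -> query -> answers -> action."""
    rep = bert_encode(extract_image_feature(image))
    n = len(rep)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    query = fully_connected(rep, identity, [0.0] * n)
    answers = short_term_scene_memory(llm, query, QUESTIONS)
    return long_term_memory(llm, query, answers, EXAMPLES)
```

With these toy stand-ins, `run_pipeline([[0.9, 0.9], [0.1, 0.1]])` returns `"turn_starboard"`, and the returned string plays the role of the action instruction used to control the vehicle. In a real system each placeholder would be swapped for a trained component; the sketch only shows the order in which the modules are connected and how the answers from the short-term stage feed the long-term stage.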