US 12,228,941 B2
Active scene mapping method based on constraint guidance and space optimization strategies
Xin Yang, Dalian (CN); Xuefeng Yin, Dalian (CN); Baocai Yin, Wuxi (CN); and Xiaopeng Wei, Dalian (CN)
Assigned to DALIAN UNIVERSITY OF TECHNOLOGY, Dalian (CN)
Filed by DALIAN UNIVERSITY OF TECHNOLOGY, Dalian (CN)
Filed on Sep. 11, 2023, as Appl. No. 18/464,917.
Prior Publication US 2024/0295879 A1, Sep. 5, 2024
Int. Cl. G05D 1/02 (2020.01); G05D 1/00 (2006.01)
CPC G05D 1/0274 (2013.01) [G05D 1/0088 (2013.01); G05D 1/0217 (2013.01)] 1 Claim
OG exemplary drawing
 
1. An active scene mapping method based on constraint guidance and space optimization strategies, comprising a global planning stage and a local planning stage; in the global planning stage, the next exploration goal of a robot is calculated to guide the robot to explore a scene; and when the next exploration goal is determined, specific actions are generated according to the next exploration goal, the position of the robot and the constructed occupancy map in the local planning stage to drive the robot to go to the next exploration goal, and observation data is collected to update the information of the occupancy map, specifically comprising the following steps:
step 1: generating a state according to the scanning data of the robot;
the state comprises three parts: an occupancy map Mt, a geodesic distance map D(Mt, ωt) and a frontier-based entropy I(Mt); and ωt indicates the position of the robot;
s(ωt)=(Mt, D(Mt,ωt), I(Mt))
1.1) Occupancy Map;
a 3D scene model is obtained by back-projecting the depth observation C(ωt) of the robot at position ωt using the pose of the robot; a 2D global map is constructed from the top-down view of the 3D scene model as an occupancy map Mt; at time t, the occupancy map is expressed as Mt ∈[0,1]X×Y×2, and X, Y represent the length of the occupancy map and the width of the occupancy map, respectively; the occupancy map comprises two channels indicating the explored and occupied regions, respectively; the grids in the occupancy map Mt are classified as follows: free (explored but not occupied), occupied, and unknown (not explored); and the frontier grid Ft⊂Mt consists of the free grids adjacent to an unknown grid;
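The frontier definition in 1.1) can be sketched as follows; this is an illustrative NumPy sketch, and the numeric encoding of the free/occupied/unknown labels is an assumption, not taken from the claim:

```python
import numpy as np

# Illustrative grid-label encoding (not specified in the claim):
FREE, OCCUPIED, UNKNOWN = 0, 1, 2

def frontier_grid(grid):
    """Frontier grid F_t: free cells that have at least one
    4-connected unknown neighbour."""
    free = grid == FREE
    unknown = grid == UNKNOWN
    near_unknown = np.zeros_like(unknown)
    near_unknown[1:, :] |= unknown[:-1, :]   # unknown cell above
    near_unknown[:-1, :] |= unknown[1:, :]   # unknown cell below
    near_unknown[:, 1:] |= unknown[:, :-1]   # unknown cell to the left
    near_unknown[:, :-1] |= unknown[:, 1:]   # unknown cell to the right
    return free & near_unknown

grid = np.array([[0, 0, 2],
                 [0, 1, 2],
                 [0, 0, 0]])
mask = frontier_grid(grid)  # frontier cells at (0, 1) and (2, 2)
```

The two free cells that touch the unexplored column become the frontier; interior free cells and the occupied cell do not.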
1.2) geodesic distance map;
given the current position ωt and the currently constructed occupancy map Mt, a geodesic distance map D(Mt,ωt)∈ℝX×Y is constructed, wherein Dx,y(Mt,ωt) represents the geodesic distance from the position (x,y) to the position ωt of the robot:
Dx,y(Mt,ωt)=distMt((x,y),ωt)
the geodesic distance distMt is the shortest distance for traversal between two points in the occupancy map Mt, and the geodesic distance distMt is calculated by the fast marching method;
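The geodesic distance map of 1.2) can be illustrated with a simple traversal; the claim computes it with the fast marching method, so the 4-connected breadth-first search below is only a discrete stand-in for illustration:

```python
from collections import deque
import numpy as np

def geodesic_distance_map(free, start):
    """D(M_t, w_t) stand-in: number of grid steps from `start` to every
    traversable cell, computed by breadth-first search over the free
    cells (the claim uses the fast marching method instead)."""
    dist = np.full(free.shape, np.inf)
    dist[start] = 0.0
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < free.shape[0] and 0 <= ny < free.shape[1]
                    and free[nx, ny] and dist[nx, ny] == np.inf):
                dist[nx, ny] = dist[x, y] + 1
                queue.append((nx, ny))
    return dist

# A wall forces a detour: the geodesic distance to (2, 0) is 6 steps,
# although that cell is only 2 rows away in a straight line.
free = np.array([[1, 1, 1],
                 [0, 0, 1],
                 [1, 1, 1]], dtype=bool)
dist = geodesic_distance_map(free, (0, 0))
```

The detour example is the motivation for using geodesic rather than Euclidean distance: an apparently nearby frontier point can be far away along any traversable path.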
1.3) frontier-based entropy;
the frontier-based entropy is introduced as a constraint to reduce the search space when the occupancy map Mt is still incomplete; each frontier point f∈Ft represents a potential next optimal exploration goal; based on the characteristic that frontier points aggregate over small ranges and disperse over large ranges, a frontier-based entropy I is introduced for encoding the spatial distribution information of the frontier points; the encoded spatial distribution information is used as one of the inputs of the actor network in the global planning strategy Π to constrain the action search, and the frontier-based entropy I is defined as follows:

Ix,y(Mt) = Σ(x′,y′)∈Nγ(x,y) 1[(x′,y′)∈Ft], where Nγ(x,y) denotes the γ×γ neighborhood centered on (x,y)
wherein Ix,y(Mt) represents the number of frontier points within the γ×γ neighborhood centered on the position (x,y) of the frontier point in the occupancy map Mt; and the spatial distribution information contained in each frontier point includes the (x,y) coordinate of the position of the point and the statistical information of the spatial distribution of the frontier points within the neighborhood;
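The neighborhood count Ix,y(Mt) of 1.3) can be sketched as follows; γ is assumed odd so the neighborhood is centered, and the zero value at non-frontier cells is an illustrative convention:

```python
import numpy as np

def frontier_entropy(frontier, gamma):
    """I_{x,y}(M_t): for each frontier cell (x, y), count the frontier
    cells inside the gamma x gamma neighbourhood centred on (x, y);
    non-frontier cells are left at 0."""
    radius = gamma // 2
    # Zero-pad so neighbourhoods at the map border stay gamma x gamma.
    padded = np.pad(frontier.astype(int), radius)
    entropy = np.zeros(frontier.shape, dtype=int)
    for x, y in zip(*np.nonzero(frontier)):
        entropy[x, y] = padded[x:x + gamma, y:y + gamma].sum()
    return entropy

frontier = np.array([[1, 1, 0],
                     [0, 0, 0],
                     [0, 0, 1]], dtype=bool)
ent = frontier_entropy(frontier, gamma=3)
```

The two adjacent frontier points each score 2 (small-range aggregation), while the isolated point scores 1, which is exactly the clustering signal the entropy feeds to the actor network.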
step 2: calculating the probability distribution of the action space of the robot according to the state input;
the on-policy learning approach proximal policy optimization (PPO) is used as the policy optimizer for training optimization and decision execution of the global planning strategy; and the policy optimizer comprises an actor network and a critic network;
the actor network uses a multi-layer perceptron (MLP) as an encoder for feature extraction and uses a graph neural network for feature fusion; a graph is constructed according to the frontier points given by the state s(ωt), and feature extraction and feature fusion are carried out on the constructed graph to obtain the scores of the frontier points;
the critic network comprises five convolutional layers, a flatten operation and three linear layers; a ReLU activation function follows each convolutional layer and each linear layer, and the flatten operation flattens multi-dimensional data into one-dimensional data; the critic network predicts the state value V(s(ωt)) of the occupancy map to indicate the critic value obtained for the current state of the frontier points, and this critic value, as part of the loss function, is used for training the actor network;
the process of calculating the probability distribution of the action space according to the state input is specifically as follows:
a graph G(Ft,Ωt) is constructed based on the frontier grid Ft and the exploration path Ωt={ω0, . . . , ωt} to represent the context information of the current scene; a corresponding relation between the robot and the frontier points extracted from the occupancy map Mt is established in the graph G(Ft,Ωt), and the information given by the state s(ωt) is assigned to the nodes and edges of G(Ft,Ωt); for each node ni, the node input feature feat(ni)∈ℝ5 includes: the (x,y) coordinate information in the occupancy map Mt, semantic label information indicating ni∈Ft or ni∈Ωt, historical label information indicating whether ni is the current node ωt or a historical exploration node ni∈{ω0, . . . , ωt−1}, and the frontier-based entropy Ini(Mt); the node edge feature feat(lij)∈ℝ32 is extracted by the multi-layer perceptron (MLP), wherein lij∈ℝ1 represents the geodesic distance from node nj to node ni; the node input features and the node edge features are input into the actor network for feature extraction and feature fusion, and a set of scores of the frontier points is output; and the sampling probability Πmask(f|s(ωt)) of each action of the robot is calculated based on the set of scores of the frontier points;
step 3: carrying out action mask guided space alignment and selecting the next exploration goal;
the robot selects the frontier point with the highest score according to

ωt+1 = argmaxf∈Ft Π(f|s(ωt))

as the next exploration goal ωt+1, wherein Π represents the global planning strategy used to calculate the score of each frontier point in the current state; and the next exploration goal ωt+1 is a frontier point f selected from the frontier grid Ft;
based on the score of the frontier point, the action mask strategy is introduced to solve the misalignment problem of the space metrics; and the action mask strategy includes two action masks: a valid distance mask and a stuck mask, which are used for filtering actions in the action space of the global planning strategy Π and constraining action sampling to a valid action space;
3.1) valid distance mask;
the valid distance mask is used for filtering invalid goals in the action space of the global planning strategy; the action space is filtered according to the geodesic distance from the position ωt of the robot to the next optimal exploration goal; the nearest threshold βnear and the farthest threshold βfar are set; and for a next potential optimal exploration goal beyond the threshold range [βnear, βfar], the sampling probability is set to 0, and Πmask(f|s(ωt)) after the valid distance mask is as follows:

Πmask(f|s(ωt)) = Π(f|s(ωt)), if βnear ≤ distMt(f,ωt) ≤ βfar
Πmask(f|s(ωt)) = 0, otherwise
wherein Πmask(f|s(ωt)) represents the action mask probability of selecting the frontier point f as the next exploration goal, distMt(f,ωt) is the geodesic distance from the position ωt of the robot to the frontier point f, and Π(f|s(ωt)) is the original probability distribution from the encoder of the actor network;
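The valid distance mask of 3.1) amounts to zeroing out-of-range probabilities; a minimal sketch, in which renormalizing the surviving probabilities is an added assumption (the claim only sets masked probabilities to 0):

```python
import numpy as np

def valid_distance_mask(probs, geodesic_dists, beta_near, beta_far):
    """Zero the sampling probability of every frontier point whose
    geodesic distance to the robot lies outside [beta_near, beta_far];
    surviving probabilities are renormalised (a convenience not stated
    in the claim)."""
    valid = (geodesic_dists >= beta_near) & (geodesic_dists <= beta_far)
    masked = np.where(valid, probs, 0.0)
    total = masked.sum()
    return masked / total if total > 0 else masked

probs = np.array([0.5, 0.3, 0.2])    # actor-network probabilities for 3 frontier points
dists = np.array([1.0, 5.0, 20.0])   # geodesic distances from the robot
masked = valid_distance_mask(probs, dists, beta_near=2.0, beta_far=10.0)
```

Here the nearest goal (too close) and the farthest goal (too far) are filtered out, so all probability mass moves to the one goal inside the valid range.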
3.2) stuck mask;
the stuck mask is used for filtering out actions in the action space that cause the robot to get stuck; the max moving length

lmax = maxi∈{t−2,t−1,t} distMi(ωi−1,ωi)
and the max area increment

cmax = maxi∈{t−2,t−1,t} (c(Mi)−c(Mi−1))
of the robot in the last three global planning stages are calculated, wherein c(Mt) represents the area of the scanned region at time t calculated according to the occupancy map Mt; when the moving length lmax and the area increment cmax are greater than the set thresholds, the action is considered reasonable; otherwise, the probability value corresponding to the action is set to 0, that is, Πmask(a|s(ωt))=0;
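The stuck test of 3.2) can be sketched as follows; the per-stage moving lengths and scanned-area history are assumed to be tracked elsewhere, and all names here are illustrative:

```python
def not_stuck(move_lengths, scanned_areas, len_threshold, area_threshold):
    """Stuck-mask test over the last three global planning stages:
    l_max is the max per-stage moving length and c_max is the max
    per-stage increment of the scanned area c(M_t); the action space
    stays unmasked only if both exceed their set thresholds."""
    l_max = max(move_lengths[-3:])
    # Area increments c(M_i) - c(M_{i-1}) over the last three stages.
    increments = [b - a for a, b in zip(scanned_areas[-4:-1], scanned_areas[-3:])]
    c_max = max(increments)
    return l_max > len_threshold and c_max > area_threshold

# A robot making progress keeps its action space; a stalled robot
# (tiny movements, no new scanned area) gets the action masked out.
progressing = not_stuck([5.0, 6.0, 7.0], [10.0, 20.0, 35.0, 55.0], 2.0, 5.0)
stalled = not_stuck([0.1, 0.1, 0.1], [10.0, 10.0, 10.0, 10.0], 2.0, 5.0)
```

The intent is to detect oscillation: a goal that repeatedly produces near-zero movement and near-zero map growth is removed from the sampling distribution.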
step 4: planning a path to the next exploration goal;
the robot uses the fast marching method (FMM) to plan a moving path according to the position ωt of the robot, the calculated next exploration goal ωt+1 and the constructed occupancy map Mt; the robot then moves along the path to the next exploration goal, and the observation data scanned by the robot during movement is collected to update the information of the map;
step 5: making a judgment about the termination of exploration;
steps 1-4 are repeated until the exploration meets a termination condition: the exploration is terminated when the scanning coverage of the robot exceeds the set threshold or when the number of exploration steps of the robot exceeds the maximum set number of steps.
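The termination judgment of step 5 reduces to a simple predicate; the parameter names below are illustrative:

```python
def exploration_terminated(coverage, coverage_threshold, steps, max_steps):
    """Step 5: exploration ends when the scanning coverage exceeds the
    set threshold, or when the number of exploration steps reaches the
    maximum set number of steps."""
    return coverage > coverage_threshold or steps >= max_steps

done_by_coverage = exploration_terminated(0.96, 0.95, 120, 500)
done_by_budget = exploration_terminated(0.40, 0.95, 500, 500)
still_exploring = exploration_terminated(0.40, 0.95, 120, 500)
```

Either condition alone ends the loop, so a hard step budget bounds runtime even in scenes where full coverage is unreachable.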