CPC G06Q 10/06313 (2013.01) [B25J 9/1656 (2013.01); G05B 2219/32335 (2013.01); G05B 2219/39001 (2013.01)]; 13 Claims

1. A computer-implemented method to efficiently coordinate multi-robot tasks performed by a plurality of heterogeneous robot equipment of different types, the method comprising:
collecting, by a hardware scheduler, at each of a plurality of schedulable time steps, a list of available heterogeneous robot equipment into a set of available heterogeneous robot equipment; and
performing, by the hardware scheduler, a plurality of simulations to iteratively select each heterogeneous robot equipment from the set of available heterogeneous robot equipment and assign respective ones of the multi-robot tasks to each of the selected heterogeneous robot equipment using a Q-network, wherein each simulated assignment comprises:
receiving, by the hardware scheduler, temporal constraints comprising: deadlines and wait constraints of each of the multi-robot tasks and an objective of minimizing an overall process duration of the multi-robot tasks;
receiving, by the hardware scheduler, spatial constraints comprising: a first spatial constraint requiring that a location can only be occupied by one of the heterogeneous robot equipment at a time and a second spatial constraint requiring a minimum safety distance allowed between each of the plurality of heterogeneous robot equipment while executing the multi-robot tasks;
inputting, by the hardware scheduler, into a simple temporal network (STN)-based model, a plurality of nodes corresponding to the multi-robot tasks, each represented by a start time node and a finish time node;
reducing, by the hardware scheduler, complexity of the simple temporal network (STN)-based model by removing all finish time nodes of the multi-robot tasks, except for a time point at which all the multi-robot tasks will be completed, while preserving all the temporal constraints, thereby providing a reduced STN model of the STN;
building, by the hardware scheduler, a heterogeneous graph g from states in the STN-based model that convolutionally encodes into the heterogeneous graph g: the temporal constraints, the spatial constraints, and at least one other constraint associated with the available heterogeneous robot equipment, locations of the available heterogeneous robot equipment, specific locations of the multi-robot tasks, and shared tools employed by the available heterogeneous robot equipment;
computing, by the hardware scheduler using heterogeneous graph attention layers of a graph attention network, input features for the plurality of nodes in the heterogeneous graph g, wherein the input features comprise: a minimum of an expected time to complete an unscheduled task of a plurality of unscheduled tasks, a maximum of the expected time to complete the unscheduled task, a mean of the expected time to complete the unscheduled task, and a standard deviation of the expected time to complete the unscheduled task;
merging, by the hardware scheduler, the input features as a multi-head output for each multi-head layer in the heterogeneous graph attention layers of the graph attention network using at least a concatenation and an averaging;
learning, by the hardware scheduler, a greedy policy for sequential decision making by constructing a schedule as a Markov decision process (MDP) using a tuple that includes at least: a first tuple comprising the states at each decision-step that include the temporal constraints represented by the STN, the locations of the available heterogeneous robot equipment, and all previously constructed partial schedules of the available heterogeneous robot equipment; a second tuple comprising actions corresponding to appending the unscheduled task at an end of a partial schedule of a selected one of the available heterogeneous robot equipment; a third tuple comprising transitions corresponding to adding edges associated with each of the actions into the STN and updating the partial schedule of the selected one of the available heterogeneous robot equipment; a fourth tuple comprising a reward of a state-action pair defined as a change in an objective value after taking one of the actions while minimizing the overall process duration; and a fifth tuple comprising a discount factor;
employing, by the hardware scheduler, imitation learning to train the Q-network by scaling up the heterogeneous graph g from small-scale solutions employed by an expert to large-scale problems, solved by grounding, below a total value of the reward, Q-values of alternative actions not selected by the expert, while ensuring that a gradient is trained not only on alternative, unselected actions with a maximum Q-value, but is propagated through all the alternative, unselected actions that have Q-values higher than a difference between the total value of the reward and an empirically selected offset constant;
employing, by the hardware scheduler, the graph attention network, the input features, the transitions, and a result of the imitation learning to generate a greedy schedule; and
iteratively selecting, by the hardware scheduler, based on the greedy policy, the greedy schedule having assignments of the multi-robot tasks to each of the selected heterogeneous robot equipment, in accordance with a policy from the group consisting of: a first policy associated with first availability of a first one of the available heterogeneous robot equipment, a second policy associated with a minimum average time on the plurality of unscheduled tasks, a third policy associated with a minimum time on any one of the plurality of unscheduled tasks, and a fourth policy associated with a minimum average time on all of the multi-robot tasks.
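The STN reduction recited above (removing every finish time node except the overall completion point while preserving the temporal constraints) can be illustrated with a minimal Python sketch. All names here (Task, build_stn, reduce_stn) are hypothetical, and fixed task durations are assumed so that each finish time equals the start time plus the duration:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    duration: float   # fixed duration, so finish = start + duration
    deadline: float   # absolute deadline on the finish time

def build_stn(tasks):
    """Distance-graph STN: edges[(u, v)] = b encodes time(v) - time(u) <= b."""
    edges = {}
    for t in tasks:
        s, f = f"{t.name}:start", f"{t.name}:finish"
        edges[(s, f)] = t.duration          # finish - start <= duration
        edges[(f, s)] = -t.duration         # finish - start >= duration
        edges[("origin", f)] = t.deadline   # deadline constraint on the finish
    return edges

def reduce_stn(edges, tasks):
    """Eliminate every finish node: a bound on a finish time is an equivalent
    bound on the start time shifted by the duration, so all temporal
    constraints are preserved. A single overall 'end' event is kept."""
    shift = {f"{t.name}:finish": (f"{t.name}:start", t.duration) for t in tasks}
    reduced = {}
    for (u, v), b in edges.items():
        if u in shift:
            u, d = shift[u]
            b += d                          # bound leaving a finish node
        if v in shift:
            v, d = shift[v]
            b -= d                          # bound entering a finish node
        if u != v:
            reduced[(u, v)] = min(b, reduced.get((u, v), float("inf")))
    for t in tasks:                         # 'end' is no earlier than any finish
        reduced[("end", f"{t.name}:start")] = -t.duration
    return reduced

tasks = [Task("weld", 4.0, 10.0), Task("paint", 2.0, 12.0)]
print(reduce_stn(build_stn(tasks), tasks))
```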
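The four node input features recited for the heterogeneous graph attention layers (minimum, maximum, mean, and standard deviation of the expected time to complete an unscheduled task) reduce to simple summary statistics. A hedged NumPy sketch, with the per-robot expected-time table assumed purely for illustration:

```python
import numpy as np

def task_node_features(expected_times):
    """expected_times: expected completion times of one unscheduled task,
    e.g. one entry per robot able to perform it. Returns the four summary
    statistics recited as node input features."""
    t = np.asarray(expected_times, dtype=float)
    return np.array([t.min(), t.max(), t.mean(), t.std()])

# one feature vector per unscheduled-task node in the heterogeneous graph g
unscheduled = {"weld": [4.0, 6.5], "paint": [2.0, 2.5, 3.0]}
features = {task: task_node_features(times) for task, times in unscheduled.items()}
print(features["weld"])   # [min, max, mean, std]
```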
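Merging each multi-head layer's outputs "using at least a concatenation and an averaging" matches the common graph-attention convention of concatenating head outputs in hidden layers and averaging them in the final layer. The PyTorch sketch below uses a generic single-head attention layer over a dense adjacency matrix; it is a sketch of that convention, not the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATHead(nn.Module):
    """One attention head over a dense adjacency matrix (1 = edge)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        z = self.W(h)                                  # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))    # (N, N) attention logits
        e = e.masked_fill(adj == 0, float("-inf"))     # attend only along edges
        alpha = torch.softmax(e, dim=-1)
        return alpha @ z

class MultiHeadGAT(nn.Module):
    """Merge head outputs by concatenation (hidden layers) or averaging (final layer)."""
    def __init__(self, in_dim, out_dim, heads, merge="cat"):
        super().__init__()
        self.heads = nn.ModuleList(GATHead(in_dim, out_dim) for _ in range(heads))
        self.merge = merge

    def forward(self, h, adj):
        outs = [head(h, adj) for head in self.heads]
        if self.merge == "cat":
            return torch.cat(outs, dim=-1)             # hidden layers: concatenate
        return torch.stack(outs, dim=0).mean(dim=0)    # final layer: average

h = torch.randn(5, 8)
adj = torch.ones(5, 5)
layer = MultiHeadGAT(8, 16, heads=4, merge="cat")
print(layer(h, adj).shape)    # torch.Size([5, 64])
```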
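The Markov decision process tuple recited in the claim (states, actions, transitions, reward, discount factor) can be written down as plain data structures. A minimal sketch with hypothetical names; the transition adds only the single precedence edge after the robot's previous task, a simplification of the full edge set:

```python
from dataclasses import dataclass

@dataclass
class State:
    stn_edges: dict          # temporal constraints as a distance graph
    robot_locations: dict    # robot -> current location
    partial_schedules: dict  # robot -> ordered list of already assigned tasks

@dataclass
class Action:
    robot: str               # selected available robot equipment
    task: str                # unscheduled task appended to its partial schedule

def transition(state, action, prev_duration):
    """Append the task to the robot's partial schedule and add the
    corresponding STN edge: the new task starts no earlier than the
    previous task's finish (its start plus prev_duration)."""
    sched = state.partial_schedules.setdefault(action.robot, [])
    if sched:
        prev = sched[-1]
        state.stn_edges[(f"{action.task}:start", f"{prev}:start")] = -prev_duration
    sched.append(action.task)
    return state

def reward(makespan_before, makespan_after):
    """Change in objective value after taking an action: negative when the
    overall process duration grows, so maximizing reward minimizes it."""
    return makespan_before - makespan_after

GAMMA = 0.99                 # discount factor, the fifth tuple element
```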
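A hedged PyTorch sketch of the Q-value grounding described for imitation learning: alternative (non-expert) actions are pushed below the total observed reward, and the hinge threshold (total reward minus an empirically selected offset) lets the gradient flow through every alternative action whose Q-value exceeds that difference, not only the single maximum. The function name and offset value are assumptions:

```python
import torch
import torch.nn.functional as F

def grounding_loss(q_values, expert_action, total_reward, offset=0.1):
    """q_values: (num_actions,) Q-network outputs for one decision step.
    Penalizes every non-expert action whose Q-value exceeds
    (total_reward - offset); gradients flow through all such violators."""
    mask = torch.ones_like(q_values, dtype=torch.bool)
    mask[expert_action] = False                  # drop the expert's action
    alternatives = q_values[mask]
    threshold = total_reward - offset            # empirically selected offset
    # hinge: zero (hence no gradient) for alternatives already below threshold
    return F.relu(alternatives - threshold).sum()

q = torch.tensor([1.2, 0.4, 1.6, 0.9], requires_grad=True)
loss = grounding_loss(q, expert_action=0, total_reward=1.5)
loss.backward()
print(loss.item(), q.grad)   # gradient only on alternatives above the threshold
```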
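The four selection policies in the final Markush group could be realized as tie-breaking heuristics over candidate (robot, task) assignments. A minimal sketch; the candidate dictionary keys are hypothetical stand-ins for quantities the claim names:

```python
def pick_assignment(candidates, policy):
    """candidates: list of dicts with hypothetical keys:
      robot_free_at          - when the robot first becomes available
      avg_time_unscheduled   - mean expected time over the unscheduled tasks
      min_time_unscheduled   - minimum expected time on any unscheduled task
      avg_time_all           - mean expected time over all multi-robot tasks
    """
    keys = {
        "first_available": lambda c: c["robot_free_at"],        # first policy
        "min_avg_unsched": lambda c: c["avg_time_unscheduled"], # second policy
        "min_any_unsched": lambda c: c["min_time_unscheduled"], # third policy
        "min_avg_all":     lambda c: c["avg_time_all"],         # fourth policy
    }
    return min(candidates, key=keys[policy])

candidates = [
    {"robot": "r1", "task": "weld", "robot_free_at": 0.0,
     "avg_time_unscheduled": 3.1, "min_time_unscheduled": 2.0, "avg_time_all": 3.4},
    {"robot": "r2", "task": "paint", "robot_free_at": 1.5,
     "avg_time_unscheduled": 2.7, "min_time_unscheduled": 2.2, "avg_time_all": 3.0},
]
print(pick_assignment(candidates, "min_avg_unsched")["robot"])   # r2
```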