The long-term planning system assigns tasks to the short-term module and handles communication with humans and other robots. Its inputs are the spatial context from the database, the vision tower's output, and the short-term large language model's output. If needed, it queries the database and the retrieval model for additional information, and then issues a command to the short-term module.
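The flow described above can be sketched as a single planning step. This is only an illustrative sketch: `PlannerInputs`, `plan_step`, and the three callbacks are hypothetical names, and the real planner's interfaces are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PlannerInputs:
    """The three inputs named in the text (field names are assumptions)."""
    spatial_context: str    # spatial context read from the database
    vision_output: str      # output of the vision tower
    short_term_output: str  # output of the short-term large language model


def plan_step(
    inputs: PlannerInputs,
    needs_more_info: Callable[[PlannerInputs], Optional[str]],
    lookup: Callable[[str], str],
    make_command: Callable[[PlannerInputs, str], str],
) -> str:
    """Run one planning step and return the command for the short-term module.

    needs_more_info returns a query string, or None if the inputs suffice.
    lookup stands in for the database / retrieval model query.
    """
    extra = ""
    query = needs_more_info(inputs)
    if query is not None:
        # Only hit the database / retrieval model when it is actually needed.
        extra = lookup(query)
    return make_command(inputs, extra)


# Hypothetical usage with stand-in callbacks.
example = plan_step(
    PlannerInputs("kitchen map", "cup on table", "idle"),
    needs_more_info=lambda i: "cup owner",
    lookup=lambda q: "owner=Ann",
    make_command=lambda i, extra: f"fetch cup ({extra})",
)
```

Passing the query and command logic in as callbacks keeps the sketch independent of any particular model or storage backend.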
Database
The database stores contextual and spatial information about all relatively important tasks, objects, and people. If social memory is turned on, it can also remember people's names and preferences. For each task, it records the conditions for performing the task correctly, when the task should be done, and the commander's preferences for any variations during execution.
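One way to picture the stored records is the minimal in-memory sketch below. The class and field names (`TaskRecord`, `PersonRecord`, `MemoryStore`, `social_memory`) are hypothetical, and the actual storage backend and schema are not described in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    name: str
    conditions: List[str]           # conditions for performing the task correctly
    schedule: Optional[str] = None  # when the task should be done
    preferences: Dict[str, str] = field(default_factory=dict)  # commander's preferences


@dataclass
class PersonRecord:
    name: str
    preferences: Dict[str, str] = field(default_factory=dict)


class MemoryStore:
    """Toy stand-in for the database; keeps everything in dictionaries."""

    def __init__(self, social_memory: bool = False) -> None:
        self.social_memory = social_memory
        self.tasks: Dict[str, TaskRecord] = {}
        self.people: Dict[str, PersonRecord] = {}

    def remember_task(self, task: TaskRecord) -> None:
        self.tasks[task.name] = task

    def remember_person(self, person: PersonRecord) -> bool:
        # People are only stored when social memory is turned on.
        if not self.social_memory:
            return False
        self.people[person.name] = person
        return True
```

For example, `MemoryStore(social_memory=True).remember_person(PersonRecord("Ann", {"drink": "tea"}))` stores the record and returns `True`, while the same call on a store with social memory off returns `False` and stores nothing.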
Short-Term Action Module
This module controls low-level actions. It was trained on over 1,000,000 trajectories so that it can adapt to various environments, commands, and situations.
3D Mapping
The 3D Mapper continuously scans the environment and compiles the surroundings into a voxel grid in which each voxel belongs to a classified object. It can take the depth map from a single depth camera as input, or combine depth cameras with LiDARs.
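As a rough illustration of the single-depth-camera case, the sketch below back-projects a depth map through a pinhole camera model and bins the resulting points into labeled voxels. It is an assumption-laden sketch, not the mapper's actual method: the function name `depth_to_voxels`, the per-pixel class labels, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `voxel_size` parameter are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def depth_to_voxels(
    depth: List[List[float]],   # depth in meters per pixel; <= 0 means no reading
    labels: List[List[str]],    # assumed per-pixel object class, same shape as depth
    fx: float, fy: float,       # focal lengths in pixels (assumed pinhole model)
    cx: float, cy: float,       # principal point in pixels
    voxel_size: float,          # voxel edge length in meters
) -> Dict[Voxel, str]:
    """Map each valid depth pixel to a voxel index and store its class label."""
    grid: Dict[Voxel, str] = {}
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth reading
            # Pinhole back-projection into camera coordinates.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            key = (
                math.floor(x / voxel_size),
                math.floor(y / voxel_size),
                math.floor(z / voxel_size),
            )
            # Last write wins if several pixels fall into the same voxel.
            grid[key] = labels[v][u]
    return grid
```

A sparse dictionary keyed by voxel index is used here only for brevity; a real mapper would also need camera poses to fuse successive scans, and a sensor-fusion step to combine depth cameras with LiDARs.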