There is a new navigation system in the works that leverages VSLAM for localization and mapping, Nvblox for 3D scene reconstruction and cost mapping, Nav2 for path planning and obstacle avoidance, and an LLM-based decision-making system for high-level task reasoning and goal setting.
The previous system simply output motor commands based on the LLM's analysis of what the user wants and object data collected from the camera (no map building or path planning).
This works for simple scenarios where the environment is well known and static. For scenarios involving moving obstacles such as people or pets, or tasks that require exploration and longer-term navigation, that system would fail because the LLM only sees a snapshot of the world before each command and has no global view of the map.
The new system will use a cost mapper (Nvblox) and navigation planner (Nav2) to enable real-time path planning and dynamic obstacle avoidance using high-frequency localization and mapping data (VSLAM). It will be controlled by an LLM-based decision-making system that can reason about high-level tasks and environmental cues to decide where to go next.
Currently, I am developing a system that allows the robot to decide for itself where to go based on a given task and environmental cues. This works by sending task information and semantic descriptions of the environment to the LLM and converting LLM outputs into actionable commands for Nav2.
NOTE: The LLM cannot accurately determine a facing rotation, so for now I am not requiring the model to output a target rotation (yaw) and am focusing only on the target position (x, y). The target rotation is instead calculated as the look-at rotation from the robot's current position to its target position.
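As a rough sketch of how one of these LLM targets could be turned into a Nav2 goal with the look-at yaw described above, the snippet below uses the nav2_simple_commander Python API. The function name, the map frame, and the hard-coded coordinates are illustrative assumptions rather than the actual implementation.

```python
import math

import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult


def look_at_goal(navigator: BasicNavigator, current_xy, target_xy) -> PoseStamped:
    """Build a NavigateToPose goal whose yaw 'looks at' the target from the current position."""
    yaw = math.atan2(target_xy[1] - current_xy[1], target_xy[0] - current_xy[0])

    goal = PoseStamped()
    goal.header.frame_id = 'map'
    goal.header.stamp = navigator.get_clock().now().to_msg()
    goal.pose.position.x = float(target_xy[0])
    goal.pose.position.y = float(target_xy[1])
    # Rotation is only about the vertical axis, so the quaternion is built from yaw alone.
    goal.pose.orientation.z = math.sin(yaw / 2.0)
    goal.pose.orientation.w = math.cos(yaw / 2.0)
    return goal


rclpy.init()
navigator = BasicNavigator()
navigator.waitUntilNav2Active()

# current_xy would come from the localization (VSLAM) pose; target_xy from the LLM's output.
goal = look_at_goal(navigator, current_xy=(0.0, 0.0), target_xy=(2.5, 1.0))
navigator.goToPose(goal)

while not navigator.isTaskComplete():
    feedback = navigator.getFeedback()  # could be used to report progress back to the LLM

result = navigator.getResult()
print('succeeded' if result == TaskResult.SUCCEEDED else f'ended with {result}')
```

Since the rotation is planar, atan2 of the position difference gives the heading toward the goal, and the quaternion reduces to its z and w components.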

Each prompt to the LLM contains the following pieces of information:

A description of what the robot is supposed to do, e.g. "Find an [object] and go to it."

A matrix representation of the map, with one value per grid cell:

[val00, val01, ..., val0n;
val10, val11, ..., val1n;
..., ..., ..., ...;
valm0, valm1, ..., valmn]

The objects detected in the environment and their coordinates:

{"object1": "(x, y)", "object2": "(x, y)", ...}

Feedback on the outcome of the previous navigation command:

Navigation towards (x, y) failed / canceled / succeeded.

The LLM replies with a JSON object containing its reasoning, the next target coordinates, and the current task status:

{
"REASONING": "The task is to ... I should ...",
"COORDINATES": [x, y],
"TASK STATUS": "IN PROGRESS"
}
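To make the prompt and response formats above concrete, here is a minimal sketch that assembles a prompt from those pieces and parses the model's JSON reply back into a target. The helper names, the exact prompt wording, and the example values are assumptions for illustration, not the actual code.

```python
import json


def build_prompt(task, cost_map_rows, objects, last_result):
    """Assemble the LLM prompt from the task, map matrix, object list, and last navigation result."""
    map_matrix = '[' + ';\n'.join(', '.join(str(v) for v in row) for row in cost_map_rows) + ']'
    objects_str = json.dumps({name: f'({x}, {y})' for name, (x, y) in objects.items()})
    return (
        f'Task: {task}\n'
        f'Map:\n{map_matrix}\n'
        f'Objects: {objects_str}\n'
        f'Previous navigation result: {last_result}\n'
        'Respond with JSON containing "REASONING", "COORDINATES" ([x, y]), and "TASK STATUS".'
    )


def parse_reply(raw):
    """Extract the next target coordinates and task status from the LLM's JSON reply."""
    reply = json.loads(raw)
    x, y = reply['COORDINATES']
    return (float(x), float(y)), reply['TASK STATUS']


# Example round trip with made-up values:
prompt = build_prompt(
    task='Find a basket and go to it.',
    cost_map_rows=[[0, 0, 100], [0, 50, 100], [0, 0, 0]],
    objects={'basket': (2.5, 1.0)},
    last_result='Navigation towards (1.0, 0.5) succeeded.',
)
raw_reply = ('{"REASONING": "The basket is at (2.5, 1.0), so I should go there.", '
             '"COORDINATES": [2.5, 1.0], "TASK STATUS": "IN PROGRESS"}')
target, status = parse_reply(raw_reply)
print(prompt)
print(target, status)
```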
One challenge with using LLMs for navigation is that they may not always have enough information to make informed decisions. For example, if the robot is tasked with finding a yellow basket but the environment data only reports a basket with no color, the LLM needs to be able to ask a clarifying question about the color in order to proceed effectively.
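Purely as a hypothetical illustration (not the current behavior), one way to support this within the existing JSON response format would be to let the model return a question instead of coordinates; the "QUESTION" field and "NEEDS CLARIFICATION" status below are assumed extensions, not part of the current system.

```python
import json


def handle_reply(raw):
    """Illustrative handling of a reply that may ask a clarifying question instead of giving a goal.

    The "QUESTION" field and the "NEEDS CLARIFICATION" status are hypothetical extensions,
    not part of the response format described above.
    """
    reply = json.loads(raw)
    if reply.get('TASK STATUS') == 'NEEDS CLARIFICATION':
        # Relay the question (e.g. to the user) and feed the answer into the next prompt.
        return ('ask', reply['QUESTION'])
    x, y = reply['COORDINATES']
    return ('navigate', (float(x), float(y)))


example = ('{"REASONING": "Only a basket with no color information is reported.", '
           '"QUESTION": "Is the basket near the door yellow?", '
           '"TASK STATUS": "NEEDS CLARIFICATION"}')
print(handle_reply(example))  # -> ('ask', 'Is the basket near the door yellow?')
```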
At a basic level, the LLM is currently able to perform the following steps to gather information:
Last few lines of logged output demonstrating this as the robot approaches a yellow basket:
