Tool-calling layer

Neural Cube uses a standardized tool-calling protocol so the language model can call tools in a structured way. This is what lets the model do more than just chat, turning a single natural-language instruction into a sequence of concrete actions like searching the web, asking a visual question, taking a photo, or driving to a goal.
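As a rough illustration of what structured tool calling can look like, the sketch below registers two stub tools and dispatches JSON calls to them. The registry, the tool names (`web_search`, `navigate`), and the `{"name": ..., "arguments": ...}` call shape are assumptions made for this example, not Neural Cube's actual protocol.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping tool names to handlers (illustrative only).
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Register a function as a callable tool under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("web_search")
def web_search(query: str) -> str:
    # Stub: a real tool would query a search backend here.
    return f"(stub) results for {query!r}"


@tool("navigate")
def navigate(instruction: str) -> str:
    # Stub: a real tool would hand the instruction to the navigation system.
    return f"(stub) navigating: {instruction}"


def dispatch(call_json: str) -> str:
    """Parse one structured call emitted by the model and run the matching tool."""
    call = json.loads(call_json)
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"
    try:
        return handler(**call.get("arguments", {}))
    except TypeError as exc:
        # Wrong or missing arguments are reported back instead of crashing.
        return f"error: bad arguments for {name!r}: {exc}"


# One instruction can expand into a sequence of calls.
calls = [
    '{"name": "web_search", "arguments": {"query": "weather today"}}',
    '{"name": "navigate", "arguments": {"instruction": "go to the kitchen"}}',
]
results = [dispatch(c) for c in calls]
```

Returning errors as strings rather than raising is one possible design: it lets the model see what went wrong and retry with corrected arguments.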

Tool-calling architecture

The current set of tools

Store memory

The store-memory tool gives the model a persistent place to write down durable facts, so it can build up long-term knowledge about its environment, the user, and the people and places it encounters instead of starting from scratch each session. Similar memories are clustered and consolidated over time so that memory stays compact. The tool is used only for durable information, not for momentary states or small talk; those stay in the conversational context, which is compacted when necessary.
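The clustering idea can be sketched as follows. This is a minimal in-memory example under stated assumptions: word-overlap (Jaccard) similarity stands in for whatever similarity measure the real system uses, the `MemoryStore` class and its threshold are hypothetical, and persistence to disk is left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a real similarity measure."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class MemoryStore:
    # Hypothetical threshold: facts at least this similar share a cluster.
    threshold: float = 0.5
    clusters: List[List[str]] = field(default_factory=list)

    def store(self, fact: str) -> None:
        """Add a durable fact, joining the first cluster that contains a similar memory."""
        for cluster in self.clusters:
            if any(jaccard(fact, m) >= self.threshold for m in cluster):
                if fact not in cluster:
                    cluster.append(fact)
                return
        self.clusters.append([fact])

    def consolidated(self) -> List[str]:
        """One representative per cluster: the longest wording (a placeholder rule)."""
        return [max(cluster, key=len) for cluster in self.clusters]


memory = MemoryStore()
memory.store("the user likes green tea")
memory.store("the user likes green tea in the morning")
memory.store("the charging dock is in the hallway")
summary = memory.consolidated()
```

Here the two tea facts land in one cluster and the dock fact in another, so `summary` holds two entries instead of three.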

Web search

The web-search tool lets the model look up current information, recent news, or facts it doesn't already know, such as weather, prices, schedules, or anything time-sensitive. If the model already knows the answer, it's instructed to respond directly instead of searching.

Visual question answering

The visual question-answering tool lets the model ask a specific question about what the robot's camera is currently looking at (for example, "what color is the basket?"). This is what closes the loop when a task requires visual information that wasn't captured in the continuous scene summary, or when something in the environment needs a closer look.

Take photo

The take-photo tool saves a photo of what the camera currently sees to disk. The model is instructed to only call it when the user explicitly asks to take, save, or capture a photo. Saved photos show up in the dashboard's Gallery panel so the user can browse them later.
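A minimal sketch of the save step, assuming the camera frame arrives as already-encoded JPEG bytes. The function names, the timestamped file naming, and the gallery directory layout are hypothetical; the real tool and the Gallery panel may work differently.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def save_photo(frame: bytes, gallery_dir: Path) -> Path:
    """Write one encoded frame to disk under a timestamped name and return its path."""
    gallery_dir.mkdir(parents=True, exist_ok=True)
    # Microsecond-resolution UTC timestamp keeps names unique and sortable.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = gallery_dir / f"photo_{stamp}.jpg"
    path.write_bytes(frame)
    return path


def list_gallery(gallery_dir: Path) -> List[Path]:
    """Return saved photos in name order, as a gallery view might list them."""
    return sorted(gallery_dir.glob("photo_*.jpg"))


# Usage with a throwaway directory and placeholder bytes instead of a real frame.
with tempfile.TemporaryDirectory() as tmp:
    gallery = Path(tmp) / "gallery"
    saved = save_photo(b"\xff\xd8 placeholder \xff\xd9", gallery)
    saved_name = saved.name
    saved_bytes = saved.read_bytes()
    gallery_count = len(list_gallery(gallery))
```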

Navigation

The navigation tool is how the robot moves. All physical movement goes through it. It accepts natural-language instructions, either goal-shaped ("go to the kitchen", "head to the front door", "back up a metre") or open-ended ("roam and explore", "look around"), as well as "stop", which cancels any in-flight navigation. The user can still speak to steer or stop the robot mid-navigation.
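The three kinds of instruction can be illustrated with a small sketch. Keyword matching is used here purely as a stand-in; the real system presumably interprets the natural language itself, and the `Navigator` class, `NavKind` enum, and phrase list are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NavKind(Enum):
    GOAL = "goal"
    EXPLORE = "explore"
    STOP = "stop"


# Hypothetical phrases treated as open-ended exploration.
EXPLORE_PHRASES = ("roam", "explore", "look around")


def classify(instruction: str) -> NavKind:
    """Crude keyword classifier standing in for real language understanding."""
    text = instruction.strip().lower()
    if text == "stop":
        return NavKind.STOP
    if any(phrase in text for phrase in EXPLORE_PHRASES):
        return NavKind.EXPLORE
    return NavKind.GOAL


@dataclass
class Navigator:
    # The instruction currently being executed, if any.
    active: Optional[str] = None

    def handle(self, instruction: str) -> str:
        kind = classify(instruction)
        if kind is NavKind.STOP:
            # "stop" cancels whatever is in flight and clears the state.
            cancelled, self.active = self.active, None
            return f"cancelled: {cancelled}" if cancelled else "nothing to cancel"
        # A new instruction replaces the in-flight one, which models mid-navigation steering.
        self.active = instruction
        return f"{kind.value}: {instruction}"


nav = Navigator()
first = nav.handle("go to the kitchen")
second = nav.handle("stop")
third = nav.handle("stop")
```

In this sketch a new instruction simply replaces the active one; whether the real navigation stack queues, replaces, or merges instructions is not stated in this document.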

Dance

The dance tool makes the robot do a short, fun, rhythmic dance. It drives the motors directly rather than going through the navigation system. I added it mostly for fun.