Journey

A visual record of the project's progress so far — from the first cardboard chassis to multi-step navigation driven by natural language. Each milestone is a checkpoint where something meaningful clicked into place.

01
Hardware

Cardboard prototype

The first wheeled chassis — built out of cardboard around an early sensor suite. Validates the platform mechanically before committing to a permanent frame.
Early cardboard chassis on the move.
02
Perception

Digital twin simulation

A continuously updated 3D representation of the robot's pose and immediate environment, reconstructed from its depth camera and IMU in simulation. This is the foundation the rest of the stack reasons on top of.
3D reconstruction of the robot's environment.
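At its core, keeping a twin in sync means repeatedly updating a pose estimate from sensor readings. A minimal 2D dead-reckoning sketch, assuming a gyro yaw rate and a forward speed estimate (the names, schema, and fusion scheme here are illustrative, not the project's actual stack):

```python
import math
from dataclasses import dataclass

@dataclass
class Pose2D:
    x: float = 0.0      # metres
    y: float = 0.0      # metres
    theta: float = 0.0  # heading, radians

def integrate_step(pose: Pose2D, yaw_rate: float, forward_speed: float, dt: float) -> Pose2D:
    """Dead-reckon the pose one timestep from gyro yaw rate and forward speed."""
    theta = pose.theta + yaw_rate * dt
    x = pose.x + forward_speed * math.cos(theta) * dt
    y = pose.y + forward_speed * math.sin(theta) * dt
    return Pose2D(x, y, theta)

# Drive straight for 1 s at 0.5 m/s, in 0.1 s steps.
pose = Pose2D()
for _ in range(10):
    pose = integrate_step(pose, yaw_rate=0.0, forward_speed=0.5, dt=0.1)
print(round(pose.x, 2))  # ~0.5 m travelled, heading unchanged
```

A real twin would correct this drift-prone estimate against the depth camera's reconstruction; this sketch only shows the prediction half of that loop.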
03
Reasoning

Twin + vision-language reasoning

Layering a vision-language model on top of the digital twin so the robot can describe what it sees and reason about its surroundings in natural language.
VLM captioning the live twin view.
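One way to picture the glue between twin and language model is prompt construction: serialising the twin's state into text the model can reason over. A hypothetical sketch, with the object and pose schema invented for illustration:

```python
def build_scene_prompt(objects, robot_pose):
    """Turn the twin's object list and robot pose into a captioning prompt.

    `objects` is a list of (name, x, y) tuples; the schema is illustrative,
    not the project's actual twin format.
    """
    lines = [f"- {name} at ({x:.1f}, {y:.1f}) m" for name, x, y in objects]
    return (
        f"Robot pose: ({robot_pose[0]:.1f}, {robot_pose[1]:.1f}).\n"
        "Visible objects:\n"
        + "\n".join(lines)
        + "\nDescribe the scene in one sentence."
    )

prompt = build_scene_prompt([("chair", 1.2, 0.4), ("door", 3.0, -1.1)], (0.0, 0.0))
print(prompt)
```

The returned string would be sent to the vision-language model alongside the rendered twin view; the model call itself is omitted here.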
04
Hardware

Chassis redesign & rebuild

Replacing the cardboard prototype with a more polished, approachable chassis. It carries over the chat pipeline from the cardboard version.
The redesigned robot in conversation.
05
Dashboard

Live twin & navigation overlays

The live twin reflects the navigation state in real time, building on the navigation stack from the digital twin simulation.
Twin updating live as the robot moves.
Navigation state streamed alongside the twin.
06
Navigation

Navigation: failure

The early end-to-end runs, from chat instruction to navigation. Mostly useful for surfacing what was still broken: wheels got stuck, the camera connection was unstable, model responses were slow.
An early navigation attempt that did not quite work.
07
Navigation

Navigation: point-to-point

A clean point-to-point navigation indoors based on natural-language instruction. The robot reaches a goal pose using its own map and avoids obstacles along the way.
Driving to a goal pose, end to end.
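Reaching a goal pose on a map while avoiding obstacles comes down to path planning over the map. A toy stand-in, assuming a breadth-first search on a small occupancy grid (the project's actual planner and map format are not shown):

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search on an occupancy grid (1 = obstacle, 0 = free).

    Returns a list of (row, col) cells from start to goal, or None if the
    goal is unreachable. A toy stand-in for a real planner.
    """
    rows, cols = len(grid), len(grid[0])
    queue, came_from = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:          # walk parents back to start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None

grid = [[0, 0, 0],
        [1, 1, 0],   # a wall forcing a detour
        [0, 0, 0]]
path = plan_path(grid, (0, 0), (2, 0))
print(path)  # detours right, down, then back left around the wall
```

BFS yields shortest paths on uniform-cost grids; a production planner would also smooth the path and respect the robot's footprint and kinematics.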
08
Navigation

Navigation: multi-step tasks

Multi-step navigation tasks driven by natural-language instructions. The reasoning layer decomposes each request into navigation goals and dispatches them to the planner, which runs as a subagent.
Multi-step navigation from a single instruction.
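The dispatch half of that loop can be sketched as feeding decomposed goals to the planner one at a time and stopping on failure so the reasoning layer can replan. All names and the goal format here are assumptions for illustration:

```python
from typing import Callable, List, Tuple

Goal = Tuple[float, float]  # (x, y) goal position; orientation omitted for brevity

def run_multi_step(goals: List[Goal], navigate: Callable[[Goal], bool]) -> List[Goal]:
    """Dispatch decomposed navigation goals to the planner sequentially.

    `navigate` is the planner subagent's entry point (a stand-in here);
    stops at the first failed goal so the caller can replan.
    """
    reached = []
    for goal in goals:
        if not navigate(goal):
            break
        reached.append(goal)
    return reached

# Stand-in planner that succeeds on every goal.
done = run_multi_step([(1.0, 0.0), (1.0, 2.0), (0.0, 2.0)], lambda g: True)
print(len(done))  # → 3
```

Keeping the planner behind a single callable is what lets the reasoning layer treat it as a subagent: the decomposition logic never needs to know how a goal is actually reached.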