AI Agent
TARS
A locally-hosted AI agent that remembers things, runs on my own hardware, and listens to voice notes over Telegram.
I got tired of assistants that reset every time I opened a chat. No memory from last week, no context on active projects, no continuity. So I built one that keeps context. TARS runs on my Linux machine, takes voice notes through Telegram, transcribes locally with Whisper, handles the request, and writes useful details into an Obsidian vault that carries across sessions. It feels less like a chatbot and more like a second brain I can talk to.
The memory architecture.
Most agent setups I tried either ignored memory or jammed too much into a system prompt. Neither held up in practice. Without persistence, the agent forgets everything. With prompt stuffing, it burns context on old details before the conversation even starts.
My setup uses an Obsidian vault at ~/brain/ with a clear directory structure. During a conversation, Haiku writes relevant details into today's daily note. At 11pm, a Sonnet cron job reads through the day's log, extracts anything worth keeping, and writes or updates files in memory/ with proper wikilinks.
~/brain/
  daily/         session logs, appended during conversations
  memory/
    people/      notes on individuals
    projects/    per-project context
    knowledge/   reference material
    inbox/       staging for unprocessed notes

This two-layer setup keeps live costs low and long-term memory useful. Haiku handles live interactions cheaply. Sonnet runs once at night and does the heavier consolidation work, deciding what should be kept and where it belongs. Daily notes stay as raw logs, and memory files stay organized and linked.
The ~/brain/ vault structure showing daily notes and consolidated memory files
Debugging the voice transcription pipeline during early setup, errors included
It started with n8n.
My first instinct was to build the agent workflow in n8n. I already had it running from earlier automation work, and I wanted predictable branching and deterministic steps. I wired nodes for Whisper transcription, model routing, vault writes, and responses. The structured parts worked.
The weak spot was conversation state. n8n is built for discrete workflows that start and finish. Keeping context alive across back-and-forth dialogue meant pushing the tool past what it is designed for. Session boundaries kept forcing fragile state patches.
Then I found OpenClaw.
That is when I moved to OpenClaw, a self-hosted framework built for persistent agents. It handles personality config, memory access, and conversation continuity directly. The workarounds I needed in n8n were not necessary there.
Starting with n8n still helped because it made the requirements obvious. Pipeline automation and stateful conversation are different problems, so I separated them. OpenClaw runs the agent and memory layer. n8n handles structured background jobs such as nightly vault consolidation. Each tool does what it is good at.
The stack.
The hardware runs everything locally: Ryzen 9 9900X, RTX 5060 Ti (16GB VRAM), 32GB DDR5, Pop!_OS Linux. Local inference runs through Ollama. Claude Haiku handles most conversations via the Anthropic API. Sonnet handles heavier reasoning and the nightly cron job that consolidates the day's notes into the vault's permanent memory layer.
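The Haiku/Sonnet split comes down to routing by task type before the API call is made. A sketch of that decision, assuming a simple keyword heuristic; the model names and task labels here are illustrative placeholders, not the real config:

```python
# Tasks that justify the more expensive model; everything else
# goes to the cheap conversational model. Labels are illustrative.
HEAVY_TASKS = {"consolidate", "plan", "summarize_day"}


def pick_model(task: str) -> str:
    """Route cheap conversational turns to Haiku, heavy work to Sonnet."""
    return "claude-sonnet" if task in HEAVY_TASKS else "claude-haiku"
```

The point of keeping the routing explicit is cost control: the nightly consolidation run is the only place the heavier model is guaranteed to fire.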
TARS responding to a voice message via Telegram
Token and rate-limit optimization.
After the core loop was stable, I treated token usage and API limits as system constraints instead of afterthoughts. Voice transcripts, retrieval results, and long-running context can inflate prompt size fast, so I added guardrails that keep each request inside a defined budget before it reaches a model.
This reduced token burn significantly and made the system much less sensitive to API rate-limit spikes, while keeping response quality consistent for daily use.
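The guardrail itself can be as simple as estimating the prompt size and dropping the oldest context until the request fits. A minimal sketch; the 4-characters-per-token estimate and the default budget are assumptions for illustration, where a real setup would use the provider's tokenizer:

```python
def trim_to_budget(system: str, history: list[str], budget_tokens: int = 4000) -> list[str]:
    """Drop the oldest history turns until the whole prompt fits the budget.

    Token counts are estimated at ~4 characters per token; swap in the
    provider's tokenizer for exact counts.
    """
    est = lambda s: len(s) // 4 + 1  # rough token estimate
    kept = list(history)
    while kept and est(system) + sum(est(t) for t in kept) > budget_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept
```

Because trimming happens before the request leaves the machine, a long voice transcript or a large retrieval result can never blow past the rate limit on its own.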
What I'd do differently.
Define memory before writing code
I went through a few iterations of the vault structure before landing on the current one, and each was slower than it needed to be because I didn't have a clear model of what I was building toward. The two-layer daily/permanent split seems obvious in hindsight. I'd start there.
Start from requirements, not available tooling
I moved to n8n partly because I already had it running. That is not a great architecture decision by itself, and it cost me a few weeks. Pick tools that fit the problem, not tools that happen to be available.
Running in daily use.
The core loop is stable: voice in, transcription, reasoning, vault write, response out. I am still iterating on the memory architecture to separate what is genuinely useful from what turns into noise.
