Down the Agentic AI Rabbit Hole

2026-05-27T00:00:00+00:00

TL;DR: I’ve been building out a full agentic AI stack — local GPU compute through Lemonade</a> and ROCm</a>, heavily customized OpenCode</a> and Claude Code</a> setups, and a web-style agent called Gaia</a> running on local hardware. It’s excessive. I love it.

At some point I crossed a line from “casually interested in AI tooling” to “I have an Ansible playbook managing a local model farm and I’m weirdly proud of it.” Hard to say exactly when that happened. Doesn’t matter. Here’s what I’ve been building.
Local Compute: Lemonade + ROCm</h2>
The foundation is Lemonade</a>, an AMD project that serves LLMs locally with full GPU acceleration through ROCm</a> and Vulkan</a>. It runs an OpenAI-compatible API locally, which means anything that talks to OpenAI can be pointed at your own hardware instead. Under the hood it’s llama.cpp</a> doing the actual inference work.
The hardware doing the heavy lifting is a Ryzen 9 7950X3D</a> paired with a Radeon RX 9070 XT</a>. The 7950X3D handles CPU inference and anything that doesn’t need the GPU; the 9070 XT takes everything ROCm can throw at it. One wrinkle: the board also has an integrated GPU, and ROCm’s idea of a good time is apparently enumerating both and then crashing. A one-line config isolating the dGPU fixed it, but it’s the kind of thing you only find out about after you’ve already stared at a crash log for longer than you’d like to admit.
Getting ROCm working on Arch Linux</a> is the kind of experience that makes you briefly consider a career in farming, but once it clicked I had a local inference server actually using my GPU instead of farming the work out to a data center somewhere in Virginia.
The current roster of models I’m actually using:

Qwen3.5-9B</a> — the main workhorse, solid general-purpose performance at a size that doesn’t require a prayer before each inference</li>
Qwen3.5-4B</a> (via vLLM) — when you want Qwen but faster and cheaper</li>
Gemma 4 4B</a> — Google’s efficient 4B, punches above its weight for general tasks</li>
SmolLM3-3B</a> — HuggingFace’s 3B model, absurdly capable for its size</li>
Z-Image-Turbo</a> — Alibaba TongYi’s image generation model, currently sitting at #1 open-source on the text-to-image leaderboard; yes I’m generating images locally too</li> </ul>
The whole thing is managed as a Podman quadlet</a> and deployed via Ansible, because consistency matters and I’ve already written about my feelings on running containers the right way.
OpenCode</h2>
OpenCode</a> is my primary coding agent when I’m not using Claude Code, and I’ve been building it out hard — all pointing at Lemonade as the backend. The default model is Qwen3.5-9B</a> with SmolLM3-3B</a> handling the quick low-stakes tasks where pulling up a 9B model is overkill.
The LSP setup covers Bash, Lua, Kotlin, Rust, Terraform, TypeScript, and YAML — basically everything I touch regularly. The MCP list is where it gets interesting: Podman</a>, Context7</a> for live documentation, MCPVault</a> for my Obsidian vault, Sequential Thinking</a>, Playwright</a> for browser automation, ComfyUI</a> for image generation, Ghidra</a> for reverse engineering, and Aseprite</a> for pixel art. Yes, really. It’s a coding agent that can also help me draw sprites and disassemble binaries, which is either the most useful thing I’ve ever set up or a sign that I need a hobby. I have several, so we’ll call it useful.
The MCPVault</a> connection deserves a separate mention. I use Obsidian</a> as my primary knowledge base, and wiring it into the MCP ecosystem means agents can read and write to my vault directly — skills, notes, references, project context. The result is a shared long-term memory layer that both OpenCode and Claude Code can draw from. Agents forget things between sessions by default; this is how you fix that.
No API costs, no rate limits, nothing leaving the machine. That’s the pitch and it holds up. The catch is that local models aren’t Claude — they’re genuinely good in a way that would’ve been hard to believe two years ago, but they have a ceiling and I’m not going to pretend otherwise.
Gaia</h2>
Gaia</a> is AMD’s local AI agent platform — essentially a web-style interface for running agents on your own hardware. I packaged it for Arch (gaia-amd on AUR</a>) and pointed it at Lemonade. It’s less CLI, more browsable — a different way to interact with the local model stack. Still figuring out where it slots in, but it’s interesting enough to keep around.
Claude Code</h2>
Claude Code is the other half of the daily setup, and I’ve been pulling on every thread I can find:
Capsule Kit — a SQLite-backed context memory system that hooks into Claude Code’s lifecycle to capture session state, file operations, and sub-agent work automatically. The goal is continuity across sessions without having to re-explain yourself every time.
Skills system — a custom skill framework for common workflows. Complex tasks, debugging, multi-agent coordination, code review — instead of typing the same setup instructions repeatedly, a skill handles it. This post came from a skill, actually.
Crew mode — parallel multi-agent work across git worktrees. Spin up a team of specialized agents working independent branches simultaneously, then merge. Still experimental but already useful for tasks with clearly separable workstreams.
AGENTS.md</a> — I spent time thinking about how to make agent instructions portable across tools rather than maintaining separate configs for every agent. AGENTS.md is an open standard (Linux Foundation, 23+ compatible agents) that solves exactly this: one file in the project root, every agent reads it. OpenCode included. This site has one now.

Local AI is actually useful now — not “useful if you squint” but useful for real work. The models are good, the tooling has matured, and the ROCm situation on AMD hardware is dramatically better than it was a year ago.
For anything genuinely hard, Claude is still doing the heavy lifting. The local models earn their keep on the stuff where keeping things on-prem matters more than raw capability — background jobs, quick questions, anything you’d rather not pipe through someone else’s API.
That gap is closing though. Slowly, but it’s moving.
I’ve been in tech since I was five years old. I’ve watched enough “this changes everything” moments to know which ones to ignore. This isn’t one of those. Getting your hands dirty with this stuff now — actually running it, not just reading about it — is starting to feel less optional than I’d like it to. I’ll leave it at that.

fosstog - local-ai

Down the Agentic AI Rabbit Hole