<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>fosstog - local-ai</title>
    <subtitle>Free &amp; Open Source Photography</subtitle>
    <link rel="self" type="application/atom+xml" href="https://fosstog.com/tags/local-ai/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://fosstog.com"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-05-27T00:00:00+00:00</updated>
    <id>https://fosstog.com/tags/local-ai/atom.xml</id>
    <entry xml:lang="en">
        <title>Down the Agentic AI Rabbit Hole</title>
        <published>2026-05-27T00:00:00+00:00</published>
        <updated>2026-05-27T00:00:00+00:00</updated>
        
        <author>
          <name>ganthore</name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://fosstog.com/blog/agentic-ai-rabbit-hole/"/>
        <id>https://fosstog.com/blog/agentic-ai-rabbit-hole/</id>
        
        <content type="html" xml:base="https://fosstog.com/blog/agentic-ai-rabbit-hole/">&lt;p&gt;&lt;strong&gt;TL;DR&lt;&#x2F;strong&gt;: I’ve been building out a full agentic AI stack: local GPU compute through &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;lemonade-server.ai&#x2F;&quot;&gt;Lemonade&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;rocm.docs.amd.com&#x2F;&quot;&gt;ROCm&lt;&#x2F;a&gt;, heavily customized &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;opencode.ai&#x2F;&quot;&gt;OpenCode&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;claude.ai&#x2F;code&quot;&gt;Claude Code&lt;&#x2F;a&gt; setups, and a web-style agent called &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;amd&#x2F;gaia&quot;&gt;Gaia&lt;&#x2F;a&gt; running on local hardware. It’s excessive. I love it.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;At some point I crossed a line from “casually interested in AI tooling” to “I have an Ansible playbook managing a local model farm and I’m weirdly proud of it.” Hard to say exactly when that happened. Doesn’t matter. Here’s what I’ve been building.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;local-compute-lemonade-rocm&quot;&gt;Local Compute: Lemonade + ROCm&lt;&#x2F;h2&gt;
&lt;p&gt;The foundation is &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;lemonade-server.ai&#x2F;&quot;&gt;Lemonade&lt;&#x2F;a&gt;, an AMD project that serves LLMs locally with full GPU acceleration through &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;rocm.docs.amd.com&#x2F;&quot;&gt;ROCm&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.vulkan.org&#x2F;&quot;&gt;Vulkan&lt;&#x2F;a&gt;. It runs an OpenAI-compatible API locally, which means anything that talks to OpenAI can be pointed at your own hardware instead. Under the hood it’s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ggml-org&#x2F;llama.cpp&quot;&gt;llama.cpp&lt;&#x2F;a&gt; doing the actual inference work.&lt;&#x2F;p&gt;
&lt;p&gt;The hardware doing the heavy lifting is a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.amd.com&#x2F;en&#x2F;products&#x2F;processors&#x2F;desktops&#x2F;ryzen&#x2F;7000-series&#x2F;amd-ryzen-9-7950x3d.html&quot;&gt;Ryzen 9 7950X3D&lt;&#x2F;a&gt; paired with a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.amd.com&#x2F;en&#x2F;products&#x2F;graphics&#x2F;desktops&#x2F;radeon&#x2F;9000-series&#x2F;amd-radeon-rx-9070-xt.html&quot;&gt;Radeon RX 9070 XT&lt;&#x2F;a&gt;. The 7950X3D handles CPU inference and anything that doesn’t need the GPU; the 9070 XT takes everything ROCm can throw at it. One wrinkle: the 7950X3D also carries an integrated GPU, and ROCm’s idea of a good time is apparently enumerating both GPUs and then crashing. A one-line config isolating the dGPU fixed it, but it’s the kind of thing you only find out about after you’ve already stared at a crash log for longer than you’d like to admit.&lt;&#x2F;p&gt;
&lt;p&gt;Getting ROCm working on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archlinux.org&#x2F;&quot;&gt;Arch Linux&lt;&#x2F;a&gt; is the kind of experience that makes you briefly consider a career in farming, but once it clicked I had a local inference server actually using my GPU instead of farming the work out to a data center somewhere in Virginia.&lt;&#x2F;p&gt;
&lt;p&gt;The current roster of models I’m actually using:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;Qwen&#x2F;Qwen3.5-9B&quot;&gt;Qwen3.5-9B&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; — the main workhorse, solid general-purpose performance at a size that doesn’t require a prayer before each inference&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;Qwen&#x2F;Qwen3.5-4B&quot;&gt;Qwen3.5-4B&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; (via vLLM) — when you want Qwen but faster and cheaper&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai.google.dev&#x2F;gemma&quot;&gt;Gemma 4 4B&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; — Google’s efficient 4B, punches above its weight for general tasks&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;HuggingFaceTB&#x2F;SmolLM3-3B&quot;&gt;SmolLM3-3B&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; — HuggingFace’s 3B model, absurdly capable for its size&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;Tongyi-MAI&#x2F;Z-Image&quot;&gt;Z-Image-Turbo&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; — Alibaba TongYi’s image generation model, ranked #1 among open-source models on the text-to-image leaderboard at the time of writing; yes, I’m generating images locally too&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The whole thing is managed as a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.podman.io&#x2F;en&#x2F;latest&#x2F;markdown&#x2F;podman-systemd.unit.5.html&quot;&gt;Podman quadlet&lt;&#x2F;a&gt; and deployed via Ansible, because consistency matters and I’ve already written about my feelings on running containers the right way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;opencode&quot;&gt;OpenCode&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;opencode.ai&#x2F;&quot;&gt;OpenCode&lt;&#x2F;a&gt; is my primary coding agent when I’m not using Claude Code, and I’ve been building it out hard, with everything pointed at Lemonade as the backend. The default model is &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;Qwen&#x2F;Qwen3.5-9B&quot;&gt;Qwen3.5-9B&lt;&#x2F;a&gt;, with &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;huggingface.co&#x2F;HuggingFaceTB&#x2F;SmolLM3-3B&quot;&gt;SmolLM3-3B&lt;&#x2F;a&gt; handling the quick, low-stakes tasks where loading a 9B model is overkill.&lt;&#x2F;p&gt;
&lt;p&gt;The LSP setup covers Bash, Lua, Kotlin, Rust, Terraform, TypeScript, and YAML — basically everything I touch regularly. The MCP list is where it gets interesting: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;containers&#x2F;podman-mcp-server&quot;&gt;Podman&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;context7.com&#x2F;&quot;&gt;Context7&lt;&#x2F;a&gt; for live documentation, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bitbonsai&#x2F;mcpvault&quot;&gt;MCPVault&lt;&#x2F;a&gt; for my Obsidian vault, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;modelcontextprotocol&#x2F;servers&#x2F;tree&#x2F;main&#x2F;src&#x2F;sequentialthinking&quot;&gt;Sequential Thinking&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;microsoft&#x2F;playwright-mcp&quot;&gt;Playwright&lt;&#x2F;a&gt; for browser automation, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;comfyanonymous&#x2F;ComfyUI&quot;&gt;ComfyUI&lt;&#x2F;a&gt; for image generation, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ghidra-sre.org&#x2F;&quot;&gt;Ghidra&lt;&#x2F;a&gt; for reverse engineering, and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.aseprite.org&#x2F;&quot;&gt;Aseprite&lt;&#x2F;a&gt; for pixel art. Yes, really. It’s a coding agent that can also help me draw sprites and disassemble binaries, which is either the most useful thing I’ve ever set up or a sign that I need a hobby. I have several, so we’ll call it useful.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bitbonsai&#x2F;mcpvault&quot;&gt;MCPVault&lt;&#x2F;a&gt; connection deserves a separate mention. I use &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;obsidian.md&#x2F;&quot;&gt;Obsidian&lt;&#x2F;a&gt; as my primary knowledge base, and wiring it into the MCP ecosystem means agents can read and write to my vault directly — skills, notes, references, project context. The result is a shared long-term memory layer that both OpenCode and Claude Code can draw from. Agents forget things between sessions by default; this is how you fix that.&lt;&#x2F;p&gt;
&lt;p&gt;No API costs, no rate limits, nothing leaving the machine. That’s the pitch and it holds up. The catch is that local models aren’t Claude — they’re genuinely good in a way that would’ve been hard to believe two years ago, but they have a ceiling and I’m not going to pretend otherwise.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;gaia&quot;&gt;Gaia&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;amd&#x2F;gaia&quot;&gt;Gaia&lt;&#x2F;a&gt; is AMD’s local AI agent platform — essentially a web-style interface for running agents on your own hardware. I packaged it for Arch (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;aur.archlinux.org&#x2F;packages&#x2F;gaia-amd&quot;&gt;gaia-amd on AUR&lt;&#x2F;a&gt;) and pointed it at Lemonade. It’s less CLI, more browsable — a different way to interact with the local model stack. Still figuring out where it slots in, but it’s interesting enough to keep around.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;claude-code&quot;&gt;Claude Code&lt;&#x2F;h2&gt;
&lt;p&gt;Claude Code is the other half of the daily setup, and I’ve been pulling on every thread I can find:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Capsule Kit&lt;&#x2F;strong&gt; — a SQLite-backed context memory system that hooks into Claude Code’s lifecycle to capture session state, file operations, and sub-agent work automatically. The goal is continuity across sessions without having to re-explain yourself every time.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Skills system&lt;&#x2F;strong&gt; — a custom skill framework for common workflows. Complex tasks, debugging, multi-agent coordination, code review — instead of typing the same setup instructions repeatedly, a skill handles it. This post came from a skill, actually.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Crew mode&lt;&#x2F;strong&gt; — parallel multi-agent work across git worktrees. Spin up a team of specialized agents working independent branches simultaneously, then merge. Still experimental but already useful for tasks with clearly separable workstreams.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;agents.md&#x2F;&quot;&gt;AGENTS.md&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; — I spent time thinking about how to make agent instructions portable across tools rather than maintaining separate configs for every agent. AGENTS.md is an open standard (Linux Foundation, 23+ compatible agents) that solves exactly this: one file in the project root, every agent reads it. OpenCode included. This site has one now.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;Local AI is actually useful now — not “useful if you squint” but useful for real work. The models are good, the tooling has matured, and the ROCm situation on AMD hardware is dramatically better than it was a year ago.&lt;&#x2F;p&gt;
&lt;p&gt;For anything genuinely hard, Claude is still doing the heavy lifting. The local models earn their keep on the stuff where keeping things on-prem matters more than raw capability — background jobs, quick questions, anything you’d rather not pipe through someone else’s API.&lt;&#x2F;p&gt;
&lt;p&gt;That gap is closing though. Slowly, but it’s moving.&lt;&#x2F;p&gt;
&lt;p&gt;I’ve been in tech since I was five years old. I’ve watched enough “this changes everything” moments to know which ones to ignore. This isn’t one of those. Getting your hands dirty with this stuff now — actually running it, not just reading about it — is starting to feel less optional than I’d like it to. I’ll leave it at that.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
