May 2026
I'm building four projects right now. They look independent but they're not. They're different pieces of the same question: what does it take to make AI agents actually work in practice?
Not in a research paper. Not in a demo. In real systems where things break, data leaks, and users talk instead of type.
Hive is the foundation. A local-first agent OS where you define agents in YAML, pick their LLM backend, and let them run persistently. No Docker, no cloud. The question it answers: how do you orchestrate multiple agents on a single machine without depending on the cloud for everything?
Unplug is the defense layer. A 3-stage pipeline (regex, ML classifier, LLM judge) that catches prompt injection, data leakage, and jailbreaks before they reach your agent. The question it answers: if agents act autonomously, how do you make sure they don't get tricked into doing something they shouldn't?
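To make the three-stage idea concrete, here is a minimal sketch of how a regex, classifier, and LLM-judge pipeline could be wired together. It is not Unplug's actual code. The patterns, the thresholds, the `Verdict` type, and the `score_fn` and `judge_fn` callables are all assumptions for illustration, as is the choice to escalate only when a cheaper stage is unsure.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical patterns for illustration; a real rule set would be far larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?(system prompt|api key)", re.IGNORECASE),
]


@dataclass
class Verdict:
    blocked: bool
    stage: str   # which stage made the decision
    reason: str


def regex_stage(text: str) -> Optional[Verdict]:
    """Stage 1: cheap pattern match. Decides only when a pattern hits."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return Verdict(True, "regex", f"matched {pattern.pattern!r}")
    return None


def classifier_stage(
    text: str,
    score_fn: Callable[[str], float],
    block_at: float = 0.9,   # assumed threshold
    allow_at: float = 0.1,   # assumed threshold
) -> Optional[Verdict]:
    """Stage 2: a score in [0, 1]. Decides only when the score is confident."""
    score = score_fn(text)
    if score >= block_at:
        return Verdict(True, "classifier", f"score {score:.2f}")
    if score <= allow_at:
        return Verdict(False, "classifier", f"score {score:.2f}")
    return None  # uncertain, so escalate to the next stage


def check(
    text: str,
    score_fn: Callable[[str], float],
    judge_fn: Callable[[str], bool],
) -> Verdict:
    """Run the stages in order of cost and stop at the first decision."""
    verdict = regex_stage(text)
    if verdict is None:
        verdict = classifier_stage(text, score_fn)
    if verdict is None:
        # Stage 3: the slowest and most expensive check runs last.
        verdict = Verdict(judge_fn(text), "llm_judge", "judge decision")
    return verdict
```

Ordering the stages by cost means most inputs never reach the LLM judge, which keeps latency low for the common case.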
Mutter is the voice interface. A macOS menu bar app where you press a hotkey, talk, and it routes your speech to the right action (task, note, or query). Everything runs locally via MLX Whisper and LM Studio. The question it answers: what's the lowest-friction way to interact with an AI system?
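The routing step can be sketched roughly like this, assuming the transcript has already been produced and a local model is asked to pick a label. This is not Mutter's actual code. The prompt wording, the `ask_llm` callable, and the fallback to `note` are all assumptions made for illustration.

```python
from typing import Callable

ACTIONS = ("task", "note", "query")

# Hypothetical prompt; the real wording would need tuning against a real model.
PROMPT = (
    "Classify the transcript as exactly one of: task, note, query.\n"
    "Reply with the single word only.\n\n"
    "Transcript: {transcript}"
)


def route(transcript: str, ask_llm: Callable[[str], str], default: str = "note") -> str:
    """Ask the model for a label and fall back to a safe default if the reply is off-script."""
    reply = ask_llm(PROMPT.format(transcript=transcript))
    words = reply.strip().lower().split()
    # Tolerate trailing punctuation such as "Task." in the reply.
    label = words[0].strip(".,:;!\"'") if words else ""
    return label if label in ACTIONS else default
```

Passing the model in as a plain callable keeps the router testable without a running model server. The fallback matters because a small local model will not always answer with a single clean word.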
NexNet is the understanding layer. A neural network framework built from scratch in NumPy, including BERT and GPT. This one's less about production use and more about knowing what's actually happening inside the models I build on top of.
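As a taste of what "from scratch in NumPy" means, here is a minimal sketch of scaled dot-product attention, the operation at the core of both BERT and GPT. It is not taken from NexNet. The function names and the single-head, unbatched shapes are assumptions chosen to keep the example short.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, causal: bool = False):
    """Single-head attention over arrays of shape (seq_len, d).

    causal=True masks out future positions, as in GPT-style decoders.
    causal=False lets every position see every other, as in BERT-style encoders.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    if causal:
        # True above the diagonal marks positions that lie in the future.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 tokens, 8 dimensions each
out, weights = attention(x, x, x, causal=True)
```

With the causal mask on, each row of `weights` sums to 1 and has zeros above the diagonal, so a token only attends to itself and earlier tokens.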
Here's the thing: every agent framework I've tried treats these as separate problems. You pick an orchestration layer, bolt on some safety checks, add a UI, and pray the integration doesn't break.
I'm approaching it differently. Each project teaches me one part of the stack deeply enough to build it from scratch.
Together, they cover the full stack of making agents work: the runtime (Hive), the safety (Unplug), the interface (Mutter), and the understanding of what's under the hood (NexNet).
There's a deliberate choice here. I could use LangChain for orchestration, buy a security API, use the OpenAI SDK for voice, and call it a day. Ship faster, less code to maintain.
But then I'd be a consumer of these systems, not a builder. And the whole point of this exercise is to understand every layer well enough that when something breaks (and it always breaks), I know where to look.
GSoC at Jenkins taught me this. When I built the diagnostic system there, the multi-backend LLM adapter wasn't just "support multiple providers." It was understanding how each provider handles streaming differently, how token counting varies, how fallback logic needs to account for rate limits vs actual errors. You only learn that by building it.
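The rate-limit versus real-error distinction can be sketched as follows. This is not the Jenkins implementation. The exception classes, the `(name, call)` provider pairs, and the retry and backoff numbers are hypothetical, and a real adapter would map each provider's own errors onto these two cases.

```python
import time
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical: the provider asked us to slow down (for example HTTP 429)."""


class ProviderError(Exception):
    """Hypothetical: a non-transient failure such as bad credentials."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    max_retries: int = 2,          # assumed retry budget per provider
    base_delay: float = 1.0,       # assumed backoff base, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try providers in order, retrying rate limits and skipping hard errors."""
    failures = []
    for name, call in providers:
        for attempt in range(max_retries + 1):
            try:
                return call(prompt)
            except RateLimited:
                # Transient: the same provider will likely work after a pause.
                if attempt == max_retries:
                    failures.append(f"{name}: still rate limited")
                    break
                sleep(base_delay * 2 ** attempt)   # exponential backoff
            except ProviderError as exc:
                # Not transient: retrying wastes time, so move on immediately.
                failures.append(f"{name}: {exc}")
                break
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```

Treating both cases the same fails in either direction: retrying a bad API key burns time, while abandoning a provider on its first 429 throws away capacity that would have come back a second later.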
Same philosophy here. Each project is a forcing function that makes me understand one part of the agent stack at a level I can't get from tutorials.
The immediate roadmap:
The long-term bet: if you understand orchestration, security, interfaces, and model internals, you can build agent systems that actually work in production. Not demos. Not proofs of concept. Real systems.
Everything I'm building is open source. If any of this sounds useful, the repos are linked from the homepage.