Short pieces, written while shipping. Latency budgets, voice agent architecture, RAG patterns that survived contact with real users, and the engineering moves behind what makes an AI product feel good versus broken. No thought leadership. No predictions. Only what I learned this week.
Latest
Voice agents stopped being a research problem. They're a latency budget now.
The components matured. Streaming STT under 300ms. TTS first audio under 100ms. LLM inference under 300ms. The work moved from "can we" to "how do we hit 700ms end to end, every call, in production." Here is how I think about it while building a phone agent for L1 support.
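A minimal sketch of that budget as code, to make the arithmetic concrete. The stage ceilings and the 700ms target come from the numbers above; the stage names, the `check_turn` helper, and the sample measurements are hypothetical, not taken from the actual phone agent.

```python
# Per-stage ceilings in milliseconds, taken from the figures quoted above.
# Stage names are illustrative labels, not identifiers from a real system.
BUDGET_MS = {"stt": 300, "llm": 300, "tts_first_audio": 100}
END_TO_END_TARGET_MS = 700


def check_turn(measured_ms: dict[str, float]) -> dict:
    """Compare one turn's measured stage latencies against the budget."""
    # Stages that exceeded their ceiling, mapped to the overshoot in ms.
    over_budget = {
        stage: measured_ms[stage] - limit
        for stage, limit in BUDGET_MS.items()
        if measured_ms.get(stage, 0) > limit
    }
    total = sum(measured_ms.values())
    return {
        "total_ms": total,
        "within_target": total <= END_TO_END_TARGET_MS,
        "over_budget": over_budget,
    }


# Hypothetical measurements for a single turn.
report = check_turn({"stt": 280, "llm": 350, "tts_first_audio": 90})
print(report)
```

The three ceilings add up to exactly 700ms, so in this sketch any stage that runs over has to be paid for by another stage coming in under.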
Two more pieces in flight: one on RAG retrieval evals that don't lie, one on the cost curve of multi-agent orchestration. They drop when they drop. The best way to get notified is to follow me on X or just check back.
Bring the work in-house.
If the kind of engineering thinking in these pieces is what your AI product is missing, the booking link below is the way to start. 30 minutes, no pitch.