Building a Local Voice AI on Raspberry Pi 5: What Actually Works in 2026
I built a voice assistant on a Raspberry Pi 5 that runs entirely offline. No cloud, no subscriptions, no data leaving the device. Here's what worked, what didn't, and whether it's worth your time.
I wanted a voice assistant that didn't phone home. No API calls. No subscriptions. Nothing leaving my network.
So I built one on a Raspberry Pi 5.
Here's what I learned -- including the parts that don't show up in the tutorials.
Why Bother Going Local?
The obvious reason is privacy. But there's a less-discussed one: reliability. Cloud voice assistants go down. They get deprecated. Pricing changes. And when you're building a custom interface for a client, you want something that works in five years without a vendor making that decision for you.
For this build, the goal was simple: wake word detection, speech-to-text, LLM reasoning, text-to-speech. All on-device. Zero network dependency.
The Hardware Stack
Raspberry Pi 5 (8GB RAM) -- the 8GB model is not optional; the 4GB variant runs out of headroom fast once the LLM is loaded.
USB microphone -- I used a cheap omnidirectional mic. Quality matters less than you'd think at this stage; the STT model handles noise better than expected.
3.5mm speaker -- the Pi's onboard audio is fine for testing. For production, a small USB audio DAC gives cleaner output.
Optional: Raspberry Pi AI HAT+ 2 -- Hailo's accelerator (released January 2026) adds 40 TOPS of inference capability. It helps with vision workloads but makes less difference for text-only voice pipelines. Skip it unless you're running a camera alongside.
The Software Stack
This is where most tutorials diverge from reality. Here's what actually worked:
Wake word: OpenWakeWord (github.com/dscripka/openWakeWord). Runs on CPU, low latency, customizable. I trained a custom trigger word in about 20 minutes using their web tool.
Speech-to-text: Whisper Tiny or Small via faster-whisper. Tiny transcribes a typical short utterance in 2-3 seconds on the Pi 5. Small is more accurate but slower. For normal speech in a quiet room, Tiny is good enough.
LLM: Phi-3 Mini (3.8B params, Q4 quantized) via Ollama. This is the sweet spot for the Pi 5. Larger models are too slow. Smaller ones lose coherence. Phi-3 Mini at Q4 gives about 3-4 tokens per second.
Text-to-speech: Piper TTS. Fast, local, surprisingly natural. The en_US-lessac-medium voice is my default. Full sentence generation takes under a second.
What the Full Pipeline Looks Like
Microphone -> Wake Word Detection -> Record Utterance -> faster-whisper -> LLM -> Piper TTS -> Speaker
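The stages above compose cleanly. Here's a minimal sketch of one wake-to-reply cycle; the stage functions are placeholders you would wire to openWakeWord, faster-whisper, Ollama, and Piper respectively (the function names are mine, not from any of those libraries):

```python
def run_pipeline(record_utterance, transcribe, generate, speak):
    """Run one wake-to-reply cycle using injected stage functions."""
    audio = record_utterance()   # raw audio captured after the wake word fires
    text = transcribe(audio)     # faster-whisper in the real build
    reply = generate(text)       # Ollama running Phi-3 Mini
    speak(reply)                 # Piper TTS out to the speaker
    return reply
```

Injecting the stages as plain functions also makes the pipeline testable on a dev machine with stubs before you ever touch the Pi.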
Total round-trip: 15-25 seconds on Pi 5 without acceleration. That's slow. But for home automation triggers, reminders, or local data queries, it's workable.
The key trick: stream the TTS output while the LLM is still generating. Don't wait for the full response. Start speaking the first sentence as soon as it's complete. This drops perceived latency to 8-12 seconds, which is much more tolerable.
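One way to implement that trick is a small generator that buffers streamed tokens and yields each sentence the moment it's complete, so Piper can start speaking while the model is still generating. This is a sketch of the idea, not the exact code from my build; `token_stream` is any iterable of text chunks (e.g. from Ollama's streaming API):

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(token_stream):
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():  # flush any trailing partial sentence at end of stream
        yield buffer.strip()
```

Feed each yielded sentence straight to the TTS engine and the first words come out as soon as the first sentence closes, not after the full response.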
What I'd Do Differently
Don't use a Pi 5 for latency-sensitive applications. If you need sub-5-second responses, you need a GPU. An RTX 3070 running locally is 10x faster than a Pi 5. The Pi is the right tool for always-on, low-power, embedded use cases.
Plan your context window carefully. The Pi 5's memory constraints mean small models with limited context. Keep system prompts short. Don't expect it to maintain a long conversation history without compression.
Lower the temperature. At defaults, small models are verbose and sometimes incoherent. Drop it to 0.3-0.5 for voice use cases. You want predictable, concise output.
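Those three settings all live in the request body you send to Ollama's `/api/generate` endpoint. Here's a hedged sketch; the field names (`temperature`, `num_ctx`, `num_predict`) are Ollama's documented options, but the exact values are starting points, not tuned constants:

```python
def build_request(prompt, model="phi3:mini"):
    """Build an Ollama /api/generate request body tuned for voice output."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": True,          # stream tokens so TTS can start early
        "options": {
            "temperature": 0.4,  # 0.3-0.5 keeps small models concise
            "num_ctx": 2048,     # small context window to fit Pi 5 memory
            "num_predict": 200,  # cap reply length -- nobody wants a speech
        },
    }
```

POST this to `http://localhost:11434/api/generate` and read the streamed response line by line.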
Where This Actually Makes Sense
For a general-purpose voice assistant competing with Alexa in responsiveness -- no, the latency gap is too wide right now.
But for narrow, specific purposes? It works well.
A voice interface for a local medical records system. A floor manager assistant at a factory without reliable internet. A privacy-first home hub that runs on $75 hardware with zero ongoing subscription cost.
That's the opportunity. Not a replacement for cloud assistants. A replacement for the class of problem where cloud assistants aren't an option.
Building something similar? Need a custom voice AI interface for a compliance-restricted environment or offline use case? I build these async -- no meetings, flat-rate pricing: https://bmdpat.com/start