
Building a Local Voice AI on Raspberry Pi 5: What Actually Works in 2026

I built a voice assistant on a Raspberry Pi 5 that runs entirely offline. No cloud, no subscriptions, no data leaving the device. Here's what worked, what didn't, and whether it's worth your time.

Patrick Hughes
6 min read

I wanted a voice assistant that didn't phone home. No API calls. No subscriptions. Nothing leaving my network.

So I built one on a Raspberry Pi 5.

Here's what I learned -- including the parts that don't show up in the tutorials.

Why Bother Going Local?

The obvious reason is privacy. But there's a less-discussed one: reliability. Cloud voice assistants go down. They get deprecated. Pricing changes. And when you're building a custom interface for a client, you want something that works in five years without a vendor making that decision for you.

For this build, the goal was simple: wake word detection, speech-to-text, LLM reasoning, text-to-speech. All on-device. Zero network dependency.

The Hardware Stack

Raspberry Pi 5 (8GB RAM) -- the 8GB model is not optional. You need it. The 4GB variant runs out of headroom fast once the LLM is loaded.

USB microphone -- I used a cheap omnidirectional mic. Quality matters less than you'd think at this stage; the STT model handles noise better than expected.

3.5mm speaker -- the Pi's onboard audio is fine for testing. For production, a small USB audio DAC gives cleaner output.

Optional: Raspberry Pi AI HAT+ 2 -- Hailo's accelerator (released January 2026) adds 40 TOPS of inference capability. It helps with vision workloads but makes less difference for text-only voice pipelines. Skip it unless you're running a camera alongside.

The Software Stack

This is where most tutorials diverge from reality. Here's what actually worked:

Wake word: OpenWakeWord (github.com/dscripka/openWakeWord). Runs on CPU, low latency, customizable. I trained a custom trigger word in about 20 minutes using their web tool.
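The glue around the detector is mostly thresholding and debouncing. Here's a minimal sketch of that loop; `detect_triggers`, the threshold, and the frame count are all illustrative, standing in for thresholding the per-frame confidence scores a model like OpenWakeWord's `Model.predict()` returns:

```python
# Sketch: thresholding per-frame wake-word scores with a refractory period.
# The scores list stands in for a detector's per-frame confidence output
# (e.g. one score per ~80 ms audio frame); values here are hypothetical.

THRESHOLD = 0.5         # fire when confidence crosses this
REFRACTORY_FRAMES = 25  # ~2 s at 80 ms/frame: ignore re-triggers mid-utterance

def detect_triggers(scores, threshold=THRESHOLD, refractory=REFRACTORY_FRAMES):
    """Return frame indices where the wake word fires, with debounce."""
    triggers = []
    cooldown = 0
    for i, score in enumerate(scores):
        if cooldown > 0:
            cooldown -= 1       # still in the refractory window: skip
            continue
        if score >= threshold:
            triggers.append(i)  # fire once, then go quiet for a while
            cooldown = refractory
    return triggers
```

Without the refractory window, a single spoken trigger word produces a burst of consecutive high-scoring frames and the assistant fires several times in a row.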

Speech-to-text: Whisper Tiny or Small via faster-whisper. Tiny processes in 2-3 seconds on Pi 5. Small is more accurate. For normal speech in a quiet room, Tiny is good enough.

LLM: Phi-3 Mini (3.8B params, Q4 quantized) via Ollama. This is the sweet spot for the Pi 5. Larger models are too slow. Smaller ones lose coherence. Phi-3 Mini at Q4 gives about 3-4 tokens per second.

Text-to-speech: Piper TTS. Fast, local, surprisingly natural. The en_US-lessac-medium voice is my default. Full sentence generation takes under a second.

What the Full Pipeline Looks Like

Microphone -> Wake Word Detection -> Record Utterance -> faster-whisper -> LLM -> Piper TTS -> Speaker
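In code, the glue for the chain above amounts to three calls in sequence. This is a sketch only: every function body below is a stub standing in for the real component (faster-whisper, the Ollama request, Piper), and the names are mine, not any library's API.

```python
# Minimal sketch of the pipeline glue. Each stub returns canned data so
# the control flow is visible; swap in the real STT/LLM/TTS calls.

def transcribe(audio: bytes) -> str:      # stand-in for faster-whisper
    return "what time is it"

def generate_reply(prompt: str) -> str:   # stand-in for the Ollama call
    return "It is just past noon."

def speak(text: str) -> str:              # stand-in for Piper TTS playback
    return f"[spoken] {text}"

def handle_utterance(audio: bytes) -> str:
    """Run one wake-word-triggered turn through the full chain."""
    text = transcribe(audio)
    reply = generate_reply(text)
    return speak(reply)
```

The value of keeping the stages this decoupled: each one can be swapped (a different Whisper size, a different model tag) without touching the rest.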

Total round-trip: 15-25 seconds on Pi 5 without acceleration. That's slow. But for home automation triggers, reminders, or local data queries, it's workable.

The key trick: stream the TTS output while the LLM is still generating. Don't wait for the full response. Start speaking the first sentence as soon as it's complete. This drops perceived latency to 8-12 seconds, which is much more tolerable.
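The streaming trick above boils down to buffering the token stream and cutting at sentence boundaries. A sketch, with a deliberately naive boundary rule (it will mis-split on abbreviations like "Mr."):

```python
# Sketch: split an LLM token stream at sentence boundaries so TTS can
# start speaking the first sentence while the model is still generating.
import re

def sentences_from_stream(token_stream):
    """Yield complete sentences as soon as the stream produces them."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Naive rule: a sentence ends at terminal punctuation plus whitespace.
        while (m := re.search(r"[.!?]\s", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream
```

Feed each yielded sentence straight to Piper; the first one is usually ready long before the model finishes the full reply.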

What I'd Do Differently

Don't use a Pi 5 for latency-sensitive applications. If you need sub-5-second responses, you need a GPU. An RTX 3070 running locally is 10x faster than a Pi 5. The Pi is the right tool for always-on, low-power, embedded use cases.

Plan your context window carefully. The Pi 5's memory constraints mean small models with limited context. Keep system prompts short. Don't expect it to maintain a long conversation history without compression.
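One workable approach is to budget history by size and drop the oldest exchanges first, always keeping the system prompt. A sketch, using a rough character budget as a stand-in for real token counting:

```python
# Sketch: trim conversation history to a rough character budget,
# newest turns kept first, system prompt always preserved.
# (Characters approximate tokens here; a real build would count tokens.)

def trim_history(system_prompt, turns, budget_chars=2000):
    """Return system prompt plus the most recent turns that fit the budget."""
    kept = []
    used = len(system_prompt)
    for turn in reversed(turns):            # walk newest-to-oldest
        if used + len(turn) > budget_chars:
            break                           # everything older is dropped
        kept.append(turn)
        used += len(turn)
    return [system_prompt] + list(reversed(kept))
```

On an 8GB Pi 5 with a Q4 model, keeping the effective context this small is what keeps prompt-processing time from swamping the 3-4 tokens/second of generation.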

Lower the temperature. At defaults, small models are verbose and sometimes incoherent. Drop it to 0.3-0.5 for voice use cases. You want predictable, concise output.
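With Ollama, temperature goes in the `options` field of the `/api/generate` request. A sketch of the payload (built but not sent here; the model tag may differ in your install):

```python
# Sketch: an Ollama /api/generate payload with sampling pinned down for
# voice use. POST this to http://localhost:11434/api/generate.
import json

def build_request(prompt, temperature=0.4):
    return json.dumps({
        "model": "phi3:mini",   # tag name may differ in your Ollama install
        "prompt": prompt,
        "stream": True,         # needed for the sentence-streaming trick
        "options": {
            "temperature": temperature,  # 0.3-0.5 keeps small models terse
            "num_predict": 128,          # cap reply length for spoken output
        },
    })
```

Capping `num_predict` matters as much as temperature for voice: at 3-4 tokens/second, every extra sentence the model rambles through is several more seconds of waiting.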

Where This Actually Makes Sense

As a general-purpose voice assistant competing with Alexa on responsiveness? No -- the latency gap is too wide right now.

But for narrow, specific purposes? It works well.

A voice interface for a local medical records system. A floor manager assistant at a factory without reliable internet. A privacy-first home hub that runs on $75 hardware with zero ongoing subscription cost.

That's the opportunity. Not a replacement for cloud assistants. A replacement for the class of problem where cloud assistants aren't an option.


Building something similar? Need a custom voice AI interface for a compliance-restricted environment or offline use case? I build these async -- no meetings, flat-rate pricing: https://bmdpat.com/start
