ChorusReader
Pre-launch
Every character gets a voice
ChorusReader takes your books and gives every character a distinct voice. 30 built-in voices, 12 emotion categories, real-time synthesis on iPhone. No per-character cloud fees, no data leakage. Powered by ChorusTTS — a custom accelerated pipeline built on Chatterbox weights.
Audiobook narration is a solved problem if you're fine with one voice reading everything. But characters have voices. Dialogue has emotion. ChorusReader assigns distinct voices to characters and modulates emotion per scene — happy, sad, angry, contemplative. All running on your iPhone, no cloud API calls, no per-character fees.
- 30 built-in voices with distinct timbres
- 12 emotion categories for expressive narration
- ePub and PDF support with chapter detection
- Real-time synthesis — 20% faster than real time on iPhone
- Powered by ChorusTTS — a custom accelerated pipeline built on Chatterbox weights (thanks Resemble AI)
Multi-voice narration — each character gets a distinct voice and emotion.
ChorusTTS is a custom Swift pipeline built on Chatterbox weights by Resemble AI. Three stages turn text into speech — each optimized for Apple Silicon and quantized to fit on a phone.
┌────────────────────────────────────────────────┐
│ Book Analysis │
│ ┌──────────┐ ┌───────────┐ ┌────────────┐ │
│ │ Quote │ │ Character │ │ Emotion │ │
│ │Detection │─▶│Attribution│─▶│Enrichment │ │
│ │(heuristic│ │(LLM + turn│ │(per-segment│ │
│ │ + regex) │ │ tracking) │ │ profiling) │ │
│ └──────────┘ └───────────┘ └─────┬──────┘ │
└─────────────────────────────────────┼──────────┘
│
30 voice presets ──────────┤
or cloned voice │
(a few seconds ▼
of reference ┌─────────────────┐
audio) │ T3 Transformer │
│ text → speech │
│ tokens │
└────────┬────────┘
│
▼
┌─────────────────┐
│ S3Gen (flow │
│ matching) │
│ tokens → mel │
│ spectrogram │
└────────┬────────┘
│
▼
┌─────────────────┐
│ HiFT Vocoder │
│ mel → waveform │
│ 24kHz audio │
└────────┬────────┘
│
┌─────────────▼──────────────┐
│ Streaming Playback │
│ (plays as chunks arrive, │
│ checkpoint resume) │
└────────────────────────────┘
iOS: 4-bit quantized, compiled to CoreML on first launch
macOS: 8-bit quantized, native MLX execution

End-to-end pipeline from book analysis to streaming playback. Thanks to Resemble AI for open-sourcing the Chatterbox weights.
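The three-stage flow above can be sketched as a simple function chain. This is an illustrative Python sketch, not the Swift implementation: the stage bodies are stubs, and the `Segment` fields and function names are assumptions made for the example.

```python
# Hypothetical sketch of the ChorusTTS three-stage flow with streaming
# playback. Stage internals are stubbed; only the data flow is real.
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class Segment:
    text: str
    voice_id: int   # index into the 30 presets (or a cloned embedding)
    emotion: str    # one of the 12 emotion categories

def t3_transformer(seg: Segment) -> List[int]:
    """Stage 1: text -> discrete speech tokens (stubbed)."""
    return [hash((seg.voice_id, seg.emotion, ch)) % 1024 for ch in seg.text]

def s3gen(tokens: List[int]) -> List[float]:
    """Stage 2: flow matching, speech tokens -> mel frames (stubbed)."""
    return [t / 1024.0 for t in tokens]

def hift_vocoder(mel: List[float]) -> List[float]:
    """Stage 3: mel -> 24 kHz waveform samples in [-1, 1] (stubbed)."""
    return [m * 2.0 - 1.0 for m in mel]

def synthesize(segments: List[Segment], resume_from: int = 0) -> Iterator[List[float]]:
    """Yield audio chunk by chunk so playback starts before the book is
    done; resume_from is the checkpoint index for interrupted runs."""
    for seg in segments[resume_from:]:
        yield hift_vocoder(s3gen(t3_transformer(seg)))

chunks = list(synthesize([Segment('"Run!"', voice_id=3, emotion="angry")]))
```

Streaming as a generator is what lets playback begin as soon as the first chunk lands, and the `resume_from` index is the whole checkpoint-resume contract: persist it after each yielded chunk and restart from there.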
- Hybrid book analysis — regex-based quote detection feeds into on-device LLM (Foundation Models) for speaker attribution and emotion tagging
- Each voice is a set of speaker embeddings, not a separate model — switching voices is instant and adds no memory overhead
- 12 emotion profiles map to concrete TTS parameters (temperature, guidance weight, pacing) — the same text sounds different when a character whispers vs. shouts
- Streaming synthesis with checkpoint resume — if the app is interrupted mid-generation, it picks up exactly where it left off
- Voice cloning from a few seconds of reference audio via an LSTM encoder — bring your own narrator
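The heuristic quote-detection stage can be illustrated with a short sketch. This is not the shipped code: the regex, the `split_dialogue` name, and the `(kind, text)` output shape are assumptions chosen to show the idea of splitting narration from dialogue before speaker attribution runs.

```python
import re

# Match curly or straight double quotes; content is everything up to the
# closing quote. Real books need more rules (nested quotes, em-dash
# dialogue), so treat this as the simplest possible heuristic.
QUOTE_RE = re.compile(r'[“"]([^”"]+)[”"]')

def split_dialogue(paragraph: str):
    """Split a paragraph into (kind, text) pairs,
    where kind is 'narration' or 'quote'."""
    parts, pos = [], 0
    for m in QUOTE_RE.finditer(paragraph):
        if m.start() > pos:
            parts.append(("narration", paragraph[pos:m.start()].strip()))
        parts.append(("quote", m.group(1)))
        pos = m.end()
    tail = paragraph[pos:].strip()
    if tail:
        parts.append(("narration", tail))
    return [p for p in parts if p[1]]

segments = split_dialogue('“Run!” shouted Mara. “They found us.”')
```

Each `quote` segment then goes to the on-device LLM for speaker attribution, while `narration` segments keep the narrator voice — the regex pass is cheap, so the expensive LLM only sees text that might need a character voice.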
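The mapping from emotion categories to concrete synthesis parameters can be sketched as a lookup table. The field names and numeric values here are illustrative assumptions, not ChorusTTS's actual profiles; the point is that an emotion tag resolves to sampling temperature, guidance weight, and pacing before synthesis.

```python
# Hypothetical emotion-to-parameter table; values are made up to show
# the shape of the mapping, not ChorusTTS's real tuning.
from dataclasses import dataclass

@dataclass(frozen=True)
class EmotionProfile:
    temperature: float   # sampling temperature for the T3 transformer
    cfg_weight: float    # classifier-free guidance strength
    pace: float          # 1.0 = neutral speaking rate

EMOTIONS = {
    "neutral":       EmotionProfile(0.7, 0.5, 1.00),
    "happy":         EmotionProfile(0.9, 0.6, 1.10),
    "angry":         EmotionProfile(1.0, 0.8, 1.15),
    "sad":           EmotionProfile(0.6, 0.4, 0.85),
    "contemplative": EmotionProfile(0.5, 0.3, 0.90),
    # ...remaining categories fill out the 12.
}

def params_for(emotion: str) -> EmotionProfile:
    # Fall back to neutral for unrecognized tags so synthesis never fails.
    return EMOTIONS.get(emotion, EMOTIONS["neutral"])
```

Because the profiles are plain parameter sets rather than separate model checkpoints, switching emotion mid-scene costs nothing — the same reason voice switching via speaker embeddings is instant.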