#webdev #javascript #typescript #productivity

I Built a Beautiful Text-to-Speech Reader in One Night

How I turned the web's best articles into an audio experience — with AI voices, word highlighting, and a listening history — in a single overnight build.

Jacobo · March 31, 2026

I have a problem with reading. Not the ability — the time. Every day I bookmark 10 articles and read maybe one. The rest sit in a tab graveyard, judging me.

I kept thinking: I have a 40-minute commute. I walk my dog every morning. I cook dinner every evening. That's nearly two hours a day when my eyes are busy but my ears aren't. Why can't I just listen to my articles?

There are tools for this. Pocket has TTS. So does Readwise. But they're clunky, the voices are robotic, and the experience feels like a voice memo from 2012. I wanted something that felt premium — like Audible, but for the open web.

So I built TTS Reader Pro in one night.

The Core Insight

The Web Speech API was drafted in 2012, and speech synthesis has shipped in the major browsers for roughly a decade. It's free, it's native, and almost nobody builds on it because the default voices are often terrible. But the interface to call it is actually pretty clean:

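// Look up the chosen voice; getVoices() can be empty until "voiceschanged" fires.
const voice = speechSynthesis.getVoices().find(v => v.name === selectedVoice);
const utterance = new SpeechSynthesisUtterance(text);
utterance.voice = voice ?? null; // null falls back to the browser's default voice
utterance.rate = playbackRate;
// Boundary events also fire for sentences, so react to word boundaries only.
utterance.onboundary = (e) => {
  if (e.name === "word") highlightWord(e.charIndex);
};
speechSynthesis.speak(utterance);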

That onboundary event is the magic. It fires at each word boundary, at least on voices that report boundaries (some remote, network-backed voices never fire it). That's how you build the scrolling word-highlight experience, the thing that makes it feel like you're reading along even when you're not looking.

The Technical Decisions

Stack: Next.js 15 + Tailwind v4 + shadcn/ui + TypeScript. Same stack I use for everything now. Zero decision fatigue.

Voices: I curated 8 voices from what the browser provides — renaming them with personality names (Aria, Marcus, Zoe, Leo) so the UX doesn't feel like a settings menu from the 90s.
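A minimal sketch of how that renaming could work, assuming a hand-written alias table. The table contents, the `VoiceLike` and `Persona` types, and the `curateVoices` helper are all hypothetical, and which voice names exist varies by browser and operating system, so treat the hints as placeholders rather than the app's real mapping.

```typescript
// Minimal shape of a browser voice; SpeechSynthesisVoice satisfies it.
interface VoiceLike {
  name: string;
  lang: string;
}

interface Persona<V extends VoiceLike> {
  label: string; // friendly name shown in the UI
  voice: V;      // the underlying browser voice
}

// Hypothetical alias table: persona label -> voice-name hints, in priority order.
const PERSONA_HINTS: Record<string, string[]> = {
  Aria: ["Samantha", "Google US English"],
  Marcus: ["Daniel", "Google UK English Male"],
};

// Returns one entry per persona that has a matching installed voice.
function curateVoices<V extends VoiceLike>(installed: V[]): Persona<V>[] {
  const curated: Persona<V>[] = [];
  for (const [label, hints] of Object.entries(PERSONA_HINTS)) {
    for (const hint of hints) {
      const match = installed.find((v) => v.name.includes(hint));
      if (match) {
        curated.push({ label, voice: match });
        break; // first matching hint wins
      }
    }
  }
  return curated;
}
```

In the browser you would pass in `speechSynthesis.getVoices()`, ideally after the `voiceschanged` event, since the list can be empty on first load. Personas with no matching voice simply drop out of the picker.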

Word highlighting: The tricky part. The Web Speech API gives you character offsets, not word indices. I wrote a small utility that maps charIndex to a position in an array of word spans, then CSS transitions handle the visual glow. Surprisingly smooth.
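A minimal sketch of what such a mapping utility could look like. The names `WordSpan`, `buildWordSpans`, and `wordIndexAt` are made up for illustration, and the sketch assumes words are separated by whitespace, which may not match how every voice reports boundaries.

```typescript
interface WordSpan {
  text: string;
  start: number; // offset of the word's first character in the source text
  end: number;   // offset one past the word's last character
}

// Split text into whitespace-separated words, remembering each word's offsets.
function buildWordSpans(text: string): WordSpan[] {
  const spans: WordSpan[] = [];
  const pattern = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    spans.push({ text: m[0], start: m.index, end: m.index + m[0].length });
  }
  return spans;
}

// Binary search for the last word whose start is <= charIndex.
// Returns -1 when charIndex falls before the first word.
function wordIndexAt(spans: WordSpan[], charIndex: number): number {
  let lo = 0;
  let hi = spans.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (spans[mid].start <= charIndex) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

Inside an `onboundary` handler you would call `wordIndexAt(spans, e.charIndex)` and move a highlight class to the element at that index. Building the spans once per article keeps each lookup cheap even for long texts.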

Reading history: Stored in localStorage with charIndex progress. So when you close the app and come back, you get "Resume from where you left off" — one of those small UX moments that makes a product feel like it cares.
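Here is one way the progress record could be stored, as a rough sketch. The key prefix, the `ReadingProgress` shape, and the `saveProgress` / `loadProgress` helpers are assumptions for illustration, not the app's actual schema. The storage is passed in as a parameter so the same code works with `window.localStorage` or an in-memory stand-in.

```typescript
// Minimal storage shape; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface ReadingProgress {
  charIndex: number; // last character offset reached in the article text
  updatedAt: number; // epoch milliseconds of the last save
}

const PROGRESS_PREFIX = "tts-progress:"; // hypothetical key prefix

function saveProgress(store: KeyValueStore, articleId: string, charIndex: number): void {
  const entry: ReadingProgress = { charIndex, updatedAt: Date.now() };
  store.setItem(PROGRESS_PREFIX + articleId, JSON.stringify(entry));
}

// Returns null when nothing is saved or the saved entry is unreadable.
function loadProgress(store: KeyValueStore, articleId: string): ReadingProgress | null {
  const raw = store.getItem(PROGRESS_PREFIX + articleId);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<ReadingProgress>;
    if (typeof parsed.charIndex === "number" && typeof parsed.updatedAt === "number") {
      return { charIndex: parsed.charIndex, updatedAt: parsed.updatedAt };
    }
    return null;
  } catch {
    return null; // corrupted entry: treat as no saved progress
  }
}
```

The Web Speech API has no seek, so one way to resume is to speak `text.slice(progress.charIndex)` and add `progress.charIndex` back onto each boundary offset before highlighting.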

URL input: I didn't implement actual web scraping. Browsers block cross-origin fetches of most article pages, so scraping needs a server-side route, and that's a CORS nightmare for a free MVP. Instead, you paste text directly. The text field is huge, the placeholder is inviting, and there's a "Try a Demo Article" button to see it working immediately. Good enough for v1.

What Surprised Me

Two things:

First, how much the voice choice matters. I spent 20 minutes just cycling through voices while reading the same paragraph. Some make you want to fall asleep. Others make the article feel urgent. Voice is to audio what font is to text — a massive but invisible UX decision.

Second, the focus mode. I added a full-screen reader view almost as an afterthought, where the current sentence is big and centered and the rest dims out. Five minutes into testing it, I realized I actually wanted to use this app. That's when you know you've built something real.
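For the curious, a focus view like this needs to know which sentence the current character offset falls in. A rough sketch, assuming a deliberately naive rule (a sentence ends at `.`, `!`, or `?` followed by whitespace) that will trip on abbreviations like "e.g."; `SentenceRange`, `splitSentences`, and `sentenceIndexAt` are illustrative names, not the app's code.

```typescript
interface SentenceRange {
  start: number; // offset of the sentence's first character
  end: number;   // offset one past its last character (trailing whitespace included)
}

// Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
function splitSentences(text: string): SentenceRange[] {
  const ranges: SentenceRange[] = [];
  const terminator = /[.!?]+\s+/g;
  let start = 0;
  let m: RegExpExecArray | null;
  while ((m = terminator.exec(text)) !== null) {
    const end = m.index + m[0].length;
    ranges.push({ start, end });
    start = end;
  }
  // Whatever is left after the last terminator is the final sentence.
  if (start < text.length) ranges.push({ start, end: text.length });
  return ranges;
}

// Index of the sentence containing charIndex, or -1 if it is out of range.
function sentenceIndexAt(ranges: SentenceRange[], charIndex: number): number {
  return ranges.findIndex((r) => charIndex >= r.start && charIndex < r.end);
}
```

Feeding each boundary event's `charIndex` through `sentenceIndexAt` tells the view which sentence to enlarge, and `text.slice(range.start, range.end).trim()` gives the text to show.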

What I'd Do Next

- Real URL scraping via a server-side route using Playwright or Mercury Parser (since renamed Postlight Parser)

- Voice uploads — bring your own ElevenLabs API key for ultra-realistic voices

- Sleep timer — stop after 30 minutes for podcast-style listening

- Export to MP3 — download the audio file to your phone

- Browser extension — one-click "read this page" from any tab

The core is there. The feel is right. Now it just needs more surface area.

Try It

Live app: https://tts-reader-pro.limed.tech

Paste an article. Pick a voice. Hit play. Your commute just got better.