
# 🔊 Make Your AI Talk Back!

Your AI can chat, create images, understand audio, and analyze files. Now let's give it a voice! 🎤

Imagine users asking "What's the weather like?" and your AI speaking back in a warm, friendly voice instead of just showing text. Or reading long articles aloud while they work on other things!

What we're building: Your AI will be able to speak any text in 6 different voice personalities, from professional business tones to energetic marketing voices. It's like having a team of voice actors inside your app!


Current state: Your AI shows brilliant text responses.
Target state: Users can hear your AI speak with natural voices!

Before (Silent AI):

User: "Explain quantum physics"
AI: [Shows long text explanation]
User: [Has to read everything] 😴

After (Speaking AI):

User: "Explain quantum physics"
AI: [Shows text AND speaks it] 🔊
User: [Can listen while doing other things] 🎧

The magic: Your AI becomes accessible, engaging, and multitask-friendly!

Real-world impact:

  • 📱 Accessibility heroes - Visually impaired users can fully enjoy your app
  • 🏃‍♀️ Multitasking magic - Users can listen while exercising, driving, or working
  • 🧠 Learning boost - Audio learners absorb information better when they hear it
  • 📚 Instant podcasts - Turn any article into audio content on demand
  • 🎯 Better engagement - Voice keeps users active instead of passive readers

Without voice AI:

❌ Hire expensive voice actors
❌ Settle for robotic computer voices
❌ Miss the many users who prefer audio
❌ Stay limited to text-only experiences

With voice AI:

✅ Professional voices in seconds
✅ Natural, engaging speech
✅ Serve all learning styles
✅ Complete multimedia experience

OpenAI gives you a complete voice acting team! Each one has a distinct personality:

🎙️ Alloy - The Professional

Perfect for: Business presentations, formal content
Sounds like: Your trusted corporate spokesperson
User feels: Confident and professional

🌊 Echo - The Calm Companion

Perfect for: Meditation apps, soothing content
Sounds like: Your gentle yoga instructor
User feels: Relaxed and peaceful

📚 Fable - The Master Storyteller

Perfect for: Creative content, engaging stories
Sounds like: Your favorite audiobook narrator
User feels: Captivated and entertained

🎯 Onyx - The Authority

Perfect for: News, important announcements
Sounds like: Your trusted news anchor
User feels: Informed and confident

☀️ Nova - The Friendly Helper

Perfect for: Tutorials, customer support
Sounds like: Your helpful best friend
User feels: Welcome and supported

✨ Shimmer - The Energy Booster

Perfect for: Marketing, motivational content
Sounds like: Your enthusiastic coach
User feels: Excited and motivated

Pro tip: We'll build a voice selector so users can choose their favorite!
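If you ever want a sensible default voice per content type, the personality table above maps naturally onto a tiny lookup. This is just a sketch: `recommendVoice` and the category keys are our own names, not part of the OpenAI API.

```javascript
// Hypothetical lookup: map a content category to a default voice,
// following the personality descriptions above.
const VOICE_FOR_CONTENT = {
  business: "alloy",
  meditation: "echo",
  story: "fable",
  news: "onyx",
  tutorial: "nova",
  marketing: "shimmer",
};

function recommendVoice(category) {
  return VOICE_FOR_CONTENT[category] ?? "alloy"; // safe fallback
}

console.log(recommendVoice("news"));    // → onyx
console.log(recommendVoice("haiku"));   // → alloy (unknown category falls back)
```

The React component we build later exposes the full selector in the UI; a helper like this is only useful when your app wants to pre-pick a voice automatically.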


Good news: We're using the exact same patterns you already know!

What you already have:

```js
// Your familiar Responses API pattern
const response = await client.responses.create({
  model: "gpt-4o",
  input: [systemPrompt, userMessage]
});
```

What we're adding:

```js
// New voice synthesis (same style!)
const speech = await client.audio.speech.create({
  model: "tts-1",
  voice: "alloy",
  input: textToSpeak
});
```

Perfect! Same patterns, just different endpoints.

Simple concept: Text goes in → Beautiful voice comes out!

```js
// What we need to track:
const voiceState = {
  textInput: "Hello, I'm your AI assistant!", // What to say
  selectedVoice: "nova",                      // Who says it
  audioSettings: {                            // How to say it
    speed: 1.0,      // Normal speed
    quality: "hd",   // High definition
    format: "mp3"    // Audio format
  },
  generatedAudio: "audio-file-url" // Result!
};
```

Voice options:

  • 🏃‍♂️ TTS-1 - Fast generation (great for testing)
  • 💎 TTS-1-HD - Premium quality (perfect for production)
  • ⚡ Speed control - From 0.25x (slow) to 4x (fast)
  • 🎵 Formats - MP3, Opus, AAC, FLAC
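It helps to normalize these options before building a request. Here's a small sketch of that idea: the defaults and clamping mirror the endpoint we build below, but `normalizeTtsOptions` itself is our own helper name, not an SDK function.

```javascript
// Hypothetical helper: normalize TTS options before calling the API.
const TTS_FORMATS = ["mp3", "opus", "aac", "flac"];
const TTS_MODELS = ["tts-1", "tts-1-hd"];

function normalizeTtsOptions({ model = "tts-1", speed = 1.0, format = "mp3" } = {}) {
  return {
    // Fall back to the fast model if an unknown one is requested
    model: TTS_MODELS.includes(model) ? model : "tts-1",
    // Clamp speed to the supported 0.25x–4x range
    speed: Math.max(0.25, Math.min(4.0, Number(speed) || 1.0)),
    // Only allow the documented audio formats
    format: TTS_FORMATS.includes(format) ? format : "mp3",
  };
}

console.log(normalizeTtsOptions({ speed: 9, format: "wav" }));
// → { model: 'tts-1', speed: 4, format: 'mp3' }
```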

Add this to your existing server - same patterns you know and love:

```js
import fs from 'fs';
import path from 'path';

// 🔊 VOICE PROFILES: Available AI voices with personalities
const VOICE_PROFILES = {
  alloy: {
    name: "Alloy",
    description: "Professional and versatile",
    bestFor: "Business content, presentations"
  },
  echo: {
    name: "Echo",
    description: "Calm and soothing",
    bestFor: "Meditation, relaxation content"
  },
  fable: {
    name: "Fable",
    description: "Expressive storyteller",
    bestFor: "Stories, creative content"
  },
  onyx: {
    name: "Onyx",
    description: "Deep and authoritative",
    bestFor: "News, formal announcements"
  },
  nova: {
    name: "Nova",
    description: "Warm and friendly",
    bestFor: "Customer service, tutorials"
  },
  shimmer: {
    name: "Shimmer",
    description: "Bright and energetic",
    bestFor: "Marketing, upbeat content"
  }
};

// 🔧 HELPER FUNCTIONS: Audio processing utilities
const saveAudioToTemp = (audioBuffer, format = 'mp3') => {
  const tempDir = path.join(process.cwd(), "temp");

  // Create temp directory if it doesn't exist
  if (!fs.existsSync(tempDir)) {
    fs.mkdirSync(tempDir, { recursive: true });
  }

  // Create unique filename
  const filename = `tts-${Date.now()}.${format}`;
  const filepath = path.join(tempDir, filename);

  // Write audio file
  fs.writeFileSync(filepath, audioBuffer);

  // Auto-cleanup after 1 hour
  setTimeout(() => {
    try {
      if (fs.existsSync(filepath)) {
        fs.unlinkSync(filepath);
        console.log(`🧹 Cleaned up: ${filename}`);
      }
    } catch (error) {
      console.error("Error cleaning up audio file:", error);
    }
  }, 3600000); // 1 hour

  return { filepath, filename };
};

// 🔊 AI Text-to-Speech endpoint - add this to your existing server
app.post("/api/tts/generate", async (req, res) => {
  try {
    // 🛡️ VALIDATION: Check required inputs
    const {
      text,
      voice = "alloy",
      model = "tts-1",
      speed = 1.0,
      format = "mp3"
    } = req.body;

    if (!text || text.trim() === "") {
      return res.status(400).json({
        error: "Text is required",
        success: false
      });
    }

    if (text.length > 4096) {
      return res.status(400).json({
        error: "Text too long. Maximum 4096 characters allowed.",
        current_length: text.length,
        success: false
      });
    }

    console.log(`🔊 Generating speech: ${text.substring(0, 50)}... (${voice})`);

    // 🎙️ AI SPEECH GENERATION: Convert text to speech
    const response = await openai.audio.speech.create({
      model: model,                               // tts-1 (fast) or tts-1-hd (high quality)
      voice: voice,                               // AI voice personality
      input: text.trim(),                         // Text to convert
      response_format: format,                    // Audio format (mp3, opus, aac, flac)
      speed: Math.max(0.25, Math.min(4.0, speed)) // Speaking speed (0.25x to 4x)
    });

    // 💾 AUDIO PROCESSING: Save audio file
    const audioBuffer = Buffer.from(await response.arrayBuffer());
    const { filename } = saveAudioToTemp(audioBuffer, format);

    // 📤 SUCCESS RESPONSE: Send audio info and download link
    res.json({
      success: true,
      audio: {
        filename: filename,
        format: format,
        size: audioBuffer.length,
        duration_estimate: Math.ceil(text.length / 14), // ~14 characters per second
        download_url: `/api/tts/download/${filename}`
      },
      generation: {
        voice: voice,
        voice_info: VOICE_PROFILES[voice],
        model: model,
        speed: speed,
        text_length: text.length
      },
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    // 🚨 ERROR HANDLING: Handle TTS failures
    console.error("Text-to-speech error:", error);
    res.status(500).json({
      error: "Failed to generate speech",
      details: error.message,
      success: false
    });
  }
});

// 📥 Audio Download endpoint - serve generated audio files
app.get("/api/tts/download/:filename", (req, res) => {
  try {
    const { filename } = req.params;

    // Security check - validate the filename before touching the filesystem
    if (!filename.match(/^tts-\d+\.(mp3|opus|aac|flac)$/)) {
      return res.status(400).json({ error: "Invalid filename" });
    }

    const filepath = path.join(process.cwd(), "temp", filename);

    // Check if file exists
    if (!fs.existsSync(filepath)) {
      return res.status(404).json({ error: "Audio file not found or expired" });
    }

    // Serve audio file
    const extension = path.extname(filename).substring(1);
    res.setHeader('Content-Type', `audio/${extension}`);
    res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);

    const audioBuffer = fs.readFileSync(filepath);
    res.send(audioBuffer);
  } catch (error) {
    console.error("Audio download error:", error);
    res.status(500).json({
      error: "Failed to download audio",
      message: error.message
    });
  }
});

// 🎙️ Voice Information endpoint - get available voices
app.get("/api/tts/voices", (req, res) => {
  res.json({
    success: true,
    voices: VOICE_PROFILES,
    models: [
      {
        id: "tts-1",
        name: "TTS-1",
        description: "Fast, cost-effective synthesis",
        quality: "standard"
      },
      {
        id: "tts-1-hd",
        name: "TTS-1 HD",
        description: "High-definition audio quality",
        quality: "premium"
      }
    ],
    formats: ["mp3", "opus", "aac", "flac"],
    speed_range: { min: 0.25, max: 4.0, default: 1.0 },
    text_limit: 4096
  });
});
```
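One caveat with the `setTimeout` cleanup above: pending timers are lost whenever the server restarts, so files written shortly before a restart are never deleted. A small startup sweep covers that case. This is a sketch using the same temp-directory layout; `sweepTempDir` is our own name, and the one-hour age threshold matches the timer above.

```javascript
import fs from 'fs';
import path from 'path';

// Hypothetical helper: delete temp TTS files older than maxAgeMs.
// Run once at startup to catch files whose cleanup timer was lost.
function sweepTempDir(maxAgeMs = 3600000) {
  const tempDir = path.join(process.cwd(), "temp");
  if (!fs.existsSync(tempDir)) return 0;

  let removed = 0;
  for (const name of fs.readdirSync(tempDir)) {
    if (!/^tts-\d+\./.test(name)) continue; // only touch our own files
    const filepath = path.join(tempDir, name);
    const ageMs = Date.now() - fs.statSync(filepath).mtimeMs;
    if (ageMs > maxAgeMs) {
      fs.unlinkSync(filepath);
      removed++;
    }
  }
  return removed;
}

sweepTempDir(); // call once before app.listen()
```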

What this does (step by step):

  1. ✅ Validates text - Makes sure we have something to say
  2. 🎭 Picks voice - Selects the right AI personality
  3. 🎙️ Generates speech - OpenAI creates the audio
  4. 💾 Saves file - Stores audio temporarily for download
  5. 📤 Returns results - Sends back audio URL and metadata
  6. 🧹 Cleans up - Removes old files automatically

Same reliable patterns as your chat and image features!

Add this middleware to handle text-to-speech specific errors (Express error-handling middleware must be registered after your routes):

```js
// 🚨 TTS ERROR HANDLING: Handle text-to-speech errors
app.use((error, req, res, next) => {
  if (error.message && error.message.includes('Invalid voice')) {
    return res.status(400).json({
      error: "Invalid voice selected. Please choose from: alloy, echo, fable, onyx, nova, shimmer",
      success: false
    });
  }
  if (error.message && error.message.includes('text too long')) {
    return res.status(400).json({
      error: "Text exceeds maximum length of 4096 characters",
      success: false
    });
  }
  next(error);
});
```

Your backend now supports:

  • Text chat (existing functionality)
  • Streaming chat (existing functionality)
  • Image generation (existing functionality)
  • Audio transcription (existing functionality)
  • File analysis (existing functionality)
  • Text-to-speech (new functionality)
---
## 🔧 Step 3: Building the React Text-to-Speech Component
Now let's create a React component for text-to-speech using the same patterns from your existing components.
### **Step 3A: Creating the Text-to-Speech Component**
Create a new file `src/TextToSpeech.jsx`:
```jsx
import { useState, useRef, useEffect } from "react";
import { Volume2, Play, Pause, Download, Settings } from "lucide-react";

function TextToSpeech() {
  // 🧠 STATE: Text-to-speech data management
  const [text, setText] = useState("");                           // Text to convert
  const [selectedVoice, setSelectedVoice] = useState("alloy");    // AI voice selection
  const [audioSettings, setAudioSettings] = useState({            // TTS settings
    model: "tts-1",
    speed: 1.0,
    format: "mp3"
  });
  const [isGenerating, setIsGenerating] = useState(false);        // Processing status
  const [generatedAudio, setGeneratedAudio] = useState([]);       // Generated audio list
  const [currentlyPlaying, setCurrentlyPlaying] = useState(null); // Audio playback state
  const [voices, setVoices] = useState({});                       // Available voices
  const [error, setError] = useState(null);                       // Error messages
  const audioRef = useRef(null);

  // Load available voices on component mount
  useEffect(() => {
    fetchVoices();
  }, []);

  const fetchVoices = async () => {
    try {
      const response = await fetch("http://localhost:8000/api/tts/voices");
      const data = await response.json();
      if (data.success) {
        setVoices(data.voices);
      }
    } catch (error) {
      console.error('Failed to fetch voices:', error);
    }
  };

  // 🔧 FUNCTIONS: Text-to-speech logic engine

  // Main speech generation function
  const generateSpeech = async () => {
    // 🛡️ GUARDS: Prevent invalid generation
    if (!text.trim() || isGenerating) return;

    // 🔄 SETUP: Prepare for generation
    setIsGenerating(true);
    setError(null);

    try {
      // 📤 API CALL: Send to your backend
      const response = await fetch("http://localhost:8000/api/tts/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text: text.trim(),
          voice: selectedVoice,
          ...audioSettings
        })
      });
      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to generate speech');
      }

      // ✅ SUCCESS: Store generated audio
      const newAudio = {
        id: Date.now(),
        text: text.trim(),
        voice: selectedVoice,
        settings: audioSettings,
        audio: data.audio,
        generation: data.generation,
        timestamp: new Date().toISOString()
      };
      setGeneratedAudio(prev => [newAudio, ...prev]);
      setText(""); // Clear input after successful generation
    } catch (error) {
      // 🚨 ERROR HANDLING: Show user-friendly message
      console.error('Speech generation failed:', error);
      setError(error.message || 'Something went wrong while generating speech');
    } finally {
      // 🧹 CLEANUP: Reset generation state
      setIsGenerating(false);
    }
  };

  // Audio playback function
  const playAudio = async (audioItem) => {
    try {
      if (currentlyPlaying?.id === audioItem.id) {
        // Pause current audio
        if (audioRef.current) {
          audioRef.current.pause();
          setCurrentlyPlaying(null);
        }
        return;
      }

      // Stop any currently playing audio
      if (audioRef.current) {
        audioRef.current.pause();
      }

      // Create new audio element
      const audio = new Audio(`http://localhost:8000${audioItem.audio.download_url}`);
      audioRef.current = audio;

      audio.onloadstart = () => setCurrentlyPlaying({ ...audioItem, status: 'loading' });
      audio.oncanplay = () => setCurrentlyPlaying({ ...audioItem, status: 'ready' });
      audio.onplay = () => setCurrentlyPlaying({ ...audioItem, status: 'playing' });
      audio.onpause = () => setCurrentlyPlaying({ ...audioItem, status: 'paused' });
      audio.onended = () => setCurrentlyPlaying(null);
      audio.onerror = () => {
        setCurrentlyPlaying(null);
        setError('Failed to play audio');
      };

      await audio.play();
    } catch (error) {
      console.error('Audio playback error:', error);
      setCurrentlyPlaying(null);
      setError('Failed to play audio');
    }
  };

  // Download audio function
  const downloadAudio = (audioItem) => {
    try {
      const link = document.createElement('a');
      link.href = `http://localhost:8000${audioItem.audio.download_url}`;
      link.download = `speech-${audioItem.id}.${audioItem.audio.format}`;
      document.body.appendChild(link);
      link.click();
      document.body.removeChild(link);
    } catch (error) {
      console.error('Download error:', error);
      setError('Failed to download audio');
    }
  };

  // Sample texts for quick testing
  const sampleTexts = [
    "Welcome to our application! I'm excited to help you with AI-powered text-to-speech.",
    "Once upon a time, in the world of artificial intelligence, voices came alive with just a few lines of code.",
    "This is a test of the emergency broadcast system. This is only a test.",
    "Take a deep breath and relax as you listen to this calming AI-generated voice.",
    "Breaking news: AI technology continues to amaze us with natural-sounding speech synthesis."
  ];

  // Utility functions
  const formatFileSize = (bytes) => {
    if (bytes === 0) return '0 Bytes';
    const k = 1024;
    const sizes = ['Bytes', 'KB', 'MB'];
    const i = Math.floor(Math.log(bytes) / Math.log(k));
    return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
  };

  const formatDuration = (seconds) => {
    const mins = Math.floor(seconds / 60);
    const secs = Math.floor(seconds % 60);
    return `${mins}:${secs.toString().padStart(2, '0')}`;
  };

  // 🎨 UI: Interface components
  return (
    <div className="min-h-screen bg-gradient-to-br from-orange-50 to-red-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-4xl flex flex-col overflow-hidden">
        {/* Header */}
        <div className="bg-gradient-to-r from-orange-600 to-red-600 text-white p-6">
          <div className="flex items-center space-x-3">
            <div className="w-10 h-10 bg-white bg-opacity-20 rounded-full flex items-center justify-center">
              <Volume2 className="w-5 h-5" />
            </div>
            <div>
              <h1 className="text-xl font-bold">🔊 AI Text-to-Speech</h1>
              <p className="text-orange-100 text-sm">Convert any text to natural speech!</p>
            </div>
          </div>
        </div>

        {/* Voice Settings Section */}
        <div className="p-6 border-b border-gray-200">
          <h3 className="font-semibold text-gray-900 mb-4 flex items-center">
            <Settings className="w-5 h-5 mr-2 text-orange-600" />
            Voice Settings
          </h3>
          <div className="grid grid-cols-1 md:grid-cols-4 gap-4">
            {/* Voice Selection */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">Voice</label>
              <select
                value={selectedVoice}
                onChange={(e) => setSelectedVoice(e.target.value)}
                disabled={isGenerating}
                className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-orange-500 disabled:bg-gray-100"
              >
                {Object.entries(voices).map(([key, voice]) => (
                  <option key={key} value={key}>
                    {voice.name} - {voice.description}
                  </option>
                ))}
              </select>
            </div>
            {/* Model Selection */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">Quality</label>
              <select
                value={audioSettings.model}
                onChange={(e) => setAudioSettings(prev => ({ ...prev, model: e.target.value }))}
                disabled={isGenerating}
                className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-orange-500 disabled:bg-gray-100"
              >
                <option value="tts-1">Standard (Fast)</option>
                <option value="tts-1-hd">HD (High Quality)</option>
              </select>
            </div>
            {/* Speed Control */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">
                Speed ({audioSettings.speed}x)
              </label>
              <input
                type="range"
                min="0.25"
                max="4"
                step="0.05"
                value={audioSettings.speed}
                onChange={(e) => setAudioSettings(prev => ({ ...prev, speed: parseFloat(e.target.value) }))}
                disabled={isGenerating}
                className="w-full h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer disabled:cursor-not-allowed"
              />
            </div>
            {/* Format Selection */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">Format</label>
              <select
                value={audioSettings.format}
                onChange={(e) => setAudioSettings(prev => ({ ...prev, format: e.target.value }))}
                disabled={isGenerating}
                className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-orange-500 disabled:bg-gray-100"
              >
                <option value="mp3">MP3</option>
                <option value="opus">Opus</option>
                <option value="aac">AAC</option>
                <option value="flac">FLAC</option>
              </select>
            </div>
          </div>
        </div>

        {/* Text Input Section */}
        <div className="p-6 border-b border-gray-200">
          <div className="mb-4">
            <div className="flex justify-between items-center mb-2">
              <label className="block text-sm font-medium text-gray-700">Text to Convert</label>
              <span className="text-sm text-gray-500">{text.length}/4096 characters</span>
            </div>
            <textarea
              value={text}
              onChange={(e) => setText(e.target.value)}
              placeholder="Enter the text you want to convert to speech..."
              className="w-full px-4 py-3 border border-gray-300 rounded-xl focus:outline-none focus:ring-2 focus:ring-orange-500 focus:border-transparent transition-all duration-200 resize-none"
              rows={4}
              maxLength={4096}
              disabled={isGenerating}
            />
          </div>
          {/* Sample Texts */}
          <div className="mb-4">
            <p className="text-sm text-gray-600 mb-2">Quick samples:</p>
            <div className="flex flex-wrap gap-2">
              {sampleTexts.map((sample, index) => (
                <button
                  key={index}
                  onClick={() => setText(sample)}
                  disabled={isGenerating}
                  className="px-3 py-1 text-sm bg-gray-100 hover:bg-orange-100 text-gray-700 hover:text-orange-700 rounded-full transition-colors duration-200 disabled:opacity-50 disabled:cursor-not-allowed"
                >
                  {sample.substring(0, 30)}...
                </button>
              ))}
            </div>
          </div>
          {/* Generate Button */}
          <div className="flex justify-center">
            <button
              onClick={generateSpeech}
              disabled={isGenerating || !text.trim()}
              className="px-8 py-3 bg-gradient-to-r from-orange-600 to-red-600 hover:from-orange-700 hover:to-red-700 disabled:from-gray-300 disabled:to-gray-300 text-white rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg disabled:shadow-none"
            >
              {isGenerating ? (
                <>
                  <div className="w-4 h-4 border-2 border-white border-t-transparent rounded-full animate-spin"></div>
                  <span>Generating...</span>
                </>
              ) : (
                <>
                  <Volume2 className="w-4 h-4" />
                  <span>Generate Speech</span>
                </>
              )}
            </button>
          </div>
        </div>

        {/* Results Section */}
        <div className="flex-1 p-6">
          {/* Error Display */}
          {error && (
            <div className="bg-red-50 border border-red-200 rounded-lg p-4 mb-4">
              <p className="text-red-700">
                <strong>Error:</strong> {error}
              </p>
            </div>
          )}
          {/* Generated Audio List */}
          {generatedAudio.length === 0 ? (
            <div className="text-center py-12">
              <div className="w-16 h-16 bg-orange-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Volume2 className="w-8 h-8 text-orange-600" />
              </div>
              <h3 className="text-lg font-semibold text-gray-700 mb-2">
                No Audio Generated Yet
              </h3>
              <p className="text-gray-600 max-w-md mx-auto">
                Enter some text above and click "Generate Speech" to create your first AI voice.
              </p>
            </div>
          ) : (
            <div className="space-y-4">
              <h4 className="font-semibold text-gray-900 mb-4">
                Generated Audio ({generatedAudio.length})
              </h4>
              {generatedAudio.map((audioItem) => (
                <div key={audioItem.id} className="bg-gray-50 rounded-lg p-4 border border-gray-200">
                  <div className="flex items-start justify-between mb-3">
                    <div className="flex-1">
                      <div className="flex items-center space-x-2 mb-2">
                        <div className="p-1 bg-orange-100 rounded">
                          <Volume2 className="w-4 h-4 text-orange-600" />
                        </div>
                        <span className="font-medium text-gray-900 text-sm">
                          {voices[audioItem.voice]?.name || audioItem.voice}
                        </span>
                        <span className="text-xs text-gray-500">
                          {new Date(audioItem.timestamp).toLocaleTimeString()}
                        </span>
                      </div>
                      <p className="text-sm text-gray-700 mb-2 line-clamp-2">
                        {audioItem.text}
                      </p>
                      <div className="flex flex-wrap gap-1 text-xs">
                        <span className="px-2 py-1 bg-orange-100 text-orange-800 rounded-full">
                          {audioItem.settings.model}
                        </span>
                        <span className="px-2 py-1 bg-blue-100 text-blue-800 rounded-full">
                          {audioItem.settings.speed}x speed
                        </span>
                        <span className="px-2 py-1 bg-green-100 text-green-800 rounded-full">
                          {formatFileSize(audioItem.audio.size)}
                        </span>
                        <span className="px-2 py-1 bg-gray-100 text-gray-800 rounded-full">
                          ~{formatDuration(audioItem.audio.duration_estimate)}
                        </span>
                      </div>
                    </div>
                    <div className="flex items-center space-x-2">
                      <button
                        onClick={() => playAudio(audioItem)}
                        className="p-2 bg-orange-500 hover:bg-orange-600 text-white rounded-lg transition-colors duration-200"
                        title={currentlyPlaying?.id === audioItem.id ? "Pause" : "Play"}
                      >
                        {currentlyPlaying?.id === audioItem.id && currentlyPlaying?.status === 'playing' ? (
                          <Pause className="w-4 h-4" />
                        ) : (
                          <Play className="w-4 h-4" />
                        )}
                      </button>
                      <button
                        onClick={() => downloadAudio(audioItem)}
                        className="p-2 bg-green-500 hover:bg-green-600 text-white rounded-lg transition-colors duration-200"
                        title="Download audio"
                      >
                        <Download className="w-4 h-4" />
                      </button>
                    </div>
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      </div>
    </div>
  );
}

export default TextToSpeech;
```

Update your `src/App.jsx` to include the new text-to-speech component:

```jsx
import { useState } from "react";
import StreamingChat from "./StreamingChat";
import ImageGenerator from "./ImageGenerator";
import AudioTranscription from "./AudioTranscription";
import FileAnalysis from "./FileAnalysis";
import TextToSpeech from "./TextToSpeech";
import { MessageSquare, Image, Mic, Folder, Volume2 } from "lucide-react";

function App() {
  // 🧠 STATE: Navigation management
  const [currentView, setCurrentView] = useState("chat"); // 'chat', 'images', 'audio', 'files', or 'speech'

  // 🎨 UI: Main app with navigation
  return (
    <div className="min-h-screen bg-gray-100">
      {/* Navigation Header */}
      <nav className="bg-white shadow-sm border-b border-gray-200">
        <div className="max-w-6xl mx-auto px-4">
          <div className="flex items-center justify-between h-16">
            {/* Logo */}
            <div className="flex items-center space-x-3">
              <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-purple-600 rounded-lg flex items-center justify-center">
                <span className="text-white font-bold text-sm">AI</span>
              </div>
              <h1 className="text-xl font-bold text-gray-900">OpenAI Mastery</h1>
            </div>
            {/* Navigation Buttons */}
            <div className="flex space-x-2">
              <button
                onClick={() => setCurrentView("chat")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "chat"
                    ? "bg-blue-100 text-blue-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <MessageSquare className="w-4 h-4" />
                <span>Chat</span>
              </button>
              <button
                onClick={() => setCurrentView("images")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "images"
                    ? "bg-purple-100 text-purple-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Image className="w-4 h-4" />
                <span>Images</span>
              </button>
              <button
                onClick={() => setCurrentView("audio")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "audio"
                    ? "bg-blue-100 text-blue-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Mic className="w-4 h-4" />
                <span>Audio</span>
              </button>
              <button
                onClick={() => setCurrentView("files")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "files"
                    ? "bg-green-100 text-green-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Folder className="w-4 h-4" />
                <span>Files</span>
              </button>
              <button
                onClick={() => setCurrentView("speech")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "speech"
                    ? "bg-orange-100 text-orange-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Volume2 className="w-4 h-4" />
                <span>Speech</span>
              </button>
            </div>
          </div>
        </div>
      </nav>
      {/* Main Content */}
      <main className="h-[calc(100vh-4rem)]">
        {currentView === "chat" && <StreamingChat />}
        {currentView === "images" && <ImageGenerator />}
        {currentView === "audio" && <AudioTranscription />}
        {currentView === "files" && <FileAnalysis />}
        {currentView === "speech" && <TextToSpeech />}
      </main>
    </div>
  );
}

export default App;
```

Let's test your text-to-speech feature step by step to make sure everything works correctly.

First, verify your backend route works by testing it directly:

Test with a simple text:

```sh
curl -X POST http://localhost:8000/api/tts/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test of AI voice synthesis.", "voice": "alloy", "model": "tts-1"}'
```

Expected response:

```json
{
  "success": true,
  "audio": {
    "filename": "tts-1234567890.mp3",
    "format": "mp3",
    "size": 15420,
    "duration_estimate": 4,
    "download_url": "/api/tts/download/tts-1234567890.mp3"
  },
  "generation": {
    "voice": "alloy",
    "voice_info": {
      "name": "Alloy",
      "description": "Professional and versatile"
    },
    "model": "tts-1",
    "speed": 1.0,
    "text_length": 44
  }
}
```
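The `duration_estimate` field comes from the server's rough 14-characters-per-second heuristic, so you can reproduce it client-side before the audio even arrives. A quick sketch (`estimateSeconds` is our own name; actual playback length varies by voice and speed):

```javascript
// Hypothetical helper mirroring the server's heuristic (~14 chars/sec).
function estimateSeconds(text) {
  return Math.ceil(text.length / 14);
}

console.log(estimateSeconds("Hello, this is a test of AI voice synthesis."));
// 44 characters / 14 ≈ 3.14, rounded up → 4
```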

Start both servers:

Backend (in your backend folder):

```sh
npm run dev
```

Frontend (in your frontend folder):

```sh
npm run dev
```

Test the complete flow:

  1. Navigate to Speech → Click the "Speech" tab in navigation
  2. Select voice settings → Choose voice, quality, speed, and format
  3. Enter text → Type or select a sample text
  4. Generate speech → Click "Generate Speech" and watch the loading state
  5. Listen to audio → Click the play button to hear the generated voice
  6. Download audio → Test downloading the speech file
  7. Try different voices → Test all six AI voices with the same text

Test all six voices with the same text to hear their personalities:

🎙️ Alloy: Professional and neutral
🌊 Echo: Calm and soothing
📚 Fable: Expressive storyteller
🎯 Onyx: Deep and authoritative
☀️ Nova: Warm and friendly
✨ Shimmer: Bright and energetic

Expected behavior:

  • Each voice has distinct personality and tone
  • Audio quality is clear and natural
  • Playback controls work smoothly
  • Download generates proper audio files

Congratulations! You've completed your comprehensive OpenAI mastery application with text-to-speech:

  • ✅ Extended your backend with voice synthesis and audio file management
  • ✅ Added a React speech component following the same patterns as your other features
  • ✅ Implemented six AI voices with distinct personalities and use cases
  • ✅ Created flexible audio settings for quality, speed, and format control
  • ✅ Added playback functionality with play/pause controls
  • ✅ Maintained consistent design with your existing application

Your complete application now has:

  • Text chat with streaming responses
  • Image generation with DALL-E 3 and GPT-Image-1
  • Audio transcription with Whisper voice recognition
  • File analysis with intelligent document processing
  • Text-to-speech with six AI voice personalities
  • Unified navigation between all features
  • Professional UI with consistent TailwindCSS styling

🎉 You've built a complete OpenAI mastery application! Your users can now chat with AI, generate images, transcribe audio, analyze files, and hear AI responses spoken aloud, all in one seamless experience.

Your application demonstrates mastery of OpenAI's entire ecosystem and provides a solid foundation for building even more advanced AI-powered applications. 🔊