⚡ Performance Optimization Made Simple
Right now, you have content moderation and safety implementation working in your application, protecting users with comprehensive safety systems. But what if your AI could respond dramatically faster while using fewer resources?
Performance optimization transforms user experience. Instead of waiting seconds for AI responses, users get near-instant results through intelligent caching, optimized requests, and efficient resource management that can reduce latency by up to 80% and costs by up to 75%.
You’re about to learn exactly how to implement production-grade performance optimization in your existing application.
🧠 Step 1: Understanding Performance Optimization
Section titled “🧠 Step 1: Understanding Performance Optimization”Before we write any code, let’s understand what comprehensive performance optimization actually means and why it’s different from basic speed improvements.
What Performance Optimization Actually Means
Section titled “What Performance Optimization Actually Means”Performance optimization is like building a high-speed, efficient AI processing engine that delivers maximum speed with minimum resource usage. It goes beyond just making things faster to create intelligent systems that anticipate needs and eliminate waste.
Real-world analogy: Basic speed improvement is like driving faster on the same route. Performance optimization is like having GPS that finds the fastest route, a car that learns your patterns, and a system that predicts where you’re going before you ask.
Why Performance Optimization vs. Basic Speed
Section titled “Why Performance Optimization vs. Basic Speed”You already have a working application, but performance optimization is different:
🚀 Basic Speed - Making individual requests faster (incremental improvement)
⚡ Performance Optimization - Eliminating unnecessary work entirely (systematic improvement)
🎯 Intelligent Caching - Predicting and pre-computing responses (proactive optimization)
The key difference: Performance optimization prevents slow operations rather than just speeding them up.
Real-World Performance Impact
Section titled “Real-World Performance Impact”Think about how performance affects every aspect of your application:
- User experience - Near-instant responses vs. multi-second waits
- Cost efficiency - 75% fewer API calls through intelligent caching
- Scalability - Handle 10x more users with the same infrastructure
- Resource usage - Minimize CPU, memory, and network utilization
- Business value - Faster apps drive higher engagement and conversion
Without performance optimization:
- Every request hits the API (expensive and slow)
- Repeated work is done unnecessarily (wasteful)
- Users wait for identical computations (poor experience)
- Resources are consumed inefficiently (high costs)
With performance optimization, you have intelligent, predictive systems that deliver maximum speed at minimum cost.
Performance Optimization Components
Section titled “Performance Optimization Components”Your performance optimization will include multiple integrated systems:
🎯 Prompt Caching - The Speed Multiplier
- Best for: Eliminating repeated API calls for similar requests
- Strengths: 80% latency reduction, 75% cost savings, intelligent cache management
- Use cases: Chat conversations, repeated queries, similar content generation
📊 Request Optimization - The Efficiency Engine
- Best for: Maximizing the value of each API call
- Strengths: Batch processing, response compression, optimal model selection
- Use cases: Bulk operations, data processing, multi-step workflows
⚡ Intelligent Batching - The Throughput Booster
- Best for: Processing multiple requests efficiently
- Strengths: Reduced overhead, better resource utilization, queue management
- Use cases: Image processing, document analysis, bulk content generation
📈 Performance Analytics - The Optimization Intelligence
- Best for: Understanding and continuously improving performance
- Strengths: Real-time monitoring, bottleneck identification, trend analysis
- Use cases: Performance dashboards, optimization recommendations, capacity planning
🔧 Step 2: Building Performance Optimization Backend
Section titled “🔧 Step 2: Building Performance Optimization Backend”Let’s build a comprehensive performance optimization system on top of your existing backend. We’ll add intelligent caching, request optimization, and performance monitoring.
Building on your foundation: You already have a working backend with safety systems. We’re extending it to create high-performance, efficient AI processing with intelligent resource management.
Step 2A: Understanding Performance Architecture
Section titled “Step 2A: Understanding Performance Architecture”Before writing code, let’s understand how a performance-optimized architecture works:
// ⚡ PERFORMANCE OPTIMIZATION ARCHITECTURE:// 1. Prompt Caching - Store and reuse similar request results// 2. Request Batching - Process multiple requests efficiently// 3. Response Compression - Minimize data transfer overhead// 4. Model Selection - Choose optimal models for each task// 5. Performance Monitoring - Track and analyze speed metrics// 6. Predictive Prefetching - Anticipate and prepare responses
Key performance optimization concepts:
- Cache-First Strategy: Check cache before making API calls
- Intelligent Similarity: Recognize when requests can share responses
- Batch Processing: Group similar operations for efficiency
- Performance Budgets: Set and monitor speed targets
Step 2B: Installing Performance Dependencies
Section titled “Step 2B: Installing Performance Dependencies”Add performance optimization dependencies to your backend. In your backend folder, run:
npm install node-cache compression crypto lru-cache performance-now
What these packages do:
- node-cache: In-memory caching with TTL support
- compression: Response compression middleware
- crypto: Hash generation for cache keys
- lru-cache: Least-recently-used cache implementation
- performance-now: High-resolution performance timing
Step 2C: Adding Performance Optimization System
Section titled “Step 2C: Adding Performance Optimization System”Add these performance optimization components to your existing index.js
file, right after your safety implementation:
import NodeCache from 'node-cache';import compression from 'compression';import crypto from 'crypto';import LRU from 'lru-cache';import performanceNow from 'performance-now';
// ⚡ PERFORMANCE CONFIGURATION: System-wide performance settingsconst PERFORMANCE_CONFIG = { // Caching settings caching: { enabled: true, default_ttl: 3600, // 1 hour default cache TTL max_cache_size: 1000, // Maximum cached items similarity_threshold: 0.85, // Similarity threshold for cache hits cache_compression: true // Compress cached responses },
// Request optimization optimization: { batch_size: 10, // Maximum requests per batch batch_timeout: 100, // Batch wait time in milliseconds enable_compression: true, // Enable response compression min_compression_size: 1024 // Minimum size for compression },
// Performance monitoring monitoring: { track_performance: true, slow_request_threshold: 2000, // Requests slower than 2s performance_sampling: 0.1, // Sample 10% of requests retention_days: 7 // Keep performance data for 7 days }};
// 🎯 CACHING SYSTEM: Intelligent prompt and response cachingconst performanceCache = new NodeCache({ stdTTL: PERFORMANCE_CONFIG.caching.default_ttl, maxKeys: PERFORMANCE_CONFIG.caching.max_cache_size, useClones: false, checkperiod: 120});
const lruCache = new LRU({ max: PERFORMANCE_CONFIG.caching.max_cache_size, ttl: PERFORMANCE_CONFIG.caching.default_ttl * 1000, updateAgeOnGet: true});
// 📊 PERFORMANCE METRICS: Real-time performance trackingconst performanceMetrics = { requests: { total: 0, cached: 0, batched: 0, compressed: 0 }, timing: { average_response_time: 0, cache_hit_rate: 0, compression_ratio: 0, total_response_time: 0 }, optimization: { api_calls_saved: 0, bandwidth_saved: 0, cost_savings: 0 }, last_reset: new Date()};
// 🔧 PERFORMANCE HELPER FUNCTIONS
// Generate cache key from requestconst generateCacheKey = (endpoint, payload, options = {}) => { // Create a normalized version of the request for consistent caching const normalizedPayload = { ...payload, // Normalize common variations message: payload.message?.toLowerCase().trim(), prompt: payload.prompt?.toLowerCase().trim(), // Remove non-cacheable fields timestamp: undefined, user_id: undefined, session_id: undefined };
const cacheData = { endpoint, payload: normalizedPayload, model: options.model || 'default' };
return crypto .createHash('sha256') .update(JSON.stringify(cacheData)) .digest('hex') .substring(0, 32);};
// Calculate text similarity for cache matchingconst calculateSimilarity = (text1, text2) => { if (!text1 || !text2) return 0;
const normalize = (str) => str.toLowerCase().replace(/\s+/g, ' ').trim(); const norm1 = normalize(text1); const norm2 = normalize(text2);
if (norm1 === norm2) return 1;
// Simple character-based similarity const maxLength = Math.max(norm1.length, norm2.length); if (maxLength === 0) return 1;
let matches = 0; const minLength = Math.min(norm1.length, norm2.length);
for (let i = 0; i < minLength; i++) { if (norm1[i] === norm2[i]) matches++; }
return matches / maxLength;};
// Find similar cached responsesconst findSimilarCache = (endpoint, message, threshold = PERFORMANCE_CONFIG.caching.similarity_threshold) => { const allKeys = performanceCache.keys();
for (const key of allKeys) { const cached = performanceCache.get(key); if (!cached || !cached.request_info) continue;
if (cached.request_info.endpoint === endpoint) { const cachedMessage = cached.request_info.message || cached.request_info.prompt; if (cachedMessage) { const similarity = calculateSimilarity(message, cachedMessage); if (similarity >= threshold) { return { key, cached, similarity }; } } } }
return null;};
// Update performance metricsconst updatePerformanceMetrics = (type, value = 1, additionalData = {}) => { performanceMetrics.requests.total++;
switch (type) { case 'cache_hit': performanceMetrics.requests.cached++; performanceMetrics.optimization.api_calls_saved++; break; case 'batch': performanceMetrics.requests.batched += value; break; case 'compression': performanceMetrics.requests.compressed++; if (additionalData.original_size && additionalData.compressed_size) { const saved = additionalData.original_size - additionalData.compressed_size; performanceMetrics.optimization.bandwidth_saved += saved; } break; case 'timing': if (additionalData.response_time) { performanceMetrics.timing.total_response_time += additionalData.response_time; performanceMetrics.timing.average_response_time = performanceMetrics.timing.total_response_time / performanceMetrics.requests.total; } break; }
// Update derived metrics if (performanceMetrics.requests.total > 0) { performanceMetrics.timing.cache_hit_rate = (performanceMetrics.requests.cached / performanceMetrics.requests.total * 100).toFixed(2); }};
// 🎯 CACHING MIDDLEWARE: Intelligent response cachingconst cachingMiddleware = (cacheTTL = null, customKey = null) => { return async (req, res, next) => { if (!PERFORMANCE_CONFIG.caching.enabled) { return next(); }
const startTime = performanceNow();
try { // Generate cache key const cacheKey = customKey || generateCacheKey(req.path, req.body, { model: req.body.model, endpoint: req.path });
// Check exact cache match first let cached = performanceCache.get(cacheKey); let cacheSource = 'exact';
// If no exact match, try similarity matching if (!cached && (req.body.message || req.body.prompt)) { const similarMatch = findSimilarCache( req.path, req.body.message || req.body.prompt );
if (similarMatch) { cached = similarMatch.cached; cacheSource = 'similar'; console.log(`📊 Similar cache hit (${(similarMatch.similarity * 100).toFixed(1)}% match)`); } }
if (cached) { // Cache hit - return cached response const responseTime = performanceNow() - startTime; updatePerformanceMetrics('cache_hit'); updatePerformanceMetrics('timing', 1, { response_time: responseTime });
console.log(`⚡ Cache hit (${cacheSource}): ${req.path} (${responseTime.toFixed(2)}ms)`);
// Add cache headers res.setHeader('X-Cache', 'HIT'); res.setHeader('X-Cache-Source', cacheSource); res.setHeader('X-Response-Time', `${responseTime.toFixed(2)}ms`);
return res.json({ ...cached.response, cached: true, cache_source: cacheSource, performance: { response_time_ms: responseTime.toFixed(2), from_cache: true } }); }
// Cache miss - intercept response to cache it const originalSend = res.json; res.json = function(data) { const responseTime = performanceNow() - startTime;
// Cache successful responses if (res.statusCode === 200 && data.success !== false) { const cacheData = { response: data, request_info: { endpoint: req.path, message: req.body.message, prompt: req.body.prompt, model: req.body.model }, cached_at: new Date(), response_time: responseTime };
const ttl = cacheTTL || PERFORMANCE_CONFIG.caching.default_ttl; performanceCache.set(cacheKey, cacheData, ttl);
console.log(`💾 Response cached: ${req.path} (TTL: ${ttl}s)`); }
updatePerformanceMetrics('timing', 1, { response_time: responseTime });
// Add performance headers res.setHeader('X-Cache', 'MISS'); res.setHeader('X-Response-Time', `${responseTime.toFixed(2)}ms`);
// Add performance data to response data.performance = { response_time_ms: responseTime.toFixed(2), from_cache: false };
return originalSend.call(this, data); };
next();
} catch (error) { console.error('Caching middleware error:', error); next(); } };};
// 📦 COMPRESSION MIDDLEWARE: Response compressionapp.use(compression({ threshold: PERFORMANCE_CONFIG.optimization.min_compression_size, level: 6, filter: (req, res) => { if (req.headers['x-no-compression']) { return false; } return compression.filter(req, res); }}));
// 🚀 BATCH PROCESSING: Efficient bulk operationsconst batchQueue = new Map();const processBatch = async (endpoint, requests) => { console.log(`📦 Processing batch of ${requests.length} requests for ${endpoint}`);
const results = []; for (const { req, res, resolve } of requests) { try { // Process individual request (this would call your actual endpoint logic) const result = await processIndividualRequest(endpoint, req); results.push({ success: true, data: result }); resolve(result); } catch (error) { const errorResult = { success: false, error: error.message }; results.push(errorResult); resolve(errorResult); } }
updatePerformanceMetrics('batch', requests.length); return results;};
// Process individual request (helper for batching)const processIndividualRequest = async (endpoint, req) => { // This would contain the actual logic for each endpoint // For demo purposes, we'll simulate processing return new Promise(resolve => { setTimeout(() => { resolve({ message: "Batch processed request", endpoint, timestamp: new Date().toISOString() }); }, 100); });};
// ⚡ PERFORMANCE ENDPOINTS: Performance management and monitoring
// Apply caching to performance-critical routesapp.use('/api/chat', cachingMiddleware(1800)); // 30 minutes for chatapp.use('/api/images', cachingMiddleware(3600)); // 1 hour for imagesapp.use('/api/structured', cachingMiddleware(7200)); // 2 hours for structured output
// Performance dashboard endpointapp.get("/api/performance/dashboard", (req, res) => { try { const now = new Date(); const uptime = now - performanceMetrics.last_reset; const uptimeHours = uptime / (1000 * 60 * 60);
// Calculate additional metrics const cacheStats = { size: performanceCache.keys().length, hit_rate: performanceMetrics.timing.cache_hit_rate, max_size: PERFORMANCE_CONFIG.caching.max_cache_size, utilization: (performanceCache.keys().length / PERFORMANCE_CONFIG.caching.max_cache_size * 100).toFixed(1) };
const throughput = { requests_per_hour: uptimeHours > 0 ? Math.round(performanceMetrics.requests.total / uptimeHours) : 0, cached_per_hour: uptimeHours > 0 ? Math.round(performanceMetrics.requests.cached / uptimeHours) : 0, api_calls_saved_per_hour: uptimeHours > 0 ? Math.round(performanceMetrics.optimization.api_calls_saved / uptimeHours) : 0 };
const efficiency = { cache_efficiency: performanceMetrics.timing.cache_hit_rate, average_response_time: performanceMetrics.timing.average_response_time.toFixed(2), compression_ratio: performanceMetrics.requests.total > 0 ? (performanceMetrics.requests.compressed / performanceMetrics.requests.total * 100).toFixed(1) : 0 };
res.json({ success: true, metrics: performanceMetrics, cache_stats: cacheStats, throughput, efficiency, config: PERFORMANCE_CONFIG, uptime_hours: uptimeHours.toFixed(2), timestamp: now.toISOString() });
} catch (error) { console.error('Performance dashboard error:', error); res.status(500).json({ error: 'Failed to load performance dashboard', details: error.message, success: false }); }});
// Cache management endpointsapp.get("/api/performance/cache/stats", (req, res) => { try { const keys = performanceCache.keys(); const cacheData = keys.map(key => { const item = performanceCache.get(key); return { key: key.substring(0, 8) + '...', endpoint: item?.request_info?.endpoint, cached_at: item?.cached_at, response_time: item?.response_time?.toFixed(2) }; }).sort((a, b) => new Date(b.cached_at) - new Date(a.cached_at));
res.json({ success: true, total_items: keys.length, max_items: PERFORMANCE_CONFIG.caching.max_cache_size, cache_data: cacheData.slice(0, 50), // Return top 50 items memory_usage: process.memoryUsage() });
} catch (error) { res.status(500).json({ error: 'Failed to get cache stats', success: false }); }});
app.delete("/api/performance/cache/clear", (req, res) => { try { const keyCount = performanceCache.keys().length; performanceCache.flushAll(); lruCache.clear();
console.log(`🧹 Cache cleared: ${keyCount} items removed`);
res.json({ success: true, message: `Cache cleared successfully`, items_removed: keyCount });
} catch (error) { res.status(500).json({ error: 'Failed to clear cache', details: error.message, success: false }); }});
// Performance optimization suggestions endpointapp.get("/api/performance/suggestions", (req, res) => { try { const suggestions = [];
// Analyze cache hit rate const hitRate = parseFloat(performanceMetrics.timing.cache_hit_rate); if (hitRate < 50) { suggestions.push({ type: 'caching', priority: 'high', title: 'Low Cache Hit Rate', description: `Cache hit rate is ${hitRate}%. Consider increasing cache TTL or improving similarity thresholds.`, action: 'Adjust cache configuration' }); }
// Analyze response time const avgTime = performanceMetrics.timing.average_response_time; if (avgTime > 1000) { suggestions.push({ type: 'performance', priority: 'medium', title: 'Slow Response Times', description: `Average response time is ${avgTime.toFixed(0)}ms. Consider implementing request batching or model optimization.`, action: 'Optimize request processing' }); }
// Analyze compression usage const compressionRate = performanceMetrics.requests.total > 0 ? (performanceMetrics.requests.compressed / performanceMetrics.requests.total * 100) : 0; if (compressionRate < 30 && performanceMetrics.requests.total > 100) { suggestions.push({ type: 'bandwidth', priority: 'low', title: 'Low Compression Usage', description: `Only ${compressionRate.toFixed(1)}% of responses are compressed. Consider lowering compression threshold.`, action: 'Adjust compression settings' }); }
// Cache utilization const cacheUtilization = performanceCache.keys().length / PERFORMANCE_CONFIG.caching.max_cache_size * 100; if (cacheUtilization > 90) { suggestions.push({ type: 'caching', priority: 'medium', title: 'Cache Nearly Full', description: `Cache is ${cacheUtilization.toFixed(1)}% full. Consider increasing cache size or reducing TTL.`, action: 'Increase cache capacity' }); }
res.json({ success: true, suggestions, analysis_timestamp: new Date().toISOString() });
} catch (error) { res.status(500).json({ error: 'Failed to generate suggestions', success: false }); }});
// Performance test endpointapp.post("/api/performance/test", async (req, res) => { try { const { test_type = 'cache', iterations = 10 } = req.body; const results = [];
console.log(`🧪 Running performance test: ${test_type} (${iterations} iterations)`);
for (let i = 0; i < iterations; i++) { const startTime = performanceNow();
// Simulate different test types switch (test_type) { case 'cache': // Test cache performance const testKey = `test-${Date.now()}-${i}`; performanceCache.set(testKey, { data: `test-data-${i}` }); const retrieved = performanceCache.get(testKey); break;
case 'compression': // Test compression performance const largeData = 'x'.repeat(10000); const compressed = Buffer.from(largeData).toString('base64'); break;
default: // Default performance test await new Promise(resolve => setTimeout(resolve, 10)); }
const endTime = performanceNow(); results.push({ iteration: i + 1, time_ms: (endTime - startTime).toFixed(3) }); }
const avgTime = results.reduce((sum, r) => sum + parseFloat(r.time_ms), 0) / results.length; const minTime = Math.min(...results.map(r => parseFloat(r.time_ms))); const maxTime = Math.max(...results.map(r => parseFloat(r.time_ms)));
res.json({ success: true, test_type, iterations, results, summary: { average_time_ms: avgTime.toFixed(3), min_time_ms: minTime.toFixed(3), max_time_ms: maxTime.toFixed(3), total_time_ms: results.reduce((sum, r) => sum + parseFloat(r.time_ms), 0).toFixed(3) } });
} catch (error) { res.status(500).json({ error: 'Performance test failed', details: error.message, success: false }); }});
// Initialize performance systemconsole.log('⚡ Performance optimization system initialized');console.log(`📊 Cache: ${PERFORMANCE_CONFIG.caching.max_cache_size} items, ${PERFORMANCE_CONFIG.caching.default_ttl}s TTL`);console.log(`🚀 Compression: ${PERFORMANCE_CONFIG.optimization.enable_compression ? 'enabled' : 'disabled'}`);
Function breakdown:
- Intelligent caching - Store and reuse similar responses with similarity matching
- Response compression - Minimize bandwidth usage for large responses
- Performance monitoring - Track speed metrics and optimization opportunities
- Cache management - Automatic cleanup and intelligent cache utilization
- Performance analytics - Real-time insights and optimization suggestions
- Batch processing - Efficient handling of multiple similar requests
🔧 Step 3: Building the React Performance Dashboard Component
Section titled “🔧 Step 3: Building the React Performance Dashboard Component”Now let’s create a comprehensive performance monitoring interface that shows optimization metrics and cache performance.
Step 3A: Creating the Performance Dashboard Component
Section titled “Step 3A: Creating the Performance Dashboard Component”Create a new file src/PerformanceDashboard.jsx
:
import { useState, useEffect } from "react";import { Zap, TrendingUp, Database, Clock, BarChart3, Settings, RefreshCw, TestTube } from "lucide-react";
function PerformanceDashboard() { // 🧠 STATE: Performance dashboard data management const [performanceData, setPerformanceData] = useState(null); // Dashboard metrics const [cacheStats, setCacheStats] = useState(null); // Cache statistics const [suggestions, setSuggestions] = useState([]); // Optimization suggestions const [isLoading, setIsLoading] = useState(true); // Loading status const [error, setError] = useState(null); // Error messages const [activeTab, setActiveTab] = useState("overview"); // Active dashboard tab const [testResults, setTestResults] = useState(null); // Performance test results const [isRunningTest, setIsRunningTest] = useState(false); // Test execution status
// 🔧 FUNCTIONS: Performance dashboard logic engine
// Load performance dashboard data const loadPerformanceData = async () => { setIsLoading(true); setError(null);
try { const response = await fetch("http://localhost:8000/api/performance/dashboard"); const data = await response.json();
if (!response.ok) { throw new Error(data.error || 'Failed to load performance data'); }
setPerformanceData(data);
} catch (error) { console.error('Failed to load performance data:', error); setError(error.message || 'Could not load performance dashboard'); } finally { setIsLoading(false); } };
// Load cache statistics const loadCacheStats = async () => { try { const response = await fetch("http://localhost:8000/api/performance/cache/stats"); const data = await response.json();
if (response.ok) { setCacheStats(data); } } catch (error) { console.error('Failed to load cache stats:', error); } };
// Load optimization suggestions const loadSuggestions = async () => { try { const response = await fetch("http://localhost:8000/api/performance/suggestions"); const data = await response.json();
if (response.ok) { setSuggestions(data.suggestions || []); } } catch (error) { console.error('Failed to load suggestions:', error); } };
// Clear cache const clearCache = async () => { if (!confirm('Are you sure you want to clear the entire cache?')) { return; }
try { const response = await fetch("http://localhost:8000/api/performance/cache/clear", { method: "DELETE" });
const data = await response.json();
if (response.ok) { alert(`Cache cleared successfully! ${data.items_removed} items removed.`); loadPerformanceData(); loadCacheStats(); } else { throw new Error(data.error); } } catch (error) { console.error('Failed to clear cache:', error); setError(error.message || 'Could not clear cache'); } };
// Run performance test const runPerformanceTest = async (testType = 'cache', iterations = 100) => { setIsRunningTest(true); setTestResults(null); setError(null);
try { const response = await fetch("http://localhost:8000/api/performance/test", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ test_type: testType, iterations: iterations }) });
const data = await response.json();
if (!response.ok) { throw new Error(data.error || 'Performance test failed'); }
setTestResults(data);
} catch (error) { console.error('Performance test failed:', error); setError(error.message || 'Could not run performance test'); } finally { setIsRunningTest(false); } };
// Format bytes for display const formatBytes = (bytes, decimals = 2) => { if (bytes === 0) return '0 Bytes'; const k = 1024; const dm = decimals < 0 ? 0 : decimals; const sizes = ['Bytes', 'KB', 'MB', 'GB']; const i = Math.floor(Math.log(bytes) / Math.log(k)); return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + sizes[i]; };
// Get performance status color const getPerformanceColor = (value, thresholds) => { if (value >= thresholds.good) return 'text-green-600 bg-green-100'; if (value >= thresholds.ok) return 'text-yellow-600 bg-yellow-100'; return 'text-red-600 bg-red-100'; };
// Get suggestion priority color const getPriorityColor = (priority) => { switch (priority) { case 'high': return 'bg-red-500'; case 'medium': return 'bg-yellow-500'; case 'low': return 'bg-green-500'; default: return 'bg-gray-500'; } };
// Format timestamp for display const formatTimestamp = (timestamp) => { return new Date(timestamp).toLocaleString(); };
// Load data on component mount useEffect(() => { loadPerformanceData(); loadCacheStats(); loadSuggestions();
// Set up auto-refresh every 10 seconds const interval = setInterval(() => { loadPerformanceData(); loadCacheStats(); }, 10000);
return () => clearInterval(interval); }, []);
// 🎨 UI: Performance dashboard interface return ( <div className="min-h-screen bg-gradient-to-br from-blue-50 to-cyan-50 flex items-center justify-center p-4"> <div className="bg-white rounded-2xl shadow-2xl w-full max-w-7xl flex flex-col overflow-hidden">
{/* Header */} <div className="bg-gradient-to-r from-blue-600 to-cyan-600 text-white p-6"> <div className="flex items-center space-x-3"> <div className="w-10 h-10 bg-white bg-opacity-20 rounded-full flex items-center justify-center"> <Zap className="w-5 h-5" /> </div> <div> <h1 className="text-xl font-bold">⚡ Performance Optimization</h1> <p className="text-blue-100 text-sm">Maximize speed and efficiency with intelligent caching and optimization!</p> </div> </div> </div>
{/* Tab Navigation */} <div className="border-b border-gray-200"> <nav className="flex"> <button onClick={() => setActiveTab('overview')} className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${ activeTab === 'overview' ? 'border-blue-500 text-blue-600' : 'border-transparent text-gray-500 hover:text-gray-700' }`} > <TrendingUp className="w-4 h-4 inline mr-2" /> Overview </button> <button onClick={() => setActiveTab('cache')} className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${ activeTab === 'cache' ? 'border-blue-500 text-blue-600' : 'border-transparent text-gray-500 hover:text-gray-700' }`} > <Database className="w-4 h-4 inline mr-2" /> Cache Management </button> <button onClick={() => setActiveTab('suggestions')} className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${ activeTab === 'suggestions' ? 'border-blue-500 text-blue-600' : 'border-transparent text-gray-500 hover:text-gray-700' }`} > <BarChart3 className="w-4 h-4 inline mr-2" /> Optimization </button> <button onClick={() => setActiveTab('testing')} className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${ activeTab === 'testing' ? 'border-blue-500 text-blue-600' : 'border-transparent text-gray-500 hover:text-gray-700' }`} > <TestTube className="w-4 h-4 inline mr-2" /> Performance Testing </button> </nav> </div>
{/* Error Display */} {error && ( <div className="p-4 bg-red-50 border-b border-red-200"> <p className="text-red-700 text-sm"> <strong>Error:</strong> {error} </p> </div> )}
{/* Main Content */} <div className="flex-1 p-6"> {/* Overview Tab */} {activeTab === 'overview' && ( <div className="space-y-6"> {isLoading ? ( <div className="text-center py-12"> <div className="animate-spin w-8 h-8 border-4 border-blue-500 border-t-transparent rounded-full mx-auto mb-4"></div> <p className="text-gray-600">Loading performance metrics...</p> </div> ) : performanceData ? ( <> {/* Key Metrics Cards */} <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4"> <div className="bg-green-50 rounded-lg p-4"> <div className="flex items-center"> <TrendingUp className="w-8 h-8 text-green-600" /> <div className="ml-3"> <p className="text-sm font-medium text-green-600">Cache Hit Rate</p> <p className="text-2xl font-bold text-green-900"> {performanceData.efficiency.cache_efficiency}% </p> <p className="text-xs text-green-700"> {performanceData.metrics.requests.cached} hits </p> </div> </div> </div>
<div className="bg-blue-50 rounded-lg p-4"> <div className="flex items-center"> <Clock className="w-8 h-8 text-blue-600" /> <div className="ml-3"> <p className="text-sm font-medium text-blue-600">Avg Response Time</p> <p className="text-2xl font-bold text-blue-900"> {performanceData.efficiency.average_response_time}ms </p> <p className="text-xs text-blue-700"> {performanceData.metrics.requests.total} requests </p> </div> </div> </div>
<div className="bg-purple-50 rounded-lg p-4"> <div className="flex items-center"> <Zap className="w-8 h-8 text-purple-600" /> <div className="ml-3"> <p className="text-sm font-medium text-purple-600">API Calls Saved</p> <p className="text-2xl font-bold text-purple-900"> {performanceData.metrics.optimization.api_calls_saved.toLocaleString()} </p> <p className="text-xs text-purple-700"> {performanceData.throughput.api_calls_saved_per_hour}/hour </p> </div> </div> </div>
<div className="bg-orange-50 rounded-lg p-4"> <div className="flex items-center"> <Database className="w-8 h-8 text-orange-600" /> <div className="ml-3"> <p className="text-sm font-medium text-orange-600">Cache Utilization</p> <p className="text-2xl font-bold text-orange-900"> {performanceData.cache_stats.utilization}% </p> <p className="text-xs text-orange-700"> {performanceData.cache_stats.size}/{performanceData.cache_stats.max_size} items </p> </div> </div> </div> </div>
{/* Performance Charts/Stats */} <div className="grid grid-cols-1 lg:grid-cols-2 gap-6"> {/* Throughput Stats */} <div className="bg-white border rounded-lg p-6"> <h3 className="font-semibold text-gray-900 mb-4 flex items-center"> <TrendingUp className="w-5 h-5 mr-2 text-blue-600" /> Throughput Metrics </h3> <div className="space-y-4"> <div className="flex justify-between items-center"> <span className="text-gray-600">Requests per Hour</span> <span className="font-semibold">{performanceData.throughput.requests_per_hour}</span> </div> <div className="flex justify-between items-center"> <span className="text-gray-600">Cached per Hour</span> <span className="font-semibold text-green-600">{performanceData.throughput.cached_per_hour}</span> </div> <div className="flex justify-between items-center"> <span className="text-gray-600">Compression Rate</span> <span className="font-semibold">{performanceData.efficiency.compression_ratio}%</span> </div> <div className="flex justify-between items-center"> <span className="text-gray-600">Uptime</span> <span className="font-semibold">{performanceData.uptime_hours} hours</span> </div> </div> </div>
{/* System Configuration */} <div className="bg-white border rounded-lg p-6"> <h3 className="font-semibold text-gray-900 mb-4 flex items-center"> <Settings className="w-5 h-5 mr-2 text-blue-600" /> Configuration </h3> <div className="space-y-4"> <div className="flex justify-between items-center"> <span className="text-gray-600">Cache TTL</span> <span className="font-semibold">{performanceData.config.caching.default_ttl}s</span> </div> <div className="flex justify-between items-center"> <span className="text-gray-600">Max Cache Size</span> <span className="font-semibold">{performanceData.config.caching.max_cache_size}</span> </div> <div className="flex justify-between items-center"> <span className="text-gray-600">Similarity Threshold</span> <span className="font-semibold">{(performanceData.config.caching.similarity_threshold * 100)}%</span> </div> <div className="flex justify-between items-center"> <span className="text-gray-600">Compression</span> <span className={`font-semibold ${performanceData.config.optimization.enable_compression ? 'text-green-600' : 'text-red-600'}`}> {performanceData.config.optimization.enable_compression ? 'Enabled' : 'Disabled'} </span> </div> </div> </div> </div> </> ) : ( <div className="text-center py-12"> <Zap className="w-16 h-16 text-gray-400 mx-auto mb-4" /> <p className="text-gray-600">No performance data available</p> </div> )} </div> )}
{/* Cache Management Tab */} {activeTab === 'cache' && ( <div className="space-y-6"> <div className="flex justify-between items-center"> <h3 className="font-semibold text-gray-900">Cache Management</h3> <div className="space-x-2"> <button onClick={loadCacheStats} className="px-4 py-2 bg-blue-100 text-blue-700 rounded-lg hover:bg-blue-200 transition-colors duration-200" > <RefreshCw className="w-4 h-4 inline mr-2" /> Refresh </button> <button onClick={clearCache} className="px-4 py-2 bg-red-100 text-red-700 rounded-lg hover:bg-red-200 transition-colors duration-200" > Clear Cache </button> </div> </div>
{cacheStats && ( <> {/* Cache Overview */} <div className="bg-gray-50 rounded-lg p-6"> <div className="grid grid-cols-2 md:grid-cols-4 gap-4"> <div> <p className="text-sm text-gray-600">Total Items</p> <p className="text-2xl font-bold text-gray-900">{cacheStats.total_items}</p> </div> <div> <p className="text-sm text-gray-600">Max Items</p> <p className="text-2xl font-bold text-gray-900">{cacheStats.max_items}</p> </div> <div> <p className="text-sm text-gray-600">Memory Usage</p> <p className="text-lg font-bold text-gray-900"> {formatBytes(cacheStats.memory_usage.heapUsed)} </p> </div> <div> <p className="text-sm text-gray-600">Heap Total</p> <p className="text-lg font-bold text-gray-900"> {formatBytes(cacheStats.memory_usage.heapTotal)} </p> </div> </div> </div>
{/* Cache Items */} <div className="bg-white border rounded-lg p-6"> <h4 className="font-medium text-gray-900 mb-4">Recent Cache Items</h4>
{cacheStats.cache_data.length === 0 ? ( <p className="text-gray-500 text-center py-4">No cached items</p> ) : ( <div className="space-y-2 max-h-64 overflow-y-auto"> {cacheStats.cache_data.map((item, index) => ( <div key={index} className="flex items-center justify-between p-3 bg-gray-50 rounded-lg"> <div> <p className="font-medium text-gray-900">{item.endpoint || 'Unknown'}</p> <p className="text-sm text-gray-600">Key: {item.key}</p> </div> <div className="text-right"> <p className="text-sm text-gray-500"> {item.cached_at ? formatTimestamp(item.cached_at) : 'Unknown'} </p> {item.response_time && ( <p className="text-xs text-blue-600">{item.response_time}ms</p> )} </div> </div> ))} </div> )} </div> </> )} </div> )}
{/* Optimization Suggestions Tab */} {activeTab === 'suggestions' && ( <div className="space-y-6"> <div className="flex justify-between items-center"> <h3 className="font-semibold text-gray-900">Optimization Suggestions</h3> <button onClick={loadSuggestions} className="px-4 py-2 bg-blue-100 text-blue-700 rounded-lg hover:bg-blue-200 transition-colors duration-200" > <RefreshCw className="w-4 h-4 inline mr-2" /> Refresh </button> </div>
{suggestions.length === 0 ? ( <div className="text-center py-12"> <BarChart3 className="w-16 h-16 text-green-500 mx-auto mb-4" /> <h4 className="text-lg font-semibold text-gray-700 mb-2"> Great Performance! 🎉 </h4> <p className="text-gray-600"> No optimization suggestions at this time. Your system is running efficiently. </p> </div> ) : ( <div className="space-y-4"> {suggestions.map((suggestion, index) => ( <div key={index} className="bg-white border rounded-lg p-6"> <div className="flex items-start space-x-4"> <div className={`w-3 h-3 rounded-full mt-1 ${getPriorityColor(suggestion.priority)}`}></div> <div className="flex-1"> <div className="flex items-center justify-between mb-2"> <h4 className="font-medium text-gray-900">{suggestion.title}</h4> <span className={`px-2 py-1 rounded text-xs font-medium ${ suggestion.priority === 'high' ? 'bg-red-100 text-red-700' : suggestion.priority === 'medium' ? 'bg-yellow-100 text-yellow-700' : 'bg-green-100 text-green-700' }`}> {suggestion.priority.toUpperCase()} </span> </div> <p className="text-gray-600 mb-3">{suggestion.description}</p> <div className="flex items-center justify-between"> <span className="text-sm text-gray-500 capitalize"> Type: {suggestion.type} </span> <span className="text-sm font-medium text-blue-600"> {suggestion.action} </span> </div> </div> </div> </div> ))} </div> )} </div> )}
{/* Performance Testing Tab */} {activeTab === 'testing' && ( <div className="space-y-6"> <div className="bg-white border rounded-lg p-6"> <h3 className="font-semibold text-gray-900 mb-4">Performance Testing</h3>
<div className="grid grid-cols-1 md:grid-cols-3 gap-4 mb-6"> <button onClick={() => runPerformanceTest('cache', 100)} disabled={isRunningTest} className="p-4 border-2 border-blue-200 rounded-lg hover:border-blue-400 hover:bg-blue-50 transition-colors duration-200 disabled:opacity-50" > <Database className="w-8 h-8 text-blue-600 mx-auto mb-2" /> <p className="font-medium text-gray-900">Cache Test</p> <p className="text-sm text-gray-600">Test cache read/write performance</p> </button>
<button onClick={() => runPerformanceTest('compression', 50)} disabled={isRunningTest} className="p-4 border-2 border-green-200 rounded-lg hover:border-green-400 hover:bg-green-50 transition-colors duration-200 disabled:opacity-50" > <Zap className="w-8 h-8 text-green-600 mx-auto mb-2" /> <p className="font-medium text-gray-900">Compression Test</p> <p className="text-sm text-gray-600">Test response compression efficiency</p> </button>
<button onClick={() => runPerformanceTest('general', 200)} disabled={isRunningTest} className="p-4 border-2 border-purple-200 rounded-lg hover:border-purple-400 hover:bg-purple-50 transition-colors duration-200 disabled:opacity-50" > <TestTube className="w-8 h-8 text-purple-600 mx-auto mb-2" /> <p className="font-medium text-gray-900">General Test</p> <p className="text-sm text-gray-600">Test overall system performance</p> </button> </div>
{isRunningTest && ( <div className="text-center py-8"> <div className="animate-spin w-8 h-8 border-4 border-blue-500 border-t-transparent rounded-full mx-auto mb-4"></div> <p className="text-gray-600">Running performance test...</p> </div> )}
{testResults && ( <div className="mt-6 p-4 bg-gray-50 rounded-lg"> <h4 className="font-medium text-gray-900 mb-4">Test Results</h4>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-4"> <div> <p className="text-sm text-gray-600">Test Type</p> <p className="font-semibold capitalize">{testResults.test_type}</p> </div> <div> <p className="text-sm text-gray-600">Iterations</p> <p className="font-semibold">{testResults.iterations}</p> </div> <div> <p className="text-sm text-gray-600">Average Time</p> <p className="font-semibold text-blue-600">{testResults.summary.average_time_ms}ms</p> </div> <div> <p className="text-sm text-gray-600">Total Time</p> <p className="font-semibold">{testResults.summary.total_time_ms}ms</p> </div> </div>
<div className="grid grid-cols-2 gap-4"> <div> <p className="text-sm text-gray-600 mb-1">Best Time</p> <p className="font-semibold text-green-600">{testResults.summary.min_time_ms}ms</p> </div> <div> <p className="text-sm text-gray-600 mb-1">Worst Time</p> <p className="font-semibold text-red-600">{testResults.summary.max_time_ms}ms</p> </div> </div> </div> )} </div> </div> )} </div>
{/* Footer */} <div className="p-4 border-t border-gray-200 bg-gray-50"> <div className="flex justify-between items-center text-sm text-gray-600"> <span>Last updated: {performanceData ? formatTimestamp(performanceData.timestamp) : 'Never'}</span> <button onClick={() => { loadPerformanceData(); loadCacheStats(); loadSuggestions(); }} disabled={isLoading} className="px-3 py-1 bg-blue-100 text-blue-700 rounded hover:bg-blue-200 disabled:opacity-50 transition-colors duration-200" > {isLoading ? 'Refreshing...' : 'Refresh All'} </button> </div> </div> </div> </div> );}
export default PerformanceDashboard;
Step 3B: Adding Performance Dashboard to Navigation
Section titled “Step 3B: Adding Performance Dashboard to Navigation”Update your src/App.jsx
to include the performance optimization component:
// Add to your existing importsimport PerformanceDashboard from "./PerformanceDashboard";import { MessageSquare, Image, Mic, Folder, Volume2, Eye, Phone, Link, FileText, Shield, Zap } from "lucide-react";
// Add performance button after your safety tab:<button onClick={() => setCurrentView("performance")} className={`px-3 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 whitespace-nowrap ${ currentView === "performance" ? "bg-blue-100 text-blue-700 shadow-sm" : "text-gray-600 hover:text-gray-900 hover:bg-gray-100" }`}> <Zap className="w-4 h-4" /> <span>Performance</span></button>
// Add to your main content section:{currentView === "performance" && <PerformanceDashboard />}
🧪 Testing Your Performance Optimization
Section titled “🧪 Testing Your Performance Optimization”Let’s test your performance optimization system step by step.
Step 1: Backend Performance Test
Section titled “Step 1: Backend Performance Test”Test performance dashboard:
# Test the performance dashboard endpointcurl http://localhost:8000/api/performance/dashboard
Test cache functionality:
# Make a request that will be cachedcurl -X POST http://localhost:8000/api/chat \ -H "Content-Type: application/json" \ -d '{"message": "Hello, how are you?"}'
# Make the same request again - should be served from cachecurl -X POST http://localhost:8000/api/chat \ -H "Content-Type: application/json" \ -d '{"message": "Hello, how are you?"}'
Step 2: Performance Testing
Section titled “Step 2: Performance Testing”Start both servers and test the complete performance flow:
- Navigate to Performance → Click the “Performance” tab
- View performance metrics → Check response times and cache hit rates
- Monitor cache utilization → Watch cache statistics in real-time
- Run performance tests → Test cache, compression, and general performance
- Review optimization suggestions → Get recommendations for improvements
- Clear cache → Test cache clearing functionality
- Compare before/after → Measure performance improvements
Step 3: Performance Scenario Testing
Section titled “Step 3: Performance Scenario Testing”Test performance optimization scenarios:
⚡ Cache effectiveness: Make similar requests to test cache hits⚡ Response compression: Test large responses for compression⚡ Similarity matching: Try variations of the same prompt⚡ Performance monitoring: Watch real-time performance metrics
✅ What You Built
Section titled “✅ What You Built”Congratulations! You’ve implemented comprehensive performance optimization:
- ✅ Intelligent prompt caching with similarity matching and automatic cache management
- ✅ Response compression with configurable thresholds and bandwidth optimization
- ✅ Performance monitoring with real-time metrics and analytics
- ✅ Cache management with automatic cleanup and utilization tracking
- ✅ Optimization suggestions with automated performance analysis
- ✅ Performance testing with benchmarking tools and detailed reporting
Your Module 3 performance optimization includes:
- Content moderation - Detect harmful content
- Safety implementation - Comprehensive protection systems
- Performance optimization (new) - Maximize speed and efficiency
- Up to 80% latency reduction through intelligent caching
- Up to 75% cost savings through API call optimization
- Real-time performance monitoring with detailed analytics
Performance improvements achieved:
- Instant responses for cached requests
- Intelligent similarity matching for related queries
- Automated optimization suggestions for continuous improvement
- Professional performance dashboard for monitoring and management
- Comprehensive testing tools for performance validation
Next up: Cost management and monitoring to complete the production optimization suite for Module 3.
Your OpenAI application now delivers lightning-fast performance! ⚡
<function_calls>