🔒 Safety Implementation Made Simple
Right now, you have content moderation working in your application, automatically detecting harmful content in text and images. But what if you could build a comprehensive safety system that protects users proactively?
Safety implementation hardens your AI application. Instead of just detecting problems after they happen, you’ll build systems that prevent issues, monitor behavior in real time, and automatically enforce safety policies across your entire application.
You’re about to learn exactly how to implement production-grade safety systems in your existing application.
🧠 Step 1: Understanding Safety Implementation
Before we write any code, let’s understand what comprehensive safety implementation actually means and why it’s different from basic content moderation.
What Safety Implementation Actually Means
Safety implementation is like building a complete security and protection ecosystem for your AI application. It goes beyond detecting harmful content to create proactive safeguards, automated responses, and comprehensive monitoring systems.
Real-world analogy: Content moderation is like having security cameras that detect incidents. Safety implementation is like having cameras, alarm systems, automatic locks, security guards, emergency protocols, and prevention systems all working together.
Why Safety Implementation vs. Content Moderation
You already have content moderation working, but safety implementation is different:
🛡️ Content Moderation - Detects harmful content after creation (reactive)
🔒 Safety Implementation - Prevents, monitors, and responds to safety issues (proactive)
🏗️ Safety Architecture - Builds safety into every part of your application (systematic)
The key difference: Safety implementation creates a comprehensive protection system that prevents problems, not just detects them.
Real-World Use Cases
Think about all the safety challenges AI applications face:
- User protection - Preventing harassment, abuse, and harmful interactions
- Content safety - Ensuring all generated content meets safety standards
- Behavioral monitoring - Detecting unusual patterns and potential misuse
- Policy enforcement - Automatically applying safety rules and consequences
- Incident response - Handling safety violations with appropriate actions
Without comprehensive safety implementation:
- Safety is an afterthought, not built-in (reactive approach)
- Inconsistent protection across different features (gaps in coverage)
- Manual response to safety issues (slow and unreliable)
- No prevention, only detection (problems still occur)
With safety implementation, you have proactive, automated, comprehensive protection built into every aspect of your application.
Safety System Components
Your safety implementation will include multiple integrated systems:
🛡️ Proactive Content Filtering - The Prevention Layer
- Best for: Stopping harmful content before it’s created or shown
- Strengths: Real-time filtering, user input validation, output screening
- Use cases: Chat filtering, image generation safety, user input validation
📊 Behavioral Monitoring - The Detection Layer
- Best for: Identifying patterns and unusual behavior over time
- Strengths: Pattern recognition, trend analysis, early warning systems
- Use cases: Abuse detection, spam prevention, account security
⚡ Automated Response - The Action Layer
- Best for: Taking immediate action when safety issues are detected
- Strengths: Instant response, consistent enforcement, escalation workflows
- Use cases: Content blocking, user warnings, account restrictions
📈 Safety Analytics - The Intelligence Layer
- Best for: Understanding safety trends and improving protection systems
- Strengths: Data analysis, reporting, continuous improvement
- Use cases: Safety dashboards, trend analysis, policy optimization
🔧 Step 2: Building Safety Implementation Backend
Let’s build a comprehensive safety system on top of your existing backend. We’ll extend your content moderation with proactive safety features.
Building on your foundation: You already have content moderation working. We’re extending it to create a complete safety ecosystem with prevention, monitoring, and automated response capabilities.
Step 2A: Understanding Safety Implementation Architecture
Before writing code, let’s understand how a safety-first architecture works:
```javascript
// 🔒 SAFETY IMPLEMENTATION ARCHITECTURE:
// 1. Input Filtering - Validate and filter all user inputs before processing
// 2. Output Screening - Check all AI outputs before showing to users
// 3. Behavioral Tracking - Monitor user actions and detect patterns
// 4. Policy Engine - Automated rule enforcement and consequence application
// 5. Safety Dashboard - Real-time monitoring and management interface
// 6. Incident Response - Automated and manual response to safety violations
```
Key safety implementation concepts:
- Defense in Depth: Multiple layers of protection working together
- Proactive Prevention: Stopping problems before they occur
- Automated Enforcement: Consistent policy application without human intervention
- Continuous Monitoring: Real-time tracking of safety metrics and incidents
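Defense in depth can be pictured as a pipeline of independent checks that a request must pass in order. The sketch below is purely illustrative (the layer functions, field names like `requestsThisMinute`, and thresholds are hypothetical, not part of the system we build later), but it shows the core idea: any single layer can stop a request, and layers don’t need to know about each other.

```javascript
// Illustrative sketch of "defense in depth": each layer is an independent
// check returning { allowed, reason }; a request proceeds only if every
// layer allows it. All names and thresholds here are made up for the example.
const layers = [
  (req) => ({ allowed: req.text.length <= 1000, reason: 'input_too_long' }),
  (req) => ({ allowed: !/drop\s+table/i.test(req.text), reason: 'suspicious_input' }),
  (req) => ({ allowed: req.requestsThisMinute < 60, reason: 'rate_limited' }),
];

function runSafetyPipeline(request) {
  for (const layer of layers) {
    const result = layer(request);
    // First failing layer short-circuits the pipeline
    if (!result.allowed) return { allowed: false, reason: result.reason };
  }
  return { allowed: true, reason: null };
}
```

Because each layer is self-contained, you can add, remove, or reorder protections without touching the others — exactly the property we want from the middleware stack we build next.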
Step 2B: Installing Additional Safety Dependencies
Add safety monitoring dependencies to your backend. In your backend folder, run:
```shell
npm install rate-limiter-flexible winston express-rate-limit
```
What these packages do:
- rate-limiter-flexible: Advanced rate limiting and abuse prevention
- winston: Comprehensive logging for safety monitoring
- express-rate-limit: Basic rate limiting middleware
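To see what these packages do under the hood, here is a rate limiter in miniature: a fixed-window counter keyed by client. The real packages add sliding windows, Redis backends, and burst smoothing; this plain-JS sketch (function and variable names are ours, not the packages’ API) just shows the core idea.

```javascript
// A minimal fixed-window rate limiter: allows `limit` hits per `windowMs`
// per key. Returns true if the call is allowed, false if over the limit.
// The `now` parameter exists so the behavior is easy to test deterministically.
function createFixedWindowLimiter(limit, windowMs) {
  const hits = new Map(); // key -> { count, windowStart }
  return (key, now = Date.now()) => {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      // New key or expired window: start a fresh window
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```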
Step 2C: Adding Safety Implementation System
Add these safety system components to your existing index.js file, right after your content moderation routes:
```javascript
import { RateLimiterMemory } from 'rate-limiter-flexible';
import winston from 'winston';
import rateLimit from 'express-rate-limit';

// 🔒 SAFETY CONFIGURATION: System-wide safety settings
const SAFETY_CONFIG = {
  // Content filtering settings
  content_filtering: {
    enabled: true,
    block_threshold: 0.8, // Block content with >80% harmful probability
    warn_threshold: 0.5,  // Warn for content with >50% harmful probability
    auto_escalate: true   // Automatically escalate high-risk content
  },

  // Rate limiting settings
  rate_limiting: {
    requests_per_minute: 60, // Max requests per user per minute
    requests_per_hour: 1000, // Max requests per user per hour
    burst_protection: true   // Enable burst detection
  },

  // Behavioral monitoring
  monitoring: {
    track_user_behavior: true,
    detection_window: 24,   // Hours to analyze behavior patterns
    violation_threshold: 3, // Violations before escalation
    auto_restrictions: true // Automatic user restrictions
  },

  // Automated responses
  responses: {
    content_blocking: true,     // Automatically block harmful content
    user_warnings: true,        // Send warnings for policy violations
    account_restrictions: true, // Restrict accounts for repeated violations
    admin_notifications: true   // Notify admins of serious violations
  }
};

// 🛡️ SAFETY LOGGER: Comprehensive safety event logging
const safetyLogger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { service: 'openai-safety-system' },
  transports: [
    new winston.transports.File({
      filename: 'logs/safety-error.log',
      level: 'error',
      maxsize: 5242880, // 5MB
      maxFiles: 5
    }),
    new winston.transports.File({
      filename: 'logs/safety-combined.log',
      maxsize: 5242880, // 5MB
      maxFiles: 10
    }),
    new winston.transports.Console({
      format: winston.format.simple()
    })
  ]
});

// 🚨 RATE LIMITERS: Advanced abuse prevention
// Note: rate-limiter-flexible takes the key as an argument to consume(),
// so the per-user key is built in the middleware below.
const createRateLimiters = () => {
  // Basic request rate limiter
  const requestLimiter = new RateLimiterMemory({
    points: SAFETY_CONFIG.rate_limiting.requests_per_minute,
    duration: 60, // 1 minute
    execEvenly: true
  });

  // Hourly request limiter
  const hourlyLimiter = new RateLimiterMemory({
    points: SAFETY_CONFIG.rate_limiting.requests_per_hour,
    duration: 3600, // 1 hour
    execEvenly: true
  });

  // Content generation limiter (stricter for AI features)
  const contentLimiter = new RateLimiterMemory({
    points: 30, // 30 content generations per hour
    duration: 3600,
    execEvenly: true
  });

  return { requestLimiter, hourlyLimiter, contentLimiter };
};

const { requestLimiter, hourlyLimiter, contentLimiter } = createRateLimiters();

// 📊 SAFETY DATABASE: In-memory safety tracking (use a real database in production)
const safetyDatabase = {
  userViolations: new Map(),   // Track user safety violations
  contentBlocked: new Map(),   // Track blocked content patterns
  behaviorPatterns: new Map(), // Track user behavior patterns
  safetyMetrics: {
    totalRequests: 0,
    blockedRequests: 0,
    warningsIssued: 0,
    accountsRestricted: 0,
    lastReset: new Date()
  }
};

// 🔧 SAFETY HELPER FUNCTIONS

// Record safety violation
const recordViolation = (userId, violationType, severity, details) => {
  const userKey = userId || 'anonymous';

  if (!safetyDatabase.userViolations.has(userKey)) {
    safetyDatabase.userViolations.set(userKey, []);
  }

  const violation = {
    type: violationType,
    severity,
    details,
    timestamp: new Date(),
    id: Date.now() + Math.random()
  };

  safetyDatabase.userViolations.get(userKey).push(violation);

  // Log violation
  safetyLogger.warn('Safety violation recorded', { userId: userKey, violation });

  // Check if user needs restrictions
  if (SAFETY_CONFIG.monitoring.auto_restrictions) {
    checkUserRestrictions(userKey);
  }

  return violation;
};

// Check if user needs restrictions based on violation history
const checkUserRestrictions = (userId) => {
  const violations = safetyDatabase.userViolations.get(userId) || [];
  const recentViolations = violations.filter(v =>
    Date.now() - v.timestamp.getTime() <
      (SAFETY_CONFIG.monitoring.detection_window * 60 * 60 * 1000)
  );

  if (recentViolations.length >= SAFETY_CONFIG.monitoring.violation_threshold) {
    // Apply restrictions
    restrictUser(userId, 'multiple_violations', recentViolations);
  }
};

// Apply user restrictions
const restrictUser = (userId, reason, violations) => {
  safetyLogger.error('User restricted', {
    userId,
    reason,
    violationCount: violations.length,
    violations: violations.map(v => ({ type: v.type, severity: v.severity }))
  });

  safetyDatabase.safetyMetrics.accountsRestricted++;

  // In a real application, you would update user permissions in your database
  console.log(`🚨 User ${userId} restricted for: ${reason}`);
};

// Update safety metrics
const updateSafetyMetrics = (action) => {
  safetyDatabase.safetyMetrics.totalRequests++;

  switch (action) {
    case 'blocked':
      safetyDatabase.safetyMetrics.blockedRequests++;
      break;
    case 'warning':
      safetyDatabase.safetyMetrics.warningsIssued++;
      break;
  }
};
```
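The restriction logic above is easier to reason about (and unit-test) as a pure function. This standalone sketch mirrors `checkUserRestrictions` with the state passed in explicitly; the function name and the use of numeric timestamps (instead of the `Date` objects stored above) are our own simplifications.

```javascript
// Pure-function version of the restriction check: given a list of
// violations (numeric timestamps for simplicity), a detection window in
// hours, and a threshold, decide whether restrictions should apply.
function shouldRestrict(violations, windowHours, threshold, now = Date.now()) {
  const windowMs = windowHours * 60 * 60 * 1000;
  // Keep only violations that fall inside the detection window
  const recent = violations.filter(v => now - v.timestamp < windowMs);
  return recent.length >= threshold;
}
```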
```javascript
// 🛡️ SAFETY MIDDLEWARE: Proactive protection for all routes

// Rate limiting middleware
const rateLimitMiddleware = async (req, res, next) => {
  try {
    const key = req.ip + ':' + (req.user?.id || 'anonymous');
    await requestLimiter.consume(key);
    await hourlyLimiter.consume(key);
    next();
  } catch (rateLimiterRes) {
    const msBeforeNext = rateLimiterRes.msBeforeNext || 1000;

    safetyLogger.warn('Rate limit exceeded', {
      ip: req.ip,
      userId: req.user?.id,
      msBeforeNext
    });

    updateSafetyMetrics('blocked');

    res.status(429).json({
      error: 'Too many requests',
      retryAfter: Math.round(msBeforeNext / 1000) || 1,
      success: false
    });
  }
};

// Content generation rate limiting
const contentRateLimitMiddleware = async (req, res, next) => {
  try {
    await contentLimiter.consume(req.ip + ':' + (req.user?.id || 'anonymous'));
    next();
  } catch (rateLimiterRes) {
    const msBeforeNext = rateLimiterRes.msBeforeNext || 1000;

    safetyLogger.warn('Content generation rate limit exceeded', {
      ip: req.ip,
      userId: req.user?.id,
      endpoint: req.path
    });

    updateSafetyMetrics('blocked');

    res.status(429).json({
      error: 'Content generation rate limit exceeded',
      message: 'Please wait before generating more content',
      retryAfter: Math.round(msBeforeNext / 1000) || 1,
      success: false
    });
  }
};

// Input validation middleware
const inputValidationMiddleware = async (req, res, next) => {
  try {
    updateSafetyMetrics('request');

    // Check for suspicious patterns in the request body
    const requestText = JSON.stringify(req.body).toLowerCase();

    // Basic suspicious pattern detection.
    // Note: no `g` flag here - a global regex keeps lastIndex state
    // between .test() calls and can silently skip matches.
    const suspiciousPatterns = [
      /(?:hack|exploit|vulnerability|injection|xss|csrf)/i,
      /(?:admin|root|sudo|password|token|secret)/i,
      /(?:delete|drop|truncate|alter)\s+(?:table|database|user)/i
    ];

    const hasSuspiciousContent = suspiciousPatterns.some(pattern =>
      pattern.test(requestText)
    );

    if (hasSuspiciousContent) {
      const violation = recordViolation(
        req.user?.id,
        'suspicious_input',
        'medium',
        { patterns: 'security_related', endpoint: req.path }
      );

      safetyLogger.warn('Suspicious input detected', {
        ip: req.ip,
        userId: req.user?.id,
        endpoint: req.path,
        violation
      });

      updateSafetyMetrics('warning');
    }

    next();
  } catch (error) {
    safetyLogger.error('Input validation error', { error: error.message });
    next();
  }
};
```
```javascript
// Content safety wrapper function
const withContentSafety = (originalFunction) => {
  return async (req, res) => {
    try {
      // Call the original function but intercept the response
      const originalSend = res.send;
      let responseData = null;

      res.send = function (data) {
        responseData = data;
        return originalSend.call(this, data);
      };

      // Execute original function
      await originalFunction(req, res);

      // Analyze the response for safety if it was successful
      if (res.statusCode === 200 && responseData) {
        try {
          const parsedResponse = typeof responseData === 'string'
            ? JSON.parse(responseData)
            : responseData;

          if (parsedResponse.response || parsedResponse.result) {
            const content = parsedResponse.response || parsedResponse.result;
            await analyzeOutputSafety(content, req.user?.id, req.path);
          }
        } catch (parseError) {
          // Response wasn't JSON, skip safety analysis
        }
      }
    } catch (error) {
      safetyLogger.error('Content safety wrapper error', {
        error: error.message,
        endpoint: req.path
      });

      res.status(500).json({
        error: 'Safety system error',
        details: 'Content could not be processed safely',
        success: false
      });
    }
  };
};

// Analyze output safety
const analyzeOutputSafety = async (content, userId, endpoint) => {
  try {
    // Use the existing moderation endpoint for analysis
    const moderationResponse = await fetch('http://localhost:8000/api/moderation/text', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: content })
    });

    const moderationResult = await moderationResponse.json();

    if (moderationResult.flagged) {
      const severity = moderationResult.category_scores?.some(score =>
        score.score > SAFETY_CONFIG.content_filtering.block_threshold
      ) ? 'high' : 'medium';

      recordViolation(userId, 'harmful_output', severity, {
        categories: moderationResult.categories,
        endpoint,
        blocked: severity === 'high'
      });

      if (severity === 'high') {
        safetyLogger.error('High-risk content generated', {
          userId,
          endpoint,
          categories: moderationResult.categories
        });
      }
    }
  } catch (error) {
    safetyLogger.error('Output safety analysis failed', { error: error.message });
  }
};
```
```javascript
// 🔒 SAFETY ENDPOINTS: Safety management and monitoring

// Apply safety middleware to content generation routes
app.use('/api/chat', rateLimitMiddleware);
app.use('/api/images', contentRateLimitMiddleware);
app.use('/api/voice', contentRateLimitMiddleware);
app.use('/api/structured', contentRateLimitMiddleware);

// Apply input validation to all API routes
app.use('/api', inputValidationMiddleware);

// Safety dashboard endpoint
app.get("/api/safety/dashboard", (req, res) => {
  try {
    // Calculate safety metrics
    const now = new Date();
    const timeSinceReset = now - safetyDatabase.safetyMetrics.lastReset;
    const hoursActive = timeSinceReset / (1000 * 60 * 60);

    const metrics = {
      current_metrics: safetyDatabase.safetyMetrics,
      rates: {
        requests_per_hour: hoursActive > 0
          ? Math.round(safetyDatabase.safetyMetrics.totalRequests / hoursActive)
          : 0,
        block_rate: safetyDatabase.safetyMetrics.totalRequests > 0
          ? (safetyDatabase.safetyMetrics.blockedRequests /
             safetyDatabase.safetyMetrics.totalRequests * 100).toFixed(2)
          : 0,
        warning_rate: safetyDatabase.safetyMetrics.totalRequests > 0
          ? (safetyDatabase.safetyMetrics.warningsIssued /
             safetyDatabase.safetyMetrics.totalRequests * 100).toFixed(2)
          : 0
      },
      system_status: {
        content_filtering: SAFETY_CONFIG.content_filtering.enabled,
        rate_limiting: SAFETY_CONFIG.rate_limiting.requests_per_minute > 0,
        behavioral_monitoring: SAFETY_CONFIG.monitoring.track_user_behavior,
        automated_responses: SAFETY_CONFIG.responses.content_blocking
      },
      recent_violations: Array.from(safetyDatabase.userViolations.entries())
        .flatMap(([userId, violations]) =>
          violations.slice(-5).map(v => ({ ...v, userId }))
        )
        .sort((a, b) => b.timestamp - a.timestamp)
        .slice(0, 10)
    };

    res.json({ success: true, ...metrics, timestamp: now.toISOString() });
  } catch (error) {
    safetyLogger.error('Dashboard endpoint error', { error: error.message });
    res.status(500).json({
      error: 'Failed to load safety dashboard',
      details: error.message,
      success: false
    });
  }
});

// User safety status endpoint
app.get("/api/safety/user/:userId", (req, res) => {
  try {
    const { userId } = req.params;
    const violations = safetyDatabase.userViolations.get(userId) || [];

    const recentViolations = violations.filter(v =>
      Date.now() - v.timestamp.getTime() < (24 * 60 * 60 * 1000) // Last 24 hours
    );

    const riskLevel = recentViolations.length >= 3 ? 'high'
      : recentViolations.length >= 1 ? 'medium'
      : 'low';

    res.json({
      success: true,
      user_id: userId,
      risk_level: riskLevel,
      total_violations: violations.length,
      recent_violations: recentViolations.length,
      violations: violations.slice(-10), // Last 10 violations
      restrictions_active: riskLevel === 'high',
      last_violation: violations.length > 0
        ? violations[violations.length - 1].timestamp
        : null
    });
  } catch (error) {
    safetyLogger.error('User safety status error', { error: error.message });
    res.status(500).json({
      error: 'Failed to get user safety status',
      details: error.message,
      success: false
    });
  }
});
```
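The user status endpoint derives its risk level from the 24-hour violation count with a nested ternary. Extracted into a small helper (the function name is ours, added for illustration), the mapping is explicit and trivially testable:

```javascript
// Map a recent-violation count to a risk level, mirroring the
// thresholds used by the /api/safety/user/:userId endpoint:
// 3+ recent violations -> high, 1-2 -> medium, 0 -> low.
function riskLevel(recentViolationCount) {
  if (recentViolationCount >= 3) return 'high';
  if (recentViolationCount >= 1) return 'medium';
  return 'low';
}
```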
```javascript
// Safety incident reporting endpoint
app.post("/api/safety/report", (req, res) => {
  try {
    const {
      incident_type,
      description,
      content_id = null,
      user_id = null,
      severity = 'medium'
    } = req.body;

    if (!incident_type || !description) {
      return res.status(400).json({
        error: 'Incident type and description are required',
        success: false
      });
    }

    const incident = {
      id: Date.now() + Math.random(),
      type: incident_type,
      description,
      content_id,
      user_id,
      severity,
      reported_by: req.user?.id || 'anonymous',
      timestamp: new Date(),
      status: 'reported'
    };

    // Record as a violation if user_id was provided
    if (user_id) {
      recordViolation(user_id, 'reported_incident', severity, {
        incident_type,
        description,
        reported_by: req.user?.id
      });
    }

    safetyLogger.warn('Safety incident reported', { incident });

    res.json({
      success: true,
      incident_id: incident.id,
      status: 'reported',
      message: 'Incident has been recorded and will be reviewed'
    });
  } catch (error) {
    safetyLogger.error('Incident reporting error', { error: error.message });
    res.status(500).json({
      error: 'Failed to report incident',
      details: error.message,
      success: false
    });
  }
});

// Safety configuration endpoint
app.get("/api/safety/config", (req, res) => {
  try {
    res.json({
      success: true,
      config: SAFETY_CONFIG,
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(500).json({
      error: 'Failed to get safety configuration',
      success: false
    });
  }
});

// Update safety configuration endpoint (admin only)
app.post("/api/safety/config", (req, res) => {
  try {
    const { config } = req.body;

    if (!config) {
      return res.status(400).json({
        error: 'Configuration object is required',
        success: false
      });
    }

    // Merge with existing config (in production, validate admin permissions)
    Object.assign(SAFETY_CONFIG, config);

    safetyLogger.info('Safety configuration updated', {
      newConfig: config,
      updatedBy: req.user?.id || 'anonymous'
    });

    res.json({
      success: true,
      message: 'Safety configuration updated',
      current_config: SAFETY_CONFIG
    });
  } catch (error) {
    safetyLogger.error('Configuration update error', { error: error.message });
    res.status(500).json({
      error: 'Failed to update safety configuration',
      details: error.message,
      success: false
    });
  }
});

// Initialize safety system
console.log('🔒 Safety implementation system initialized');
safetyLogger.info('Safety system started', { config: SAFETY_CONFIG });
```
Function breakdown:
- Proactive filtering - Prevent harmful content before it’s processed
- Rate limiting - Protect against abuse and overuse
- Behavioral monitoring - Track patterns and detect violations
- Automated responses - Take action when safety issues are detected
- Comprehensive logging - Record all safety events for analysis
- Safety dashboard - Monitor system health and safety metrics
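The dashboard’s percentage rates come from simple ratio arithmetic over the metrics counters. A standalone version of that calculation (the helper name is ours; note the dashboard endpoint returns the number `0` when there’s no traffic, while this sketch always returns a string for consistency):

```javascript
// Compute a blocked-request rate as a percentage string with two
// decimals, guarding against division by zero - the same math the
// dashboard endpoint applies to its metrics counters.
function blockRate(blocked, total) {
  return total > 0 ? ((blocked / total) * 100).toFixed(2) : '0.00';
}
```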
🔧 Step 3: Building the React Safety Dashboard Component
Now let’s create a comprehensive safety management interface that integrates with your existing application.
Step 3A: Creating the Safety Dashboard Component
Create a new file src/SafetyDashboard.jsx:
```jsx
import { useState, useEffect } from "react";
import {
  Shield, AlertTriangle, Users, Activity,
  Settings, TrendingUp, Eye, Ban
} from "lucide-react";

function SafetyDashboard() {
  // 🧠 STATE: Safety dashboard data management
  const [safetyMetrics, setSafetyMetrics] = useState(null); // Dashboard metrics
  const [isLoading, setIsLoading] = useState(true);         // Loading status
  const [error, setError] = useState(null);                 // Error messages
  const [selectedUser, setSelectedUser] = useState("");     // User lookup
  const [userStatus, setUserStatus] = useState(null);       // User safety status
  const [incidentReport, setIncidentReport] = useState({    // Incident reporting
    type: "",
    description: "",
    userId: "",
    severity: "medium"
  });
  const [config, setConfig] = useState(null);               // Safety configuration
  const [activeTab, setActiveTab] = useState("overview");   // Active dashboard tab

  // 🔧 FUNCTIONS: Safety dashboard logic engine

  // Load safety dashboard data
  const loadSafetyMetrics = async () => {
    setIsLoading(true);
    setError(null);

    try {
      const response = await fetch("http://localhost:8000/api/safety/dashboard");
      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to load safety metrics');
      }

      setSafetyMetrics(data);
    } catch (error) {
      console.error('Failed to load safety metrics:', error);
      setError(error.message || 'Could not load safety dashboard');
    } finally {
      setIsLoading(false);
    }
  };

  // Load safety configuration
  const loadSafetyConfig = async () => {
    try {
      const response = await fetch("http://localhost:8000/api/safety/config");
      const data = await response.json();

      if (response.ok) {
        setConfig(data.config);
      }
    } catch (error) {
      console.error('Failed to load safety config:', error);
    }
  };

  // Look up user safety status
  const lookupUserStatus = async () => {
    if (!selectedUser.trim()) {
      setError('User ID is required');
      return;
    }

    setError(null);

    try {
      const response = await fetch(`http://localhost:8000/api/safety/user/${selectedUser.trim()}`);
      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to get user status');
      }

      setUserStatus(data);
    } catch (error) {
      console.error('User lookup failed:', error);
      setError(error.message || 'Could not look up user status');
    }
  };
```
```jsx
  // Report safety incident
  const reportIncident = async () => {
    if (!incidentReport.type || !incidentReport.description) {
      setError('Incident type and description are required');
      return;
    }

    setError(null);

    try {
      const response = await fetch("http://localhost:8000/api/safety/report", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          incident_type: incidentReport.type,
          description: incidentReport.description,
          user_id: incidentReport.userId || null,
          severity: incidentReport.severity
        })
      });

      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to report incident');
      }

      // Reset form
      setIncidentReport({ type: "", description: "", userId: "", severity: "medium" });

      // Reload metrics to show the new incident
      loadSafetyMetrics();

      alert('Incident reported successfully');
    } catch (error) {
      console.error('Incident reporting failed:', error);
      setError(error.message || 'Could not report incident');
    }
  };

  // Format timestamp for display
  const formatTimestamp = (timestamp) => {
    return new Date(timestamp).toLocaleString();
  };

  // Get risk level color
  const getRiskColor = (level) => {
    switch (level) {
      case 'high': return 'text-red-600 bg-red-100';
      case 'medium': return 'text-yellow-600 bg-yellow-100';
      case 'low': return 'text-green-600 bg-green-100';
      default: return 'text-gray-600 bg-gray-100';
    }
  };

  // Get severity color
  const getSeverityColor = (severity) => {
    switch (severity) {
      case 'high': return 'bg-red-500';
      case 'medium': return 'bg-yellow-500';
      case 'low': return 'bg-green-500';
      default: return 'bg-gray-500';
    }
  };

  // Load data on component mount
  useEffect(() => {
    loadSafetyMetrics();
    loadSafetyConfig();

    // Set up auto-refresh every 30 seconds
    const interval = setInterval(loadSafetyMetrics, 30000);
    return () => clearInterval(interval);
  }, []);
```
```jsx
  // 🎨 UI: Safety dashboard interface
  return (
    <div className="min-h-screen bg-gradient-to-br from-red-50 to-orange-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-7xl flex flex-col overflow-hidden">

        {/* Header */}
        <div className="bg-gradient-to-r from-red-600 to-orange-600 text-white p-6">
          <div className="flex items-center space-x-3">
            <div className="w-10 h-10 bg-white bg-opacity-20 rounded-full flex items-center justify-center">
              <Shield className="w-5 h-5" />
            </div>
            <div>
              <h1 className="text-xl font-bold">🔒 Safety Implementation</h1>
              <p className="text-red-100 text-sm">Comprehensive safety monitoring and management system!</p>
            </div>
          </div>
        </div>

        {/* Tab Navigation */}
        <div className="border-b border-gray-200">
          <nav className="flex">
            <button
              onClick={() => setActiveTab('overview')}
              className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${
                activeTab === 'overview'
                  ? 'border-red-500 text-red-600'
                  : 'border-transparent text-gray-500 hover:text-gray-700'
              }`}
            >
              <Activity className="w-4 h-4 inline mr-2" />
              Overview
            </button>
            <button
              onClick={() => setActiveTab('users')}
              className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${
                activeTab === 'users'
                  ? 'border-red-500 text-red-600'
                  : 'border-transparent text-gray-500 hover:text-gray-700'
              }`}
            >
              <Users className="w-4 h-4 inline mr-2" />
              User Safety
            </button>
            <button
              onClick={() => setActiveTab('incidents')}
              className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${
                activeTab === 'incidents'
                  ? 'border-red-500 text-red-600'
                  : 'border-transparent text-gray-500 hover:text-gray-700'
              }`}
            >
              <AlertTriangle className="w-4 h-4 inline mr-2" />
              Incidents
            </button>
            <button
              onClick={() => setActiveTab('config')}
              className={`px-6 py-3 font-medium text-sm border-b-2 transition-colors duration-200 ${
                activeTab === 'config'
                  ? 'border-red-500 text-red-600'
                  : 'border-transparent text-gray-500 hover:text-gray-700'
              }`}
            >
              <Settings className="w-4 h-4 inline mr-2" />
              Configuration
            </button>
          </nav>
        </div>

        {/* Error Display */}
        {error && (
          <div className="p-4 bg-red-50 border-b border-red-200">
            <p className="text-red-700 text-sm">
              <strong>Error:</strong> {error}
            </p>
          </div>
        )}

        {/* Main Content */}
        <div className="flex-1 p-6">
          {/* Overview Tab */}
          {activeTab === 'overview' && (
            <div className="space-y-6">
              {isLoading ? (
                <div className="text-center py-12">
                  <div className="animate-spin w-8 h-8 border-4 border-red-500 border-t-transparent rounded-full mx-auto mb-4"></div>
                  <p className="text-gray-600">Loading safety metrics...</p>
                </div>
              ) : safetyMetrics ? (
                <>
                  {/* Metrics Cards */}
                  <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4">
                    <div className="bg-blue-50 rounded-lg p-4">
                      <div className="flex items-center">
                        <TrendingUp className="w-8 h-8 text-blue-600" />
                        <div className="ml-3">
                          <p className="text-sm font-medium text-blue-600">Total Requests</p>
                          <p className="text-2xl font-bold text-blue-900">
                            {safetyMetrics.current_metrics.totalRequests.toLocaleString()}
                          </p>
                        </div>
                      </div>
                    </div>

                    <div className="bg-red-50 rounded-lg p-4">
                      <div className="flex items-center">
                        <Ban className="w-8 h-8 text-red-600" />
                        <div className="ml-3">
                          <p className="text-sm font-medium text-red-600">Blocked Requests</p>
                          <p className="text-2xl font-bold text-red-900">
                            {safetyMetrics.current_metrics.blockedRequests.toLocaleString()}
                          </p>
                          <p className="text-xs text-red-700">
                            {safetyMetrics.rates.block_rate}% of total
                          </p>
                        </div>
                      </div>
                    </div>

                    <div className="bg-yellow-50 rounded-lg p-4">
                      <div className="flex items-center">
                        <AlertTriangle className="w-8 h-8 text-yellow-600" />
                        <div className="ml-3">
                          <p className="text-sm font-medium text-yellow-600">Warnings Issued</p>
                          <p className="text-2xl font-bold text-yellow-900">
                            {safetyMetrics.current_metrics.warningsIssued.toLocaleString()}
                          </p>
                          <p className="text-xs text-yellow-700">
                            {safetyMetrics.rates.warning_rate}% of total
                          </p>
                        </div>
                      </div>
                    </div>

                    <div className="bg-purple-50 rounded-lg p-4">
                      <div className="flex items-center">
                        <Users className="w-8 h-8 text-purple-600" />
                        <div className="ml-3">
                          <p className="text-sm font-medium text-purple-600">Accounts Restricted</p>
                          <p className="text-2xl font-bold text-purple-900">
                            {safetyMetrics.current_metrics.accountsRestricted.toLocaleString()}
                          </p>
                        </div>
                      </div>
                    </div>
                  </div>

                  {/* System Status */}
                  <div className="bg-gray-50 rounded-lg p-6">
                    <h3 className="font-semibold text-gray-900 mb-4 flex items-center">
                      <Shield className="w-5 h-5 mr-2 text-red-600" />
                      System Status
                    </h3>
                    <div className="grid grid-cols-2 md:grid-cols-4 gap-4">
                      {Object.entries(safetyMetrics.system_status).map(([key, status]) => (
                        <div key={key} className="flex items-center space-x-2">
                          <div className={`w-3 h-3 rounded-full ${status ? 'bg-green-500' : 'bg-red-500'}`}></div>
                          <span className="text-sm text-gray-700 capitalize">
                            {key.replace(/_/g, ' ')}
                          </span>
                        </div>
                      ))}
                    </div>
                  </div>

                  {/* Recent Violations */}
                  <div className="bg-white border rounded-lg p-6">
                    <h3 className="font-semibold text-gray-900 mb-4 flex items-center">
                      <Eye className="w-5 h-5 mr-2 text-red-600" />
                      Recent Violations ({safetyMetrics.recent_violations.length})
                    </h3>

                    {safetyMetrics.recent_violations.length === 0 ? (
                      <p className="text-gray-500 text-center py-4">No recent violations</p>
                    ) : (
                      <div className="space-y-3">
                        {safetyMetrics.recent_violations.map((violation, index) => (
                          <div key={violation.id || index} className="flex items-center justify-between p-3 bg-gray-50 rounded-lg">
                            <div className="flex items-center space-x-3">
                              <div className={`w-3 h-3 rounded-full ${getSeverityColor(violation.severity)}`}></div>
                              <div>
                                <p className="font-medium text-gray-900">{violation.type.replace(/_/g, ' ')}</p>
                                <p className="text-sm text-gray-600">User: {violation.userId}</p>
                              </div>
                            </div>
                            <div className="text-right">
                              <p className="text-sm text-gray-500">{formatTimestamp(violation.timestamp)}</p>
                              <span className={`px-2 py-1 rounded text-xs font-medium ${
                                violation.severity === 'high' ? 'bg-red-100 text-red-700' :
                                violation.severity === 'medium' ? 'bg-yellow-100 text-yellow-700' :
                                'bg-green-100 text-green-700'
                              }`}>
                                {violation.severity}
                              </span>
                            </div>
                          </div>
                        ))}
                      </div>
                    )}
                  </div>
                </>
              ) : (
                <div className="text-center py-12">
                  <Shield className="w-16 h-16 text-gray-400 mx-auto mb-4" />
                  <p className="text-gray-600">No safety metrics available</p>
                </div>
              )}
            </div>
          )}

          {/* User Safety Tab */}
          {activeTab === 'users' && (
            <div className="space-y-6">
              <div className="bg-white border rounded-lg p-6">
                <h3 className="font-semibold text-gray-900 mb-4">User Safety Lookup</h3>

                <div className="flex space-x-3 mb-4">
                  <input
                    type="text"
                    value={selectedUser}
                    onChange={(e) => setSelectedUser(e.target.value)}
                    placeholder="Enter user ID..."
                    className="flex-1 px-4 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-red-500"
                  />
                  <button
                    onClick={lookupUserStatus}
                    disabled={!selectedUser.trim()}
                    className="px-6 py-2 bg-red-600 text-white rounded-lg hover:bg-red-700 disabled:opacity-50 transition-colors duration-200"
                  >
                    Lookup
                  </button>
                </div>

                {userStatus && (
                  <div className="mt-6 p-4 bg-gray-50 rounded-lg">
                    <div className="flex items-center justify-between mb-4">
                      <h4 className="font-medium text-gray-900">User: {userStatus.user_id}</h4>
                      <span className={`px-3 py-1 rounded-lg text-sm font-medium ${getRiskColor(userStatus.risk_level)}`}>
                        {userStatus.risk_level.toUpperCase()} RISK
                      </span>
                    </div>

                    <div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-4">
                      <div>
                        <p className="text-sm text-gray-600">Total Violations</p>
                        <p className="text-lg font-semibold">{userStatus.total_violations}</p>
                      </div>
                      <div>
                        <p className="text-sm text-gray-600">Recent Violations</p>
                        <p className="text-lg font-semibold">{userStatus.recent_violations}</p>
                      </div>
                      <div>
                        <p className="text-sm text-gray-600">Restrictions</p>
                        <p className={`text-lg font-semibold ${userStatus.restrictions_active ? 'text-red-600' : 'text-green-600'}`}>
                          {userStatus.restrictions_active ? 'Active' : 'None'}
                        </p>
                      </div>
                      <div>
                        <p className="text-sm text-gray-600">Last Violation</p>
                        <p className="text-sm text-gray-700">
                          {userStatus.last_violation ? formatTimestamp(userStatus.last_violation) : 'None'}
                        </p>
                      </div>
                    </div>

                    {userStatus.violations.length > 0 && (
                      <div>
                        <h5 className="font-medium text-gray-900 mb-2">Recent Violations</h5>
                        <div className="space-y-2 max-h-40 overflow-y-auto">
                          {userStatus.violations.map((violation, index) => (
                            <div key={violation.id || index} className="flex items-center justify-between p-2 bg-white rounded">
                              <div>
                                <p className="text-sm font-medium">{violation.type.replace(/_/g, ' ')}</p>
                                <p className="text-xs text-gray-600">{formatTimestamp(violation.timestamp)}</p>
                              </div>
                              <span className={`px-2 py-1 rounded text-xs ${
                                violation.severity === 'high' ? 'bg-red-100 text-red-700' :
                                violation.severity === 'medium' ? 'bg-yellow-100 text-yellow-700' :
                                'bg-green-100 text-green-700'
                              }`}>
                                {violation.severity}
                              </span>
                            </div>
                          ))}
                        </div>
                      </div>
                    )}
                  </div>
                )}
              </div>
            </div>
          )}
```
{/* Incidents Tab */} {activeTab === 'incidents' && ( <div className="space-y-6"> <div className="bg-white border rounded-lg p-6"> <h3 className="font-semibold text-gray-900 mb-4">Report Safety Incident</h3>
<div className="space-y-4"> <div className="grid grid-cols-1 md:grid-cols-2 gap-4"> <div> <label className="block text-sm font-medium text-gray-700 mb-2"> Incident Type </label> <select value={incidentReport.type} onChange={(e) => setIncidentReport({...incidentReport, type: e.target.value})} className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-red-500" > <option value="">Select incident type...</option> <option value="harassment">Harassment</option> <option value="harmful_content">Harmful Content</option> <option value="spam">Spam</option> <option value="abuse">System Abuse</option> <option value="security">Security Issue</option> <option value="other">Other</option> </select> </div>
<div> <label className="block text-sm font-medium text-gray-700 mb-2"> Severity </label> <select value={incidentReport.severity} onChange={(e) => setIncidentReport({...incidentReport, severity: e.target.value})} className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-red-500" > <option value="low">Low</option> <option value="medium">Medium</option> <option value="high">High</option> </select> </div> </div>
<div> <label className="block text-sm font-medium text-gray-700 mb-2"> User ID (Optional) </label> <input type="text" value={incidentReport.userId} onChange={(e) => setIncidentReport({...incidentReport, userId: e.target.value})} placeholder="User ID if applicable..." className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-red-500" /> </div>
<div> <label className="block text-sm font-medium text-gray-700 mb-2"> Description </label> <textarea value={incidentReport.description} onChange={(e) => setIncidentReport({...incidentReport, description: e.target.value})} placeholder="Describe the safety incident..." rows="4" className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-red-500" /> </div>
<button onClick={reportIncident} disabled={!incidentReport.type || !incidentReport.description} className="px-6 py-2 bg-red-600 text-white rounded-lg hover:bg-red-700 disabled:opacity-50 transition-colors duration-200" > Report Incident </button> </div> </div> </div> )}
{/* Configuration Tab */} {activeTab === 'config' && ( <div className="space-y-6"> <div className="bg-white border rounded-lg p-6"> <h3 className="font-semibold text-gray-900 mb-4">Safety Configuration</h3>
{config ? ( <div className="space-y-6"> <div> <h4 className="font-medium text-gray-900 mb-3">Content Filtering</h4> <div className="grid grid-cols-1 md:grid-cols-3 gap-4 text-sm"> <div> <span className="text-gray-600">Enabled:</span> <span className={`ml-2 ${config.content_filtering.enabled ? 'text-green-600' : 'text-red-600'}`}> {config.content_filtering.enabled ? 'Yes' : 'No'} </span> </div> <div> <span className="text-gray-600">Block Threshold:</span> <span className="ml-2 text-gray-900">{config.content_filtering.block_threshold}</span> </div> <div> <span className="text-gray-600">Warn Threshold:</span> <span className="ml-2 text-gray-900">{config.content_filtering.warn_threshold}</span> </div> </div> </div>
<div> <h4 className="font-medium text-gray-900 mb-3">Rate Limiting</h4> <div className="grid grid-cols-1 md:grid-cols-2 gap-4 text-sm"> <div> <span className="text-gray-600">Requests/Minute:</span> <span className="ml-2 text-gray-900">{config.rate_limiting.requests_per_minute}</span> </div> <div> <span className="text-gray-600">Requests/Hour:</span> <span className="ml-2 text-gray-900">{config.rate_limiting.requests_per_hour}</span> </div> </div> </div>
<div> <h4 className="font-medium text-gray-900 mb-3">Monitoring</h4> <div className="grid grid-cols-1 md:grid-cols-3 gap-4 text-sm"> <div> <span className="text-gray-600">Behavior Tracking:</span> <span className={`ml-2 ${config.monitoring.track_user_behavior ? 'text-green-600' : 'text-red-600'}`}> {config.monitoring.track_user_behavior ? 'Enabled' : 'Disabled'} </span> </div> <div> <span className="text-gray-600">Detection Window:</span> <span className="ml-2 text-gray-900">{config.monitoring.detection_window} hours</span> </div> <div> <span className="text-gray-600">Violation Threshold:</span> <span className="ml-2 text-gray-900">{config.monitoring.violation_threshold}</span> </div> </div> </div>
<div> <h4 className="font-medium text-gray-900 mb-3">Automated Responses</h4> <div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm"> {Object.entries(config.responses).map(([key, enabled]) => ( <div key={key}> <span className="text-gray-600 capitalize">{key.replace(/_/g, ' ')}:</span> <span className={`ml-2 ${enabled ? 'text-green-600' : 'text-red-600'}`}> {enabled ? 'On' : 'Off'} </span> </div> ))} </div> </div> </div> ) : ( <p className="text-gray-500">Loading configuration...</p> )} </div> </div> )} </div>
{/* Refresh Button */} <div className="p-4 border-t border-gray-200"> <button onClick={loadSafetyMetrics} disabled={isLoading} className="px-4 py-2 bg-red-100 text-red-700 rounded-lg hover:bg-red-200 disabled:opacity-50 transition-colors duration-200" > {isLoading ? 'Refreshing...' : 'Refresh Data'} </button> </div> </div> </div> );}
export default SafetyDashboard;
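For reference when wiring up the backend, the Configuration tab above dereferences a nested settings object. Here is a hypothetical example value — the keys are inferred from the fields the component reads, but the concrete values and any extra fields depend on your safety configuration endpoint:

```javascript
// Example safety config matching the fields the Configuration tab reads.
// All values here are illustrative placeholders, not recommended settings.
const exampleConfig = {
  content_filtering: { enabled: true, block_threshold: 0.9, warn_threshold: 0.7 },
  rate_limiting: { requests_per_minute: 20, requests_per_hour: 300 },
  monitoring: { track_user_behavior: true, detection_window: 24, violation_threshold: 3 },
  responses: { auto_block: true, warn_user: true, notify_admin: false, restrict_user: true },
};

// The Automated Responses grid renders each `responses` entry as On/Off:
const rendered = Object.entries(exampleConfig.responses)
  .map(([key, enabled]) => `${key.replace(/_/g, " ")}: ${enabled ? "On" : "Off"}`);

console.log(rendered);
```

If your backend returns a differently shaped object, the component's `config.content_filtering.enabled`-style accesses will throw, so it's worth checking the shape once at fetch time.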
Step 3B: Adding Safety Dashboard to Navigation
Update your src/App.jsx to include the safety implementation tab in Module 3:
// Add to your existing imports
import SafetyDashboard from "./SafetyDashboard";
import { MessageSquare, Image, Mic, Folder, Volume2, Eye, Phone, Link, FileText, Shield } from "lucide-react";
// Update your currentView state to include 'safety'
const [currentView, setCurrentView] = useState("chat"); // Add 'safety' to options
// Add the Safety tab (you can group it separately or add a Module 3 section)
// Add this button after your existing tabs:
<button
  onClick={() => setCurrentView("safety")}
  className={`px-3 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 whitespace-nowrap ${
    currentView === "safety"
      ? "bg-red-100 text-red-700 shadow-sm"
      : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
  }`}
>
  <Shield className="w-4 h-4" />
  <span>Safety</span>
</button>
// Add to your main content section:
{currentView === "safety" && <SafetyDashboard />}
🧪 Testing Your Safety Implementation
Let’s test your comprehensive safety system step by step.
Step 1: Backend Safety Test
Test the safety dashboard:
# Test the safety dashboard endpoint
curl http://localhost:8000/api/safety/dashboard
Test user lookup:
# Test user safety status
curl http://localhost:8000/api/safety/user/test-user
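The dashboard component reads specific fields from the user-status response. A hypothetical payload shape, inferred from what the component dereferences (`user_id`, `risk_level`, `total_violations`, `recent_violations`, `restrictions_active`, `last_violation`, `violations`) — your backend's actual schema is the source of truth:

```javascript
// Hypothetical /api/safety/user/:id response shape, inferred from the
// fields the SafetyDashboard component reads. Values are illustrative.
const sampleUserStatus = {
  user_id: "test-user",
  risk_level: "low",
  total_violations: 2,
  recent_violations: 0,
  restrictions_active: false,
  last_violation: "2024-01-15T10:30:00Z",
  violations: [
    { id: "v1", type: "harmful_content", severity: "medium", timestamp: "2024-01-15T10:30:00Z" },
  ],
};

// Sanity-check that a payload carries every field the UI dereferences —
// handy when pointing the existing dashboard at a new backend.
function hasDashboardFields(status) {
  const required = [
    "user_id", "risk_level", "total_violations",
    "recent_violations", "restrictions_active", "violations",
  ];
  return required.every((key) => key in status);
}

console.log(hasDashboardFields(sampleUserStatus)); // true
```

A missing field here surfaces in the UI as a runtime error (e.g. calling `.toUpperCase()` on `undefined`), so a quick shape check at fetch time fails much more clearly.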
Step 2: Full Application Test
Start both servers and test the complete safety flow:
- Navigate to Safety → Click the “Safety” tab
- View safety metrics → Check system overview and status
- Look up user safety → Search for user safety status
- Report incidents → Test incident reporting system
- Monitor violations → Watch real-time safety violations
- Test rate limiting → Make rapid requests to trigger limits
- Review configuration → Check safety system settings
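For the rate-limiting step above, it helps to see what the backend is doing conceptually. One common approach is a sliding-window limiter keyed by user, driven by the `requests_per_minute` setting shown in the Configuration tab; the class and method names below are an illustrative sketch, not the course's actual implementation:

```javascript
// Minimal sliding-window rate limiter sketch (illustrative names).
// Keeps per-user request timestamps and rejects requests once the
// count inside the window reaches the limit.
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs = 60_000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = new Map(); // userId -> array of request times (ms)
  }

  allow(userId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have aged out of the window.
    const recent = (this.timestamps.get(userId) || []).filter((t) => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.timestamps.set(userId, recent);
      return false; // over the per-window limit
    }
    recent.push(now);
    this.timestamps.set(userId, recent);
    return true;
  }
}

// Simulate the "make rapid requests" test with a 3-per-minute limit:
const limiter = new SlidingWindowLimiter(3);
const results = [1, 2, 3, 4].map((i) => limiter.allow("test-user", 1000 * i));
console.log(results); // first three allowed, fourth blocked
```

The same idea extends to the hourly limit by running a second limiter with a larger window over the same stream of requests.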
Step 3: Safety Scenario Testing
Test real safety scenarios:
🔴 Rate limiting: Make multiple rapid API calls
🔴 Suspicious input: Try potentially harmful content
🔴 Violation patterns: Simulate repeated policy violations
🔴 Incident reporting: Report various types of safety issues
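The violation-patterns scenario boils down to an escalation rule: as a user's recent violations accumulate past the configured `violation_threshold`, their displayed `risk_level` rises. The thresholds and level names below are illustrative only — how your backend actually derives risk may differ:

```javascript
// Illustrative risk-level escalation based on recent violations.
// The dashboard displays a `risk_level`; these example thresholds
// mirror the `violation_threshold` config field but are not the
// course's actual backend logic.
function riskLevel(recentViolations, violationThreshold = 3) {
  if (recentViolations === 0) return "low";
  if (recentViolations < violationThreshold) return "medium";
  return "high";
}

console.log(riskLevel(0)); // "low"
console.log(riskLevel(2)); // "medium"
console.log(riskLevel(5)); // "high"
```

Simulating a burst of violations for one test user and watching the User Safety tab flip from LOW to HIGH RISK is a quick end-to-end check of this path.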
✅ What You Built
Congratulations! You’ve implemented a comprehensive safety system:
- ✅ Proactive content filtering with real-time input/output screening
- ✅ Advanced rate limiting with abuse prevention and burst protection
- ✅ Behavioral monitoring with pattern detection and violation tracking
- ✅ Automated responses with policy enforcement and user restrictions
- ✅ Safety dashboard with real-time monitoring and incident management
- ✅ Comprehensive logging with detailed safety event tracking
Your Module 3 safety implementation includes:
- Content moderation (existing) - Detect harmful content
- Safety implementation (new) - Prevent, monitor, and respond to safety issues
- Integrated protection across all application features
- Real-time monitoring with automated responses
- Professional safety dashboard for system management
Next up: Performance optimization, cost management, and production deployment strategies to complete Module 3.
Your OpenAI application now has enterprise-grade safety protection! 🔒