👁️ AI Vision Analysis Made Simple
Right now, you have chat, images, audio, files, and speech working in your application. But what if your AI could also see and understand images?
Vision analysis adds visual intelligence to your application. Instead of manually reviewing screenshots, documents, or charts, users can upload any image and get instant AI-powered insights, data extraction, and intelligent visual analysis.
You’re about to learn exactly how to add intelligent vision processing to your existing application.
🧠 Step 1: Understanding AI Vision Analysis
Before we write any code, let’s understand what AI vision analysis actually means and why it’s useful for your applications.
What AI Vision Analysis Actually Means
AI vision analysis is like having a professional visual analyst inside your application. Users upload any image - screenshots, documents, charts, photos - and the AI reads, understands, and extracts meaningful insights automatically.
Real-world analogy: It’s like hiring a team of specialists who can instantly look at any visual content and give you detailed analysis, extract key data, and provide actionable insights. Instead of spending time manually reviewing images, you upload them and get professional analysis in seconds.
Why You Need This in Your Applications
Think about all the times you or your users need to analyze visual content:
- Business documents need OCR and data extraction
- Charts and graphs need data interpretation and trend analysis
- Screenshots need UI analysis and improvement suggestions
- Photos need object recognition and content analysis
- Dashboards need KPI extraction and performance insights
Without AI vision analysis, you’d need to:
- Manually examine every image (time-consuming)
- Extract data points by hand (error-prone)
- Miss important visual patterns (limiting)
- Process one image at a time (inefficient)
With AI vision analysis, you just upload any image and get intelligent insights instantly.
Vision Analysis Types Your AI Can Handle
Your vision analyzer will support all major analysis modes:
📄 Document Analysis - OCR, text extraction, data processing
- Best for: Reports, invoices, forms, contracts
- AI extracts: Text content, key data points, structured information
📊 Chart Analysis - Data visualization interpretation
- Best for: Graphs, charts, data visualizations
- AI extracts: Numerical data, trends, insights, patterns
🎯 General Analysis - Comprehensive visual understanding
- Best for: Screenshots, photos, general images
- AI extracts: Objects, context, descriptions, recommendations
We’ll start with a unified approach that can handle any type of visual content intelligently.
🔧 Step 2: Adding Vision Analysis to Your Backend
Let’s add vision analysis to your existing backend using the same patterns you learned in previous modules. We’ll add new routes to handle image uploads and AI analysis.
Building on your foundation: You already have a working Node.js server with OpenAI integration. We’re simply adding vision capabilities to what you’ve built.
Step 2A: Understanding Vision Analysis State
Before writing code, let’s understand what data our vision analysis system needs to manage:
```javascript
// 🧠 VISION ANALYSIS STATE CONCEPTS:
// 1. Image Upload - The uploaded image data and metadata
// 2. Analysis Type - Document, chart, or general analysis mode
// 3. Vision Settings - OCR, data extraction, detail level
// 4. AI Results - Processed insights and extracted information
// 5. Error States - Invalid images, processing failures, file size limits
```
Key vision analysis concepts:
- Image Processing: Different analysis approaches for documents vs photos
- GPT-4o Vision: Using OpenAI’s vision model for image understanding
- Analysis Modes: OCR-focused, data extraction, or general analysis
- Results Structure: Organized output that’s easy to display
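As a sketch of the request shape the frontend will eventually send, the analysis options can be modeled as a defaults object plus user overrides. The names here mirror the component built later in this lesson, but treat them as illustrative assumptions:

```javascript
// Hypothetical default analysis request the UI will manage
const defaultAnalysisRequest = {
  analysisType: "general", // "document" | "chart" | "general"
  includeOCR: true,        // ask the model to transcribe visible text
  extractData: true        // ask the model to pull out numbers/structured data
};

// Merging user overrides keeps unspecified options at their defaults
const buildRequest = (overrides = {}) => ({ ...defaultAnalysisRequest, ...overrides });

console.log(buildRequest({ analysisType: "chart" }).includeOCR); // true
console.log(buildRequest().analysisType);                        // "general"
```

Spreading defaults first means a partial override never drops an option on the floor.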
Step 2B: Installing Required Dependencies
First, add the image processing dependency to your backend. In your backend folder, run:
```bash
npm install sharp
```
What this package does:
- sharp: Optimizes images for better AI analysis and smaller file sizes
Step 2C: Adding the Vision Analysis Route
Add this new endpoint to your existing `index.js` file, right after your text-to-speech routes:
```javascript
import sharp from 'sharp';

// 👁️ VISION ANALYSIS ENDPOINT: Add this to your existing server
app.post("/api/vision/analyze", upload.single("image"), async (req, res) => {
  try {
    // 🛡️ VALIDATION: Check if image was uploaded
    const uploadedImage = req.file;
    const analysisType = req.body.analysisType || "general";
    // Multipart form fields arrive as strings, so coerce "true"/"false" to booleans
    const includeOCR = req.body.includeOCR !== "false";
    const extractData = req.body.extractData !== "false";

    if (!uploadedImage) {
      return res.status(400).json({
        error: "Image file is required",
        success: false
      });
    }

    console.log(`👁️ Analyzing: ${uploadedImage.originalname} (${uploadedImage.size} bytes)`);

    // 🖼️ IMAGE OPTIMIZATION: Prepare image for vision analysis
    const optimizedImage = await optimizeImageForVision(uploadedImage.buffer);
    const base64Image = optimizedImage.toString('base64');
    // The optimizer re-encodes to JPEG, so the data URL uses image/jpeg,
    // not the original upload's mimetype
    const imageUrl = `data:image/jpeg;base64,${base64Image}`;

    // 🔍 ANALYSIS PROMPT: Generate appropriate prompt based on type
    const analysisPrompt = generateVisionPrompt(analysisType, includeOCR, extractData);

    // 🤖 AI VISION ANALYSIS: Process with GPT-4o via the Responses API
    const response = await openai.responses.create({
      model: "gpt-4o",
      input: [
        { role: "system", content: analysisPrompt.systemPrompt },
        {
          role: "user",
          content: [
            { type: "input_text", text: analysisPrompt.userPrompt },
            { type: "input_image", image_url: imageUrl, detail: "high" }
          ]
        }
      ]
    });

    // 📤 SUCCESS RESPONSE: Send analysis results
    res.json({
      success: true,
      file_info: {
        name: uploadedImage.originalname,
        size: uploadedImage.size,
        type: uploadedImage.mimetype
      },
      analysis: {
        type: analysisType,
        include_ocr: includeOCR,
        extract_data: extractData,
        result: response.output_text,
        model: "gpt-4o"
      },
      timestamp: new Date().toISOString()
    });

  } catch (error) {
    // 🚨 ERROR HANDLING: Handle analysis failures
    console.error("Vision analysis error:", error);

    res.status(500).json({
      error: "Failed to analyze image",
      details: error.message,
      success: false
    });
  }
});
```
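One detail worth isolating from the endpoint above: the optimized image buffer is converted into a base64 data URL before being sent to the model. A minimal sketch of that step (the helper name is an illustration, not part of the endpoint):

```javascript
// Hypothetical helper mirroring the endpoint's data-URL construction
const toDataUrl = (buffer, mimeType = "image/jpeg") =>
  `data:${mimeType};base64,${buffer.toString("base64")}`;

// A two-byte JPEG start-of-image marker (0xFF 0xD8) encodes to "/9g="
console.log(toDataUrl(Buffer.from([0xff, 0xd8]))); // "data:image/jpeg;base64,/9g="
```

Embedding the image as a data URL keeps the request self-contained, at the cost of a roughly 33% size increase from base64 encoding, which is one more reason to resize and compress first.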
```javascript
// 🔧 HELPER FUNCTIONS: Vision analysis utilities

// Optimize image for better vision analysis
const optimizeImageForVision = async (imageBuffer) => {
  try {
    // Resize large images for better processing
    const optimized = await sharp(imageBuffer)
      .resize(2048, 2048, { fit: 'inside', withoutEnlargement: true })
      .jpeg({ quality: 85 })
      .toBuffer();

    return optimized;
  } catch (error) {
    console.error('Image optimization error:', error);
    return imageBuffer; // Return original if optimization fails
  }
};

// Generate analysis prompts based on type
const generateVisionPrompt = (analysisType, includeOCR, extractData) => {
  const baseSystem = "You are a professional visual analyst with expertise in document analysis, data extraction, and image understanding.";

  switch (analysisType) {
    case 'document':
      return {
        systemPrompt: `${baseSystem} You specialize in document analysis, OCR, and text extraction.`,
        userPrompt: `Analyze this document image with focus on:
1. TEXT EXTRACTION: ${includeOCR ? 'Extract all readable text content using OCR' : 'Summarize visible text content'}
2. DOCUMENT STRUCTURE: Identify document type, layout, and organization
3. KEY DATA: Extract important numbers, dates, names, and values
4. INSIGHTS: Provide analysis of the document's purpose and key information

Provide clear, structured analysis that's easy to understand.`
      };

    case 'chart':
      return {
        systemPrompt: `${baseSystem} You specialize in chart analysis, data visualization interpretation, and trend analysis.`,
        userPrompt: `Analyze this chart/graph with focus on:
1. CHART TYPE: Identify the type of visualization (bar, line, pie, etc.)
2. DATA EXTRACTION: ${extractData ? 'Extract specific numerical values and data points' : 'Summarize key trends and patterns'}
3. TRENDS: Identify patterns, trends, and significant changes
4. INSIGHTS: Provide business intelligence and actionable insights

Focus on accuracy and clear interpretation of the visual data.`
      };

    default: // general
      return {
        systemPrompt: `${baseSystem} You provide comprehensive visual analysis for any type of image.`,
        userPrompt: `Analyze this image comprehensively:
1. CONTENT DESCRIPTION: What do you see in this image?
2. KEY ELEMENTS: Important objects, text, or data visible
3. CONTEXT ANALYSIS: Purpose, setting, or business context
4. ACTIONABLE INSIGHTS: Useful observations or recommendations

${includeOCR ? 'Include any readable text content.' : ''}
${extractData ? 'Extract any numerical or structured data visible.' : ''}

Provide practical, useful analysis that helps users understand the image better.`
      };
  }
};
```
Function breakdown:
- Validation - Ensure we have an image to analyze
- Image optimization - Prepare image for better AI analysis
- Prompt generation - Create appropriate analysis prompts
- Vision analysis - Process with GPT-4o vision capabilities
- Response formatting - Return structured results with metadata
Step 2D: Updating File Upload Configuration
Update your existing multer configuration to handle images:
```javascript
// Update your existing multer setup to handle images
const upload = multer({
  storage: multer.memoryStorage(),
  limits: {
    fileSize: 25 * 1024 * 1024 // 25MB limit
  },
  fileFilter: (req, file, cb) => {
    // Accept all previous file types PLUS images
    const allowedTypes = [
      'application/pdf',
      'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
      'text/plain',
      'text/csv',
      'application/json',
      'text/javascript',
      'text/x-python',
      'audio/wav',
      'audio/mp3',
      'audio/mpeg',
      'audio/mp4',
      'audio/webm',
      'image/jpeg', // Add image support
      'image/png',  // Add image support
      'image/webp', // Add image support
      'image/gif'   // Add image support
    ];

    const extension = path.extname(file.originalname).toLowerCase();
    const allowedExtensions = ['.pdf', '.docx', '.xlsx', '.csv', '.txt', '.md', '.json', '.js', '.py', '.wav', '.mp3', '.jpeg', '.jpg', '.png', '.webp', '.gif'];

    if (allowedTypes.includes(file.mimetype) || allowedExtensions.includes(extension)) {
      cb(null, true);
    } else {
      cb(new Error('Unsupported file type'), false);
    }
  }
});
```
Your backend now supports:
- Text chat (existing functionality)
- Streaming chat (existing functionality)
- Image generation (existing functionality)
- Audio transcription (existing functionality)
- File analysis (existing functionality)
- Text-to-speech (existing functionality)
- Vision analysis (new functionality)
🔧 Step 3: Building the React Vision Component
Now let’s create a React component for vision analysis using the same patterns from your existing components.
Step 3A: Creating the Vision Analysis Component
Create a new file `src/VisionAnalysis.jsx`:
```jsx
import { useState, useRef } from "react";
import { Upload, Eye, FileText, BarChart3, Download, Camera } from "lucide-react";

function VisionAnalysis() {
  // 🧠 STATE: Vision analysis data management
  const [selectedImage, setSelectedImage] = useState(null);    // Uploaded image
  const [analysisType, setAnalysisType] = useState("general"); // Analysis mode
  const [isAnalyzing, setIsAnalyzing] = useState(false);       // Processing status
  const [analysisResult, setAnalysisResult] = useState(null);  // Analysis results
  const [error, setError] = useState(null);                    // Error messages
  const [previewUrl, setPreviewUrl] = useState(null);          // Image preview
  const [options, setOptions] = useState({                     // Analysis options
    includeOCR: true,
    extractData: true
  });
  const fileInputRef = useRef(null);

  // 🔧 FUNCTIONS: Vision analysis logic engine

  // Handle image selection
  const handleImageSelect = (event) => {
    const file = event.target.files[0];
    if (file) {
      // Validate file size (25MB limit)
      if (file.size > 25 * 1024 * 1024) {
        setError('Image too large. Maximum size is 25MB.');
        return;
      }

      // Validate file type
      const allowedTypes = ['image/jpeg', 'image/png', 'image/webp', 'image/gif'];
      if (!allowedTypes.includes(file.type)) {
        setError('Unsupported image type. Please upload JPEG, PNG, WebP, or GIF files.');
        return;
      }

      setSelectedImage(file);
      setAnalysisResult(null);
      setError(null);

      // Create preview URL
      const url = URL.createObjectURL(file);
      setPreviewUrl(url);
    }
  };

  // Clear selected image
  const clearImage = () => {
    setSelectedImage(null);
    setAnalysisResult(null);
    setError(null);
    if (previewUrl) {
      URL.revokeObjectURL(previewUrl);
      setPreviewUrl(null);
    }
    if (fileInputRef.current) {
      fileInputRef.current.value = '';
    }
  };

  // Main vision analysis function
  const analyzeImage = async () => {
    // 🛡️ GUARDS: Prevent invalid analysis
    if (!selectedImage || isAnalyzing) return;

    // 🔄 SETUP: Prepare for analysis
    setIsAnalyzing(true);
    setError(null);
    setAnalysisResult(null);

    try {
      // 📤 FORM DATA: Prepare multipart form data
      const formData = new FormData();
      formData.append('image', selectedImage);
      formData.append('analysisType', analysisType);
      formData.append('includeOCR', options.includeOCR);
      formData.append('extractData', options.extractData);

      // 📡 API CALL: Send to your backend
      const response = await fetch("http://localhost:8000/api/vision/analyze", {
        method: "POST",
        body: formData
      });

      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to analyze image');
      }

      // ✅ SUCCESS: Store analysis results
      setAnalysisResult(data);

    } catch (error) {
      // 🚨 ERROR HANDLING: Show user-friendly message
      console.error('Vision analysis failed:', error);
      setError(error.message || 'Something went wrong while analyzing the image');
    } finally {
      // 🧹 CLEANUP: Reset processing state
      setIsAnalyzing(false);
    }
  };

  // Download analysis results
  const downloadAnalysis = () => {
    if (!analysisResult) return;

    const element = document.createElement('a');
    const file = new Blob([JSON.stringify(analysisResult, null, 2)], { type: 'application/json' });
    element.href = URL.createObjectURL(file);
    element.download = `vision-analysis-${selectedImage.name}-${Date.now()}.json`;
    document.body.appendChild(element);
    element.click();
    document.body.removeChild(element);
  };

  // Analysis type options
  const analysisTypes = [
    { value: "general", label: "General Analysis", desc: "Comprehensive visual understanding", icon: Eye },
    { value: "document", label: "Document Analysis", desc: "OCR and text extraction focus", icon: FileText },
    { value: "chart", label: "Chart Analysis", desc: "Data visualization interpretation", icon: BarChart3 }
  ];

  // Format file size
  const formatFileSize = (bytes) => {
    if (bytes === 0) return '0 Bytes';
    const k = 1024;
    const sizes = ['Bytes', 'KB', 'MB'];
    const i = Math.floor(Math.log(bytes) / Math.log(k));
    return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
  };

  // 🎨 UI: Interface components
  return (
    <div className="min-h-screen bg-gradient-to-br from-indigo-50 to-purple-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-6xl flex flex-col overflow-hidden">

        {/* Header */}
        <div className="bg-gradient-to-r from-indigo-600 to-purple-600 text-white p-6">
          <div className="flex items-center space-x-3">
            <div className="w-10 h-10 bg-white bg-opacity-20 rounded-full flex items-center justify-center">
              <Eye className="w-5 h-5" />
            </div>
            <div>
              <h1 className="text-xl font-bold">👁️ AI Vision Analysis</h1>
              <p className="text-indigo-100 text-sm">Analyze any image with AI intelligence!</p>
            </div>
          </div>
        </div>

        {/* Analysis Type Selection */}
        <div className="p-6 border-b border-gray-200">
          <h3 className="font-semibold text-gray-900 mb-4 flex items-center">
            <Camera className="w-5 h-5 mr-2 text-indigo-600" />
            Analysis Type
          </h3>

          <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
            {analysisTypes.map((type) => {
              const IconComponent = type.icon;
              return (
                <button
                  key={type.value}
                  onClick={() => setAnalysisType(type.value)}
                  className={`p-4 rounded-lg border-2 text-left transition-all duration-200 ${
                    analysisType === type.value
                      ? 'border-indigo-500 bg-indigo-50 shadow-md'
                      : 'border-gray-200 hover:border-indigo-300 hover:bg-indigo-50'
                  }`}
                >
                  <div className="flex items-center mb-2">
                    <IconComponent className="w-5 h-5 mr-2 text-indigo-600" />
                    <h4 className="font-medium text-gray-900">{type.label}</h4>
                  </div>
                  <p className="text-sm text-gray-600">{type.desc}</p>
                </button>
              );
            })}
          </div>
        </div>

        {/* Analysis Options */}
        <div className="p-6 border-b border-gray-200">
          <h3 className="font-semibold text-gray-900 mb-4">Analysis Options</h3>

          <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
            <label className="flex items-center space-x-3 p-3 rounded-lg border border-gray-200 hover:bg-gray-50 cursor-pointer">
              <input
                type="checkbox"
                checked={options.includeOCR}
                onChange={(e) => setOptions(prev => ({ ...prev, includeOCR: e.target.checked }))}
                className="w-4 h-4 text-indigo-600 rounded focus:ring-indigo-500"
              />
              <div>
                <span className="font-medium text-gray-900">Include OCR</span>
                <p className="text-sm text-gray-600">Extract text content from images</p>
              </div>
            </label>

            <label className="flex items-center space-x-3 p-3 rounded-lg border border-gray-200 hover:bg-gray-50 cursor-pointer">
              <input
                type="checkbox"
                checked={options.extractData}
                onChange={(e) => setOptions(prev => ({ ...prev, extractData: e.target.checked }))}
                className="w-4 h-4 text-indigo-600 rounded focus:ring-indigo-500"
              />
              <div>
                <span className="font-medium text-gray-900">Extract Data</span>
                <p className="text-sm text-gray-600">Find numerical data and structured information</p>
              </div>
            </label>
          </div>
        </div>

        {/* Image Upload Section */}
        <div className="p-6 border-b border-gray-200">
          <h3 className="font-semibold text-gray-900 mb-4 flex items-center">
            <Upload className="w-5 h-5 mr-2 text-indigo-600" />
            Upload Image for Analysis
          </h3>

          {!selectedImage ? (
            <div
              onClick={() => fileInputRef.current?.click()}
              className="border-2 border-dashed border-gray-300 rounded-xl p-8 text-center cursor-pointer hover:border-indigo-400 hover:bg-indigo-50 transition-colors duration-200"
            >
              <Upload className="w-12 h-12 text-gray-400 mx-auto mb-4" />
              <h4 className="text-lg font-semibold text-gray-700 mb-2">Upload Image</h4>
              <p className="text-gray-600 mb-4">
                Support for JPEG, PNG, WebP, and GIF files up to 25MB
              </p>
              <button className="px-6 py-3 bg-gradient-to-r from-indigo-600 to-purple-600 text-white rounded-xl hover:from-indigo-700 hover:to-purple-700 transition-all duration-200 inline-flex items-center space-x-2 shadow-lg">
                <Upload className="w-4 h-4" />
                <span>Choose Image</span>
              </button>
            </div>
          ) : (
            <div className="bg-gray-50 rounded-lg p-4 border border-gray-200">
              <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
                {/* Image Preview */}
                <div>
                  <h4 className="font-medium text-gray-900 mb-2">Preview:</h4>
                  <img
                    src={previewUrl}
                    alt={selectedImage.name}
                    className="w-full h-48 object-cover rounded-lg border border-gray-200"
                  />
                </div>

                {/* Image Info */}
                <div>
                  <div className="flex items-center justify-between mb-4">
                    <div>
                      <h4 className="font-medium text-gray-900">{selectedImage.name}</h4>
                      <p className="text-sm text-gray-600">{formatFileSize(selectedImage.size)}</p>
                    </div>
                    <button
                      onClick={clearImage}
                      className="p-2 text-gray-400 hover:text-red-600 transition-colors duration-200"
                    >
                      ×
                    </button>
                  </div>

                  <button
                    onClick={analyzeImage}
                    disabled={isAnalyzing}
                    className="w-full bg-gradient-to-r from-indigo-600 to-purple-600 hover:from-indigo-700 hover:to-purple-700 disabled:from-gray-300 disabled:to-gray-300 text-white px-6 py-3 rounded-lg transition-all duration-200 flex items-center justify-center space-x-2 shadow-lg disabled:shadow-none"
                  >
                    {isAnalyzing ? (
                      <>
                        <div className="w-4 h-4 border-2 border-white border-t-transparent rounded-full animate-spin"></div>
                        <span>Analyzing...</span>
                      </>
                    ) : (
                      <>
                        <Eye className="w-4 h-4" />
                        <span>Analyze Image</span>
                      </>
                    )}
                  </button>
                </div>
              </div>
            </div>
          )}

          <input
            ref={fileInputRef}
            type="file"
            accept="image/jpeg,image/png,image/webp,image/gif"
            onChange={handleImageSelect}
            className="hidden"
          />
        </div>

        {/* Results Section */}
        <div className="flex-1 p-6">
          {/* Error Display */}
          {error && (
            <div className="bg-red-50 border border-red-200 rounded-lg p-4 mb-4">
              <p className="text-red-700">
                <strong>Error:</strong> {error}
              </p>
            </div>
          )}

          {/* Analysis Results */}
          {analysisResult ? (
            <div className="bg-gray-50 rounded-lg p-4">
              <div className="flex items-center justify-between mb-4">
                <h4 className="font-semibold text-gray-900">Vision Analysis Results</h4>
                <button
                  onClick={downloadAnalysis}
                  className="bg-gradient-to-r from-blue-500 to-blue-600 hover:from-blue-600 hover:to-blue-700 text-white px-4 py-2 rounded-lg transition-all duration-200 flex items-center space-x-2"
                >
                  <Download className="w-4 h-4" />
                  <span>Download</span>
                </button>
              </div>

              <div className="space-y-4">
                {/* File Information */}
                <div className="bg-white rounded-lg p-4">
                  <h5 className="font-medium text-gray-700 mb-2">Image Information:</h5>
                  <div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
                    <div>
                      <span className="text-gray-600">Name:</span>
                      <p className="font-medium">{analysisResult.file_info.name}</p>
                    </div>
                    <div>
                      <span className="text-gray-600">Size:</span>
                      <p className="font-medium">{formatFileSize(analysisResult.file_info.size)}</p>
                    </div>
                    <div>
                      <span className="text-gray-600">Type:</span>
                      <p className="font-medium">{analysisResult.file_info.type}</p>
                    </div>
                    <div>
                      <span className="text-gray-600">Analysis:</span>
                      <p className="font-medium capitalize">{analysisResult.analysis.type}</p>
                    </div>
                  </div>
                </div>

                {/* Analysis Content */}
                <div className="bg-white rounded-lg p-4">
                  <h5 className="font-medium text-gray-700 mb-2">AI Vision Analysis:</h5>
                  <div className="text-gray-900 leading-relaxed whitespace-pre-wrap max-h-96 overflow-y-auto">
                    {analysisResult.analysis.result}
                  </div>
                </div>
              </div>
            </div>
          ) : !isAnalyzing && !error && (
            // Welcome State
            <div className="text-center py-12">
              <div className="w-16 h-16 bg-indigo-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Eye className="w-8 h-8 text-indigo-600" />
              </div>
              <h3 className="text-lg font-semibold text-gray-700 mb-2">
                Ready to Analyze!
              </h3>
              <p className="text-gray-600 max-w-md mx-auto">
                Upload any image to get AI-powered visual analysis, text extraction, and intelligent insights.
              </p>
            </div>
          )}
        </div>
      </div>
    </div>
  );
}

export default VisionAnalysis;
```
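The `formatFileSize` helper inside the component is a pure function, so a standalone copy can be sanity-checked outside React. This is the same logic lifted out for quick verification:

```javascript
// Standalone copy of the component's formatFileSize helper
const formatFileSize = (bytes) => {
  if (bytes === 0) return '0 Bytes';
  const k = 1024;
  const sizes = ['Bytes', 'KB', 'MB'];
  // Pick the largest unit whose threshold the byte count crosses
  const i = Math.floor(Math.log(bytes) / Math.log(k));
  return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
};

console.log(formatFileSize(0));                // "0 Bytes"
console.log(formatFileSize(245678));           // "239.92 KB"
console.log(formatFileSize(25 * 1024 * 1024)); // "25 MB"
```

Note the units array stops at MB, which is fine here because the upload limit is 25MB; supporting larger files would just mean appending 'GB'.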
Step 3B: Adding Vision Analysis to Navigation
Update your `src/App.jsx` to include the new vision analysis component:
```jsx
import { useState } from "react";
import StreamingChat from "./StreamingChat";
import ImageGenerator from "./ImageGenerator";
import AudioTranscription from "./AudioTranscription";
import FileAnalysis from "./FileAnalysis";
import TextToSpeech from "./TextToSpeech";
import VisionAnalysis from "./VisionAnalysis";
import { MessageSquare, Image, Mic, Folder, Volume2, Eye } from "lucide-react";

function App() {
  // 🧠 STATE: Navigation management
  const [currentView, setCurrentView] = useState("chat"); // 'chat', 'images', 'audio', 'files', 'speech', or 'vision'

  // 🎨 UI: Main app with navigation
  return (
    <div className="min-h-screen bg-gray-100">
      {/* Navigation Header */}
      <nav className="bg-white shadow-sm border-b border-gray-200">
        <div className="max-w-6xl mx-auto px-4">
          <div className="flex items-center justify-between h-16">
            {/* Logo */}
            <div className="flex items-center space-x-3">
              <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-purple-600 rounded-lg flex items-center justify-center">
                <span className="text-white font-bold text-sm">AI</span>
              </div>
              <h1 className="text-xl font-bold text-gray-900">OpenAI Mastery</h1>
            </div>

            {/* Navigation Buttons */}
            <div className="flex space-x-2">
              <button
                onClick={() => setCurrentView("chat")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "chat"
                    ? "bg-blue-100 text-blue-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <MessageSquare className="w-4 h-4" />
                <span>Chat</span>
              </button>

              <button
                onClick={() => setCurrentView("images")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "images"
                    ? "bg-purple-100 text-purple-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Image className="w-4 h-4" />
                <span>Images</span>
              </button>

              <button
                onClick={() => setCurrentView("audio")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "audio"
                    ? "bg-blue-100 text-blue-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Mic className="w-4 h-4" />
                <span>Audio</span>
              </button>

              <button
                onClick={() => setCurrentView("files")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "files"
                    ? "bg-green-100 text-green-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Folder className="w-4 h-4" />
                <span>Files</span>
              </button>

              <button
                onClick={() => setCurrentView("speech")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "speech"
                    ? "bg-orange-100 text-orange-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Volume2 className="w-4 h-4" />
                <span>Speech</span>
              </button>

              <button
                onClick={() => setCurrentView("vision")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "vision"
                    ? "bg-indigo-100 text-indigo-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Eye className="w-4 h-4" />
                <span>Vision</span>
              </button>
            </div>
          </div>
        </div>
      </nav>

      {/* Main Content */}
      <main className="h-[calc(100vh-4rem)]">
        {currentView === "chat" && <StreamingChat />}
        {currentView === "images" && <ImageGenerator />}
        {currentView === "audio" && <AudioTranscription />}
        {currentView === "files" && <FileAnalysis />}
        {currentView === "speech" && <TextToSpeech />}
        {currentView === "vision" && <VisionAnalysis />}
      </main>
    </div>
  );
}

export default App;
```
🧪 Testing Your Vision Analysis
Let’s test your vision analysis feature step by step to make sure everything works correctly.
Step 1: Backend Route Test
First, verify your backend route works by testing it directly:
Test with a simple image:
```bash
# Test the endpoint with an image file
curl -X POST http://localhost:8000/api/vision/analyze \
  -F "image=@test-image.jpg" \
  -F "analysisType=general" \
  -F "includeOCR=true" \
  -F "extractData=true"
```
Expected response:
```json
{
  "success": true,
  "file_info": {
    "name": "test-image.jpg",
    "size": 245678,
    "type": "image/jpeg"
  },
  "analysis": {
    "type": "general",
    "include_ocr": true,
    "extract_data": true,
    "result": "This image shows...",
    "model": "gpt-4o"
  },
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```
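On the client, the fields you will typically read from this payload are `success`, `analysis.result`, and `file_info`. A quick sketch over a sample object shaped like the expected response (the `summary` formatting is illustrative):

```javascript
// Sample payload shaped like the expected response above
const data = {
  success: true,
  file_info: { name: "test-image.jpg", size: 245678, type: "image/jpeg" },
  analysis: { type: "general", result: "This image shows...", model: "gpt-4o" },
  timestamp: "2024-01-15T10:30:00.000Z"
};

// Guard on the success flag before trusting the rest of the payload
const summary = data.success
  ? `${data.file_info.name} (${data.analysis.type}): ${data.analysis.result}`
  : "Analysis failed";

console.log(summary); // "test-image.jpg (general): This image shows..."
```

Checking `success` first matters because error responses carry `error` and `details` instead of `analysis`, and reading `data.analysis.result` on them would throw.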
Step 2: Full Application Test
Start both servers:
Backend (in your backend folder):
```bash
npm run dev
```
Frontend (in your frontend folder):
```bash
npm run dev
```
Test the complete flow:
- Navigate to Vision → Click the “Vision” tab in navigation
- Select analysis type → Choose “General”, “Document”, or “Chart” analysis
- Configure options → Enable OCR or data extraction as needed
- Upload an image → Try a screenshot, document, or chart
- Analyze → Click “Analyze Image” and see loading state
- View results → See AI analysis with image information
- Download → Test downloading analysis as JSON file
- Switch images → Try different image types and analysis modes
Step 3: Error Handling Test
Test error scenarios:
- ❌ Large image: Upload image larger than 25MB
- ❌ Wrong type: Upload unsupported file (like .txt or .mp4)
- ❌ Empty upload: Try to analyze without selecting an image
- ❌ Corrupt image: Upload damaged image file
Expected behavior:
- Clear error messages displayed
- No application crashes
- User can try again with different image
- Image upload resets properly after errors
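The size and type guards exercised by these tests live in the component's `handleImageSelect`; the same checks can be expressed as a pure pre-upload validator. This is an illustrative extraction, not code from the component:

```javascript
// Hypothetical pre-upload validation mirroring the component's guards
const MAX_BYTES = 25 * 1024 * 1024;
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'image/webp', 'image/gif'];

// Returns an error message string, or null when the file looks acceptable
const validateImage = ({ size, type }) => {
  if (size > MAX_BYTES) return 'Image too large. Maximum size is 25MB.';
  if (!ALLOWED_TYPES.includes(type)) return 'Unsupported image type. Please upload JPEG, PNG, WebP, or GIF files.';
  return null;
};

console.log(validateImage({ size: 1024, type: 'image/png' }));             // null
console.log(validateImage({ size: 30 * 1024 * 1024, type: 'image/png' })); // size error message
console.log(validateImage({ size: 1024, type: 'video/mp4' }));             // type error message
```

A pure validator like this is easy to unit-test, while the corrupt-image case can only be caught server-side when sharp fails to decode the bytes.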
✅ What You Built
Congratulations! You’ve extended your existing application with complete AI vision analysis:
- ✅ Extended your backend with vision processing and GPT-4o integration
- ✅ Added React vision component following the same patterns as your other features
- ✅ Implemented intelligent image analysis for documents, charts, and general content
- ✅ Created flexible analysis modes with OCR and data extraction options
- ✅ Added download functionality for analysis results
- ✅ Maintained consistent design with your existing application
Your application now has:
- Text chat with streaming responses
- Image generation with DALL-E 3 and GPT-Image-1
- Audio transcription with Whisper voice recognition
- File analysis with intelligent document processing
- Text-to-speech with natural voice synthesis
- Vision analysis with GPT-4o visual intelligence
- Unified navigation between all features
- Professional UI with consistent TailwindCSS styling
Complete OpenAI mastery achieved! You now have a comprehensive application that leverages all major OpenAI capabilities in a unified, professional interface. 👁️