# OpenAI Backend
OpenAI GPT-4o for vision analysis and Whisper for audio transcription.
## Setup
- Get an API key at platform.openai.com/api-keys
- Run `eyeroll init` and select OpenAI, or set it directly:
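For example (assuming eyeroll reads the standard `OPENAI_API_KEY` environment variable; the exact variable name is an assumption, not confirmed by this page):

```shell
# Interactive setup: pick OpenAI and paste your key
eyeroll init

# Or export the key directly (assumes eyeroll reads OPENAI_API_KEY)
export OPENAI_API_KEY="sk-..."
```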
## Capabilities
| Feature | Supported |
|---|---|
| Direct video upload | No |
| Multi-frame batch | Yes (all frames in one API call) |
| Audio transcription | Yes (Whisper) |
| Text generation | Yes |
| Preflight health check | Yes (verifies API key) |
## Model

Default model: `gpt-4o`

Override with the `--model` flag:
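For instance (the model name below is illustrative, not a recommendation):

```shell
eyeroll watch video.mp4 --model gpt-4o-mini
```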
### Model inference

If you pass a model name starting with `gpt`, `o1`, or `o3` without specifying `--backend`, eyeroll automatically selects the OpenAI backend.
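The selection rule can be sketched as follows (an illustrative reimplementation of the behavior described above, not eyeroll's actual source):

```python
from typing import Optional


def infer_backend(model: str, backend: Optional[str] = None) -> Optional[str]:
    """Pick a backend from the model name when none is given explicitly.

    Mirrors the rule above: model names starting with gpt, o1, or o3
    map to the OpenAI backend. Returns None when no rule matches.
    """
    if backend is not None:
        return backend  # an explicit --backend always wins
    if model.startswith(("gpt", "o1", "o3")):
        return "openai"
    return None
```

So `infer_backend("gpt-4o")` resolves to the OpenAI backend, while an unrecognized name leaves the choice to the user.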
## How it works
OpenAI does not support direct video upload. eyeroll uses the multi-frame batch strategy:

- Extract key frames from the video
- Send all frames as base64-encoded images in a single API call to GPT-4o, with timestamps per frame
- Transcribe audio using Whisper (`whisper-1` model)
- Synthesize the report from the batch analysis and transcript
This is more efficient than frame-by-frame (one API call instead of N) and gives the model temporal context across all frames.
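The batched request can be sketched like this (a minimal illustration; the helper name and prompt are assumptions, but the payload shape follows the OpenAI chat-completions vision format):

```python
import base64


def build_batch_messages(frames, prompt="Describe what happens in this recording."):
    """Pack all frames into a single chat request.

    frames: list of (timestamp_seconds, jpeg_bytes) tuples.
    Returns a `messages` list in the OpenAI chat-completions vision
    format: one text prompt, then a timestamp label and a base64 data
    URL per frame.
    """
    content = [{"type": "text", "text": prompt}]
    for ts, jpeg in frames:
        content.append({"type": "text", "text": f"Frame at {ts:.1f}s:"})
        content.append({
            "type": "image_url",
            "image_url": {
                "url": "data:image/jpeg;base64,"
                       + base64.b64encode(jpeg).decode("ascii")
            },
        })
    # One user message carrying every frame -> one API call, not N
    return [{"role": "user", "content": content}]
```

Because the whole sequence arrives in one message, the model can reason about what changed between timestamps instead of seeing each frame in isolation.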
## OpenAI-compatible providers
The OpenAI backend also powers OpenRouter, Groq, Grok, Cerebras, and custom endpoints. These use the same multi-frame batch strategy but without Whisper audio transcription.
```shell
eyeroll watch video.mp4 --backend openrouter
eyeroll watch video.mp4 --backend groq
eyeroll watch video.mp4 --backend openai-compat --base-url https://my-server/v1
```
## Audio transcription
Audio is transcribed via the Whisper API (`whisper-1` model). This runs automatically when the video has an audio track. Silent recordings are detected and skipped.
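The skip decision can be sketched as follows (illustrative only; the threshold value and function name are assumptions, not eyeroll's actual implementation):

```python
def should_transcribe(has_audio_track: bool, peak_amplitude: float,
                      silence_threshold: float = 0.01) -> bool:
    """Decide whether to send audio to Whisper.

    Skips videos with no audio track, and "silent" tracks whose peak
    amplitude (normalized 0..1) never exceeds the threshold.
    """
    if not has_audio_track:
        return False
    return peak_amplitude > silence_threshold

# When this returns True, the audio is sent to the Whisper API,
# e.g. via the openai SDK:
#   client.audio.transcriptions.create(model="whisper-1", file=audio_file)
```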
## Cost
A typical 1-minute video with 10-15 frames costs approximately $0.20, depending on image resolution and audio length.
## Install
This installs the `openai` SDK.
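The install command itself is not shown above; for a Python CLI with optional extras it would typically look like this (the `openai` extra name is an assumption, so check the project's own install docs):

```shell
pip install "eyeroll[openai]"
```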