Backends

eyeroll supports multiple vision backends. Each has different capabilities, costs, and tradeoffs.

Comparison

	Gemini	TwelveLabs	OpenAI	Ollama	OpenRouter / Groq / Grok / Cerebras
Model	Gemini 2.0 Flash	Pegasus 1.5	GPT-4o	qwen3-vl (default)	Varies by provider
Strategy	Direct upload	Direct video report	Multi-frame batch	Frame-by-frame	Multi-frame batch
Audio	Yes (native)	Included in video analysis	Yes (Whisper)	No	No
API key	GEMINI_API_KEY	TWELVE_LABS_API_KEY	OPENAI_API_KEY	None	Provider-specific
Cost per video	~$0.15	Usage-based	~$0.20	Free	Varies
Privacy	Cloud	Cloud	Cloud	Fully local	Cloud
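
The API key row above can be turned into a quick pre-run check. What follows is a minimal shell sketch, not part of eyeroll itself: the function names `required_key_for_backend` and `backend_key_is_set` are hypothetical, and the lowercase backend name `gemini` is assumed to follow the same pattern as the names accepted by `--backend`. The provider-specific keys (OpenRouter, Groq, Grok, Cerebras) are left out because their variable names are not listed here.

```shell
# Hypothetical helper: print the environment variable a backend needs,
# using the variable names from the comparison table.
required_key_for_backend() {
  case "$1" in
    gemini)     echo "GEMINI_API_KEY" ;;       # backend name "gemini" is an assumption
    twelvelabs) echo "TWELVE_LABS_API_KEY" ;;
    openai)     echo "OPENAI_API_KEY" ;;
    ollama)     echo "" ;;                     # runs locally, no key needed
    *)          return 1 ;;                    # provider-specific or unknown backend
  esac
}

# Hypothetical helper: succeed if the backend's key is set (or none is needed).
# Returns 2 when the backend is not in the table above.
backend_key_is_set() {
  local key value
  key="$(required_key_for_backend "$1")" || return 2
  [ -z "$key" ] && return 0
  # Look up the variable whose name is stored in $key (names come from the fixed list above).
  eval "value=\${$key:-}"
  [ -n "$value" ]
}
```

A wrapper script could call `backend_key_is_set openai || echo "OPENAI_API_KEY is not set"` before invoking `eyeroll watch`.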

When to use which

Gemini -- Best overall. Direct video upload via File API (up to 2 GB with API key, 20 MB with service account). Native audio transcription. Lowest listed per-video cost of the cloud backends (~$0.15).

TwelveLabs -- Best when you want a video-native model to produce the final structured report directly from the recording. Uses the TwelveLabs asset upload and Pegasus analysis flow.

OpenAI -- Good if you already have an OpenAI key. Sends all frames in one API call (multi-frame batch) for efficiency and transcribes audio with Whisper. Slightly more expensive than Gemini (~$0.20 vs. ~$0.15 per video).

OpenRouter / Groq / Grok / Cerebras -- OpenAI-compatible providers. Same multi-frame batch strategy as OpenAI, but without audio transcription. Useful for model variety (OpenRouter), low latency (Groq), or specific model families.

Ollama -- Best for privacy and offline use. Runs entirely on your machine. No audio transcription. Frame-by-frame (one frame per API call). Quality depends on the model and hardware.

openai-compat -- Any OpenAI-compatible endpoint. Use --base-url to point at your own server.

Analysis strategy

eyeroll runs a preflight check to detect backend capabilities, then chooses the best strategy:

Strategy	When used	How it works
Direct upload	Gemini (within size limits)	Full video uploaded via File API in one request
Direct video report	TwelveLabs	Video uploaded as a TwelveLabs asset, then Pegasus generates the structured report directly
Multi-frame batch	OpenAI, OpenRouter, Groq, Grok, Cerebras	All frames sent as images in a single API call
Frame-by-frame	Ollama, or fallback for very large videos	Each frame analyzed in a separate API call

The strategy is chosen automatically based on what the backend reports it can do. You don't need to configure it.
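
As a rough illustration of the selection logic described above, the sketch below picks a strategy from four capability flags. It is not eyeroll's actual implementation: the function name `choose_strategy`, the flag names, and the order of the checks are assumptions made for illustration. Only the four strategy names and the use of frame-by-frame as the fallback come from the table above.

```shell
# Hypothetical sketch: choose an analysis strategy from reported capabilities.
# Arguments (assumed names, each "yes" or "no"):
#   $1 accepts_video   backend takes a video file directly
#   $2 writes_report   backend produces the final structured report itself
#   $3 accepts_batch   backend takes many images in one API call
#   $4 within_limits   video fits the backend's upload size limit
choose_strategy() {
  if [ "$1" = yes ] && [ "$4" = yes ]; then
    if [ "$2" = yes ]; then
      echo "direct video report"   # e.g. TwelveLabs
    else
      echo "direct upload"         # e.g. Gemini
    fi
  elif [ "$3" = yes ]; then
    echo "multi-frame batch"       # e.g. OpenAI and OpenAI-compatible providers
  else
    echo "frame-by-frame"          # e.g. Ollama, or the fallback for very large videos
  fi
}
```

Under these assumptions, `choose_strategy yes no no no` (a video-capable backend given a file over its size limit) falls through to frame-by-frame.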

Switching backends

# Via environment variable
export EYEROLL_BACKEND=ollama

# Via CLI flag
eyeroll watch video.mp4 --backend openai
eyeroll watch video.mp4 --backend twelvelabs
eyeroll watch video.mp4 --backend groq
eyeroll watch video.mp4 --backend openai-compat --base-url https://my-server/v1

# Via eyeroll init
eyeroll init

The --backend flag overrides the environment variable for a single run.
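
The precedence rule can be pictured with a small shell function. This is an illustrative sketch, not eyeroll's code: `resolve_backend` is a hypothetical name, and what eyeroll does when neither the flag nor the variable is set is not covered here, so the sketch prints an empty line in that case.

```shell
# Hypothetical sketch of the documented precedence:
# a --backend value, if given, wins over EYEROLL_BACKEND.
resolve_backend() {
  local flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    echo "$flag_value"
  else
    echo "${EYEROLL_BACKEND:-}"   # empty when unset; no default is assumed
  fi
}

export EYEROLL_BACKEND=ollama
resolve_backend openai   # prints "openai": the flag wins for this run
resolve_backend          # prints "ollama": falls back to the environment
```

In practice this means EYEROLL_BACKEND can stay exported as a standing default while individual runs pass --backend to try something else.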