Veo 3.1
Google's highest-quality video generation with native audio and synchronized dialogue.
Capabilities
| Feature | Support |
|---|---|
| Text-to-Video | Yes |
| Image-to-Video | Yes (first frame) |
| Frame Interpolation | Yes (first + last frame) |
| Video Extension | Yes (up to 20 extensions, 148s total) |
| Reference Images | Up to 3 |
| Max Resolution | 4K |
| Aspect Ratios | 16:9, 9:16 |
| Duration | 4s, 6s, 8s |
| Frame Rate | 24 fps |
| Negative Prompt | Yes |
| Native Audio | Yes (dialogue, SFX, ambient) |
Native Audio Generation
Veo 3.1 generates audio natively alongside video, so no separate audio model is needed.
Dialogue
Write quoted speech in your prompt and the model generates synchronized lip movements and voice:
A news anchor sits at a desk and says "Breaking news: scientists have discovered
a new species of deep-sea fish." The studio has blue lighting.
Sound Effects
Describe sounds explicitly and they'll be generated:
A car races through a rainy city street, tires screeching on wet asphalt,
windshield wipers clicking rhythmically.
Ambient Audio
Describe the environment and the model creates matching soundscapes:
A peaceful forest clearing at dawn, birds singing, a gentle stream flowing
over rocks, leaves rustling in a light breeze.
When extending a video, voice/audio carries over only if it's present in the last second of the original clip. If an extension should continue a voice or sound, write the original prompt so that it is still audible when the clip ends.
Resolution and Duration Constraints
| Resolution | Available Durations | Notes |
|---|---|---|
| 720p | 4s, 6s, 8s | Default, all durations available |
| 1080p | 8s only | Requires 8-second duration |
| 4K | 8s only | Requires 8-second duration |
The UI automatically disables incompatible resolution options based on your selected duration.
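The table above can be expressed as a small lookup, which is handy if you want to validate settings before submitting a job. This is a minimal sketch: the names `ALLOWED_DURATIONS`, `is_valid_combo`, and `resolutions_for` are hypothetical and not part of any Veo API; only the resolution and duration values come from the table.

```python
# Allowed durations (in seconds) per resolution, copied from the table above.
ALLOWED_DURATIONS = {
    "720p": {4, 6, 8},
    "1080p": {8},
    "4K": {8},
}


def is_valid_combo(resolution: str, duration_s: int) -> bool:
    """Return True if the resolution/duration pair appears in the table."""
    return duration_s in ALLOWED_DURATIONS.get(resolution, set())


def resolutions_for(duration_s: int) -> list[str]:
    """List the resolutions that remain selectable for a given duration."""
    return [res for res, durs in ALLOWED_DURATIONS.items() if duration_s in durs]
```

For example, `resolutions_for(6)` leaves only 720p, which mirrors the UI disabling 1080p and 4K when a 6-second duration is selected.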
Video Extension
Extend a previously generated video by up to 7 seconds per extension, up to 20 times (148 seconds total):
- Only works with Veo-generated videos
- Extended videos maintain 720p resolution
- Aspect ratio must be 16:9 or 9:16
- Extended video storage window: 2 days (resets on each extension)
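The 148-second ceiling is consistent with an 8-second starting clip plus 20 extensions of 7 seconds each. The sketch below works through that arithmetic. It assumes every extension adds the full 7 seconds (the limit is stated as "up to 7 seconds") and that the base clip is 8 seconds unless you pass a shorter one; the function and constant names are hypothetical.

```python
BASE_CLIP_S = 8            # longest initial clip duration listed for Veo 3.1
SECONDS_PER_EXTENSION = 7  # assumed: each extension adds the full 7 seconds
MAX_EXTENSIONS = 20


def total_duration(extensions: int, base_s: int = BASE_CLIP_S) -> int:
    """Total length in seconds after a given number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return base_s + extensions * SECONDS_PER_EXTENSION


print(total_duration(20))  # 8 + 20 * 7 = 148
```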
Reference Images
Include up to 3 reference images to guide video content:
- Objects and characters from reference images are incorporated into the video
- Using reference images forces an 8-second duration
- Connect upstream image nodes to the video node's reference input
You can use @[Image 1] mentions in video prompts for readability, but Veo processes reference images as structured inputs rather than inline content. The mention is converted to the display name in the prompt text (e.g., @[Image 1] becomes Image 1). For precise per-image instructions, describe the role of each reference in your prompt text.
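To preview what the prompt text looks like after mentions are converted, a regular expression is enough. This is only an illustration of the documented result (`@[Image 1]` becomes `Image 1`); the actual conversion logic is not documented here, and `strip_mentions` is a hypothetical helper.

```python
import re

# Matches @[Display Name] and captures the display name inside the brackets.
MENTION = re.compile(r"@\[([^\]]+)\]")


def strip_mentions(prompt: str) -> str:
    """Replace each @[Name] mention with its bare display name."""
    return MENTION.sub(r"\1", prompt)


print(strip_mentions("The robot from @[Image 1] walks through the city in @[Image 2]."))
# The robot from Image 1 walks through the city in Image 2.
```

Because the model sees only the display name, a prompt such as "Image 1 is the main character; Image 2 is the background setting" gives each reference an explicit role.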
Prompting Tips
- Write like a screenplay. Describe action, camera movement, lighting, and mood in natural language.
- Use quoted dialogue for speech. For example, a prompt containing "Hello!" she waved generates matching voice and lip sync.
- Describe sound explicitly. Don't assume the model will add sounds; spell them out.
- Keep prompts focused. One clear scene per generation works better than complex multi-scene descriptions.
- Use negative prompts to exclude unwanted elements: "no text overlay, no watermark."
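If you assemble prompts programmatically, the tips above can be applied with a small helper that keeps the scene, one quoted line of dialogue, and the explicit sounds together in a single focused prompt. This is a sketch only: `build_prompt` is a hypothetical function, and the "Sounds:" label is a formatting choice of this example, not something Veo requires.

```python
from typing import Optional


def build_prompt(scene: str,
                 speaker: Optional[str] = None,
                 line: Optional[str] = None,
                 sounds: Optional[list[str]] = None) -> str:
    """Join a scene description, one quoted line of dialogue, and explicit sounds."""
    parts = [scene.strip()]
    if speaker and line:
        # Quoted speech is what prompts the model to generate voice and lip sync.
        parts.append(f'{speaker} says "{line}"')
    if sounds:
        # Spell sounds out rather than hoping the model infers them.
        parts.append("Sounds: " + ", ".join(sounds))
    # Close each part with a period unless it already ends in punctuation or a quote.
    return " ".join(p if p.endswith((".", "!", "?", '"')) else p + "." for p in parts)


print(build_prompt(
    "A woman stands on a pier at dusk, slow push-in, warm low light",
    speaker="She",
    line="Hello!",
    sounds=["waves lapping", "gulls calling"],
))
```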
Content Policy
- Videos are watermarked with SynthID (invisible, verifiable AI content marker)
- EU/UK/CH regions: Person generation restricted to adults only (minors prohibited)
- Safety filters may block prompts that violate content guidelines — you won't be charged if blocked
Limitations
- Only 16:9 and 9:16 aspect ratios (no 1:1 or other ratios)
- 1080p/4K locked to 8-second duration
- Video extension only for Veo-generated content (can't extend uploaded videos)
- Audio quality depends on prompt specificity
- Generation takes about 2 minutes (use Veo 3.1 Fast for quicker iteration)
See Also
- Veo 3.1 Fast — Faster variant for iteration