Veo 3.1

Google's highest-quality video generation model, with native audio and synchronized dialogue.

Capabilities

Feature	Support
Text-to-Video	Yes
Image-to-Video	Yes (first frame)
Frame Interpolation	Yes (first + last frame)
Video Extension	Yes (up to 20 extensions, 148s total)
Reference Images	Up to 3
Max Resolution	4K
Aspect Ratios	16:9, 9:16
Duration	4s, 6s, 8s
Frame Rate	24 fps
Negative Prompt	Yes
Native Audio	Yes (dialogue, SFX, ambient)

Native Audio Generation

Veo 3.1 generates audio natively alongside video, so no separate audio model is needed.

Dialogue

Write quoted speech in your prompt and the model generates synchronized lip movements and voice:

A news anchor sits at a desk and says "Breaking news: scientists have discovered 
a new species of deep-sea fish." The studio has blue lighting.

Sound Effects

Describe sounds explicitly and they'll be generated:

A car races through a rainy city street, tires screeching on wet asphalt, 
windshield wipers clicking rhythmically.

Ambient Audio

Describe the environment and the model creates matching soundscapes:

A peaceful forest clearing at dawn, birds singing, a gentle stream flowing 
over rocks, leaves rustling in a light breeze.
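
The three kinds of audio cue above can share one prompt. The sketch below is a minimal illustration in Python, assuming a hypothetical helper named `build_prompt` that only assembles text; it is not part of any Veo API, and the "Audio:" label is a formatting choice made here rather than a documented requirement.

```python
def build_prompt(scene, dialogue=None, sounds=()):
    """Assemble a single-scene prompt with explicit audio cues.

    Hypothetical helper for illustration: it only builds a string.
    dialogue is an optional (speaker, line) pair; sounds is a list of
    sound-effect or ambient descriptions.
    """
    # End the scene description with exactly one full stop.
    parts = [scene.strip().rstrip(".") + "."]
    if dialogue is not None:
        speaker, line = dialogue
        # Speech goes in double quotes so it reads as spoken dialogue.
        parts.append(f'{speaker} says "{line}"')
    if sounds:
        # Spell sounds out explicitly instead of leaving them implied.
        parts.append("Audio: " + ", ".join(sounds) + ".")
    return " ".join(parts)


prompt = build_prompt(
    "A news anchor sits at a desk in a studio with blue lighting",
    dialogue=("The anchor", "Breaking news."),
    sounds=["papers shuffling", "quiet studio hum"],
)
print(prompt)
```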

Audio in Video Extension

When extending a video, voice and other audio carry over only if they are present in the final second of the original clip. If continuity matters, write the original prompt so the sound is still playing as the clip ends.

Resolution and Duration Constraints

Resolution	Available Durations	Notes
720p	4s, 6s, 8s	Default, all durations available
1080p	8s only	Requires 8-second duration
4K	8s only	Requires 8-second duration

The UI automatically disables incompatible resolution options based on your selected duration.
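
The same compatibility rules can be checked in code before a job is submitted. This is a minimal sketch in Python built only from the table above and the supported aspect ratios; the function name `validate_settings` and its return shape are assumptions for illustration, not part of any Veo API.

```python
# Durations (in seconds) allowed at each resolution, per the table above.
ALLOWED_DURATIONS = {
    "720p": {4, 6, 8},
    "1080p": {8},
    "4K": {8},
}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


def validate_settings(resolution, duration_s, aspect_ratio="16:9"):
    """Return a list of problems; an empty list means the settings are valid."""
    problems = []
    if resolution not in ALLOWED_DURATIONS:
        problems.append(f"unsupported resolution: {resolution}")
    elif duration_s not in ALLOWED_DURATIONS[resolution]:
        allowed = ", ".join(f"{d}s" for d in sorted(ALLOWED_DURATIONS[resolution]))
        problems.append(f"{resolution} supports only {allowed}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems


print(validate_settings("720p", 6))    # valid combination
print(validate_settings("1080p", 6))   # 1080p requires 8 seconds
```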

Video Extension

Extend a previously generated video by up to 7 seconds per extension, up to 20 times (148 seconds total):

Only works with Veo-generated videos
Extended videos maintain 720p resolution
Aspect ratio must be 16:9 or 9:16
Extended video storage window: 2 days (resets on each extension)
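
The 148-second figure is consistent with an 8-second initial clip plus 20 extensions of 7 seconds each. The Python sketch below works through that arithmetic; the decomposition is inferred from the numbers on this page, and the function name `max_total_duration` is hypothetical.

```python
BASE_CLIP_S = 8       # longest initial clip duration listed on this page
EXTENSION_S = 7       # maximum seconds added per extension
MAX_EXTENSIONS = 20   # maximum number of extensions


def max_total_duration(extensions, base_s=BASE_CLIP_S):
    """Longest possible video length after a given number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return base_s + extensions * EXTENSION_S


print(max_total_duration(20))  # 8 + 20 * 7 = 148
```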

Reference Images

Include up to 3 reference images to guide video content:

Objects and characters from reference images are incorporated into the video
References force 8-second duration
Connect upstream image nodes to the video node's reference input
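
Because references force an 8-second duration, a workflow that sets duration programmatically may want to resolve the effective value first. This is a minimal sketch in Python; `effective_duration` is a hypothetical helper, and raising an error for more than three references is an assumption about how a caller might choose to handle the limit.

```python
MAX_REFERENCE_IMAGES = 3
REFERENCE_DURATION_S = 8  # duration forced whenever references are attached


def effective_duration(requested_s, reference_images):
    """Return the duration that applies once reference images are considered."""
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    # Any attached reference overrides the requested duration.
    return REFERENCE_DURATION_S if reference_images else requested_s


print(effective_duration(4, []))               # no references: requested value kept
print(effective_duration(4, ["robot.png"]))    # references attached: forced to 8
```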

@-mentions in video prompts

You can use @[Image 1] mentions in video prompts for readability, but Veo processes reference images as structured inputs rather than inline content. The mention is converted to the display name in the prompt text (e.g., @[Image 1] becomes Image 1). For precise per-image instructions, describe the role of each reference in your prompt text.
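
The mention conversion described above can be approximated with a regular expression. This Python sketch only mirrors the documented behavior (@[Image 1] becomes Image 1); the product's actual implementation is not shown on this page, and `flatten_mentions` is a hypothetical name.

```python
import re

# Matches @[...] and captures the display name between the brackets.
MENTION = re.compile(r"@\[([^\]]+)\]")


def flatten_mentions(prompt):
    """Replace each @[Name] mention with its plain display name."""
    return MENTION.sub(r"\1", prompt)


print(flatten_mentions("The robot from @[Image 1] walks past @[Image 2]."))
```

Since the mention ends up as plain text, the prompt still needs to state what each reference contributes, for example "the robot from Image 1" rather than just "Image 1".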

Prompting Tips

Write like a screenplay. Describe action, camera movement, lighting, and mood in natural language.
Use quoted dialogue for speech. For example, the prompt text "Hello!" she waved generates a matching voice and lip sync.
Describe sound explicitly. Don't assume the model will add sounds — spell them out.
Keep prompts focused. One clear scene per generation works better than complex multi-scene descriptions.
Use negative prompts to exclude unwanted elements: "no text overlay, no watermark."

Content Policy

Videos are watermarked with SynthID (invisible, verifiable AI content marker)
EU/UK/CH regions: Person generation restricted to adults only (minors prohibited)
Safety filters may block prompts that violate content guidelines — you won't be charged if blocked

Limitations

Only 16:9 and 9:16 aspect ratios (no 1:1 or other ratios)
1080p/4K locked to 8-second duration
Video extension only for Veo-generated content (can't extend uploaded videos)
Audio quality depends on prompt specificity
Generation time ~2 minutes (use Veo 3.1 Fast for quicker iteration)
