Veo 3.1

Google's highest-quality video generation with native audio and synchronized dialogue.

Capabilities

| Feature | Support |
| --- | --- |
| Text-to-Video | Yes |
| Image-to-Video | Yes (first frame) |
| Frame Interpolation | Yes (first + last frame) |
| Video Extension | Yes (up to 20 extensions, 148s total) |
| Reference Images | Up to 3 |
| Max Resolution | 4K |
| Aspect Ratios | 16:9, 9:16 |
| Duration | 4s, 6s, 8s |
| Frame Rate | 24 fps |
| Negative Prompt | Yes |
| Native Audio | Yes (dialogue, SFX, ambient) |

Native Audio Generation

Veo 3.1 generates audio natively alongside video. No separate audio model needed.

Dialogue

Write quoted speech in your prompt and the model generates synchronized lip movements and voice:

A news anchor sits at a desk and says "Breaking news: scientists have discovered 
a new species of deep-sea fish." The studio has blue lighting.

Sound Effects

Describe sounds explicitly and they'll be generated:

A car races through a rainy city street, tires screeching on wet asphalt, 
windshield wipers clicking rhythmically.

Ambient Audio

Describe the environment and the model creates matching soundscapes:

A peaceful forest clearing at dawn, birds singing, a gentle stream flowing 
over rocks, leaves rustling in a light breeze.

Audio in Video Extension

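When you extend a video, voice and other audio carry over only if they are present in the final second of the original clip. If a voice or sound should continue into the extension, write the prompt so that it is still audible as the clip ends.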
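
As a rough illustration of the one-second rule, the sketch below checks whether audio is still playing inside the final second of a clip. The function name, its parameters, and the idea that you know when the audio ends are all assumptions made for the example; the product does not expose such a check as far as this page states.

```python
def audio_carries_over(audio_end_s: float, clip_duration_s: float,
                       window_s: float = 1.0) -> bool:
    """Return True if audio is still present in the last `window_s` seconds.

    Hypothetical helper: `audio_end_s` is the time at which the last voice
    or sound in the original clip stops, measured from the clip start.
    """
    if not 0 <= audio_end_s <= clip_duration_s:
        raise ValueError("audio_end_s must lie within the clip")
    # Audio carries over only if it extends into the final window.
    return audio_end_s > clip_duration_s - window_s


# Dialogue that ends at 7.5s of an 8s clip reaches into the last second.
print(audio_carries_over(7.5, 8.0))
# Dialogue that ends at 5s does not.
print(audio_carries_over(5.0, 8.0))
```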

Resolution and Duration Constraints

| Resolution | Available Durations | Notes |
| --- | --- | --- |
| 720p | 4s, 6s, 8s | Default, all durations available |
| 1080p | 8s only | Requires 8-second duration |
| 4K | 8s only | Requires 8-second duration |

The UI automatically disables incompatible resolution options based on your selected duration.
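
The constraints above can be expressed as a small lookup, which is one way to mirror what the UI does when it disables incompatible options. This is a minimal sketch: the names `ALLOWED_DURATIONS`, `is_valid_combination`, and `available_resolutions` are hypothetical and not part of any Veo API.

```python
# Durations (in seconds) permitted for each resolution, per the table above.
ALLOWED_DURATIONS = {
    "720p": (4, 6, 8),
    "1080p": (8,),
    "4K": (8,),
}


def is_valid_combination(resolution: str, duration_s: int) -> bool:
    """Return True if the resolution supports the requested duration."""
    return duration_s in ALLOWED_DURATIONS.get(resolution, ())


def available_resolutions(duration_s: int) -> list[str]:
    """List the resolutions that stay selectable for a given duration."""
    return [res for res, durations in ALLOWED_DURATIONS.items()
            if duration_s in durations]


print(available_resolutions(6))
print(available_resolutions(8))
```

With a 6-second duration only 720p remains selectable; with 8 seconds all three resolutions are available.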

Video Extension

Extend a previously generated video by up to 7 seconds per extension, up to 20 times (148 seconds total):

  • Only works with Veo-generated videos
  • Extended videos maintain 720p resolution
  • Aspect ratio must be 16:9 or 9:16
  • Extended video storage window: 2 days (resets on each extension)
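
The 148-second total is consistent with an 8-second starting clip plus 20 extensions of 7 seconds each. The sketch below works through that arithmetic; the 8-second base is inferred from the numbers on this page rather than stated outright, and the constant and function names are hypothetical.

```python
BASE_CLIP_S = 8       # assumed starting clip length (inferred: 148 - 20 * 7)
EXTENSION_S = 7       # maximum seconds added per extension
MAX_EXTENSIONS = 20   # maximum number of extensions


def max_total_length(extensions: int, base_s: int = BASE_CLIP_S) -> int:
    """Upper bound on video length after a given number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return base_s + extensions * EXTENSION_S


print(max_total_length(3))    # 8 + 3 * 7
print(max_total_length(20))   # 8 + 20 * 7
```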

Reference Images

Include up to 3 reference images to guide video content:

  • Objects and characters from reference images are incorporated into the video
  • References force 8-second duration
  • Connect upstream image nodes to the video node's reference input
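
To make the reference-image rules concrete, here is a minimal sketch of how a client might resolve the duration that will actually be used. The function and constant names are hypothetical; only the limit of 3 images and the forced 8-second duration come from this page.

```python
MAX_REFERENCE_IMAGES = 3
REFERENCE_DURATION_S = 8  # references force an 8-second clip


def effective_duration(requested_s: int, reference_count: int) -> int:
    """Return the duration that applies once reference images are attached."""
    if not 0 <= reference_count <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"reference_count must be between 0 and {MAX_REFERENCE_IMAGES}")
    # Any reference image overrides the requested duration.
    return REFERENCE_DURATION_S if reference_count > 0 else requested_s


print(effective_duration(4, reference_count=2))
print(effective_duration(6, reference_count=0))
```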

@-mentions in video prompts

You can use @[Image 1] mentions in video prompts for readability, but Veo processes reference images as structured inputs rather than inline content. The mention is converted to the display name in the prompt text (e.g., @[Image 1] becomes Image 1). For precise per-image instructions, describe the role of each reference in your prompt text.
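
The conversion described above can be approximated with a regular expression. This is only a sketch of the observable behavior (`@[Image 1]` becomes `Image 1`); the actual implementation is not documented here, and `strip_mentions` is a hypothetical name.

```python
import re

# Matches "@[...]" and captures the display name between the brackets.
MENTION = re.compile(r"@\[([^\]]+)\]")


def strip_mentions(prompt: str) -> str:
    """Replace each @[Name] mention with its plain display name."""
    return MENTION.sub(r"\1", prompt)


prompt = "The robot from @[Image 1] walks through the city in @[Image 2]."
print(strip_mentions(prompt))
```

Because the mention ends up as plain text, the prompt still has to say what each reference is for, as in "the robot from Image 1" in the example.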

Prompting Tips

  • Write like a screenplay. Describe action, camera movement, lighting, and mood in natural language.
  • Use quoted dialogue for speech. For example, a prompt containing "Hello!" she waved produces a matching voice line and lip sync.
  • Describe sound explicitly. Don't assume the model will add sounds — spell them out.
  • Keep prompts focused. One clear scene per generation works better than complex multi-scene descriptions.
  • Use negative prompts to exclude unwanted elements: "no text overlay, no watermark."
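
One way to apply these tips consistently is to assemble prompts from labeled parts. The sketch below is an illustration only: the `VideoPrompt` class, its fields, and the "Sounds:" phrasing are assumptions for the example, not a format that Veo requires.

```python
from dataclasses import dataclass, field


@dataclass
class VideoPrompt:
    """Hypothetical container for one focused scene."""
    scene: str                                   # action, lighting, mood
    camera: str = ""                             # camera movement, if any
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)
    sounds: list[str] = field(default_factory=list)    # explicit SFX / ambience
    negative: list[str] = field(default_factory=list)  # elements to exclude

    def text(self) -> str:
        """Build the main prompt, with speech in quotes and sounds spelled out."""
        parts = [self.scene.strip()]
        if self.camera:
            parts.append(self.camera.strip())
        for speaker, line in self.dialogue:
            parts.append(f'{speaker} says "{line}"')
        if self.sounds:
            parts.append("Sounds: " + ", ".join(self.sounds) + ".")
        return " ".join(parts)

    def negative_text(self) -> str:
        """Build the negative prompt as a comma-separated list."""
        return ", ".join(self.negative)


prompt = VideoPrompt(
    scene="A news anchor sits at a desk in a studio with blue lighting.",
    dialogue=[("The anchor", "Breaking news.")],
    sounds=["quiet studio hum"],
    negative=["no text overlay", "no watermark"],
)
print(prompt.text())
print(prompt.negative_text())
```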

Content Policy

  • Videos are watermarked with SynthID (invisible, verifiable AI content marker)
  • EU/UK/CH regions: Person generation restricted to adults only (minors prohibited)
  • Safety filters may block prompts that violate content guidelines — you won't be charged if blocked

Limitations

  • Only 16:9 and 9:16 aspect ratios (no 1:1 or other ratios)
  • 1080p/4K locked to 8-second duration
  • Video extension only for Veo-generated content (can't extend uploaded videos)
  • Audio quality depends on prompt specificity
  • Generation time ~2 minutes (use Veo 3.1 Fast for quicker iteration)
