Seedance 2.0

BytePlus's cinematic video model with native audio and multimodal reference support. It differs from the Veo family in three ways:

Continuous duration range. Choose any integer duration from 4 to 15 seconds (inclusive) to match the shot, rather than picking from a fixed 4 / 6 / 8.
Mixed-modal references. You can pass images, videos, and audio in the same run; the model fuses them into a single generation.
Strict first/last frame mode. When you want keyframe-driven interpolation rather than loose reference guidance, flip one toggle to switch dispatch mode.

Capabilities

Feature	Support
Text-to-Video	Yes
Image-to-Video	Yes (first frame, optional last frame)
Frame Interpolation	Yes (strict first + last frame mode)
Multimodal References	Up to 9 images + 3 videos + 3 audio clips in the same run
Max Resolution	1080p (1920×1080)
Aspect Ratios	Adaptive, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16
Duration	4–15s continuous (1s granularity)
Frame Rate	24 fps
Native Audio	Yes (opt-in per run)
Watermark	Yes (opt-in; BytePlus "Generated by AI" mark)

Duration Slider

Seedance 2.0 is the only video model in NGMC that uses a continuous duration slider rather than a discrete dropdown. Drag to any integer from 4 to 15 seconds. Pricing is computed per second from model_pricing rows; every integer in the range has its own priced row, so changing the length never produces a "pricing not configured" error.
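
A minimal sketch of how a slider value might be validated and priced, assuming one pricing row per integer duration. The MODEL_PRICING dictionary, the function names, and the rate are invented for illustration; only the 4 to 15 second integer range comes from the text above.

```python
MIN_DURATION_S, MAX_DURATION_S = 4, 15  # documented slider range

# Hypothetical stand-in for the model_pricing rows: one row per integer
# duration. The rate is a made-up number in arbitrary credits, not a real price.
HYPOTHETICAL_RATE_PER_SECOND = 3
MODEL_PRICING = {
    seconds: seconds * HYPOTHETICAL_RATE_PER_SECOND
    for seconds in range(MIN_DURATION_S, MAX_DURATION_S + 1)
}


def validate_duration(seconds):
    """Return seconds unchanged if it is an integer in [4, 15], else raise."""
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise ValueError("duration must be a whole number of seconds")
    if not MIN_DURATION_S <= seconds <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )
    return seconds


def price_for(seconds):
    """Look up the priced row for a validated duration (hypothetical credits)."""
    return MODEL_PRICING[validate_duration(seconds)]
```

Because every integer in the range has a row, a lookup for a validated duration cannot miss; with the made-up rate above, price_for(7) returns 21.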

Reference Inputs

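Up to 15 media reference items in total (9 images + 3 videos + 3 audio clips), in any combination within the per-kind limits below. Text segments have their own limit: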

Kind	Limit
Image	9
Video	3
Audio	3
Text segments (prompt fragments)	10
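
The per-kind limits in the table can be checked before a run is submitted. The sketch below assumes references are represented as a plain list of kind strings; that representation and the function name are hypothetical, while the limits themselves are taken from the table.

```python
from collections import Counter

# Per-kind limits copied from the table above.
REFERENCE_LIMITS = {"image": 9, "video": 3, "audio": 3, "text": 10}


def check_reference_limits(kinds):
    """Return a list of problems; an empty list means the set is within limits."""
    counts = Counter(kinds)
    problems = []
    for kind in sorted(counts):
        if kind not in REFERENCE_LIMITS:
            problems.append(f"unknown reference kind: {kind}")
        elif counts[kind] > REFERENCE_LIMITS[kind]:
            problems.append(
                f"too many {kind} references: {counts[kind]} > {REFERENCE_LIMITS[kind]}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every exceeded limit at once.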

Multimodal (mref2v) mode (default)

The model fuses all references into one generation. Use this when you want the model to borrow style / objects / audio cues loosely from the inputs.

Strict frame (i2v) mode

Toggle "Strict first/last frame" in the inspector. Behavior changes:

First image reference becomes the first frame of the output (keyframe, not a style hint).
If you add a second image to the end-frame slot, it becomes the last frame, and the model interpolates between the two.
Other reference types (videos, audio, additional images) are ignored in this mode (BytePlus API contract).

The UI labels the two keyframe slots "First Frame" and "Last Frame" so each role is obvious.
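
The role assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the actual dispatch code: references are modeled as ordered (id, kind) tuples, the second image in order is treated as the end-frame slot, and the function names are hypothetical. The mode labels mref2v and i2v are the ones used in the section headings.

```python
def dispatch_mode(strict_frames):
    """Pick the dispatch mode from the inspector toggle."""
    return "i2v" if strict_frames else "mref2v"


def resolve_strict_frames(references):
    """Assign keyframe roles for strict frame mode.

    `references` is an ordered list of (ref_id, kind) tuples (hypothetical
    shape). The first image becomes the first frame, the second image (if
    any) becomes the last frame, and everything else is ignored.
    """
    first = last = None
    ignored = []
    for ref_id, kind in references:
        if kind == "image" and first is None:
            first = ref_id
        elif kind == "image" and last is None:
            last = ref_id
        else:
            ignored.append(ref_id)
    if first is None:
        raise ValueError("strict frame mode needs at least one image reference")
    return {"first_frame": first, "last_frame": last, "ignored": ignored}
```

With a single image, last_frame stays None, which corresponds to plain first-frame image-to-video without interpolation.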

Native Audio

Opt in via the Audio toggle in the inspector. When enabled, Seedance 2.0 generates a matching audio track alongside the video (SFX, ambient, tone). When disabled, the output is silent video.

Audio generation adds runtime and cost; leave it off for iteration passes.

Watermark

Opt-in. When on, BytePlus stamps a small "Generated by AI" mark on the output. Off by default; flip it on when delivering to platforms that require visible AI disclosure.

Resolution and Pricing

Resolution	Available Durations	Note
480p	4–15s, all integers	Lowest cost per second
720p	4–15s, all integers	Standard production quality
1080p	4–15s, all integers	Highest quality, highest cost

All three resolutions support the full duration range with no pinning (unlike Veo 3.1 where 1080p is locked to 8s).
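
To make the contrast with Veo 3.1 concrete, here is a small sketch of a duration lookup per model and resolution. The model identifier strings and the function name are hypothetical. The Veo values encode only what this page states (a fixed 4 / 6 / 8 choice, with 1080p locked to 8s) and should be checked against the Veo documentation.

```python
SEEDANCE_RESOLUTIONS = ("480p", "720p", "1080p")
SEEDANCE_DURATIONS = tuple(range(4, 16))  # 4..15 inclusive, every integer


def allowed_durations(model, resolution):
    """Return the selectable durations in seconds (hypothetical model ids)."""
    if model == "seedance-2.0":
        if resolution not in SEEDANCE_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {resolution}")
        return SEEDANCE_DURATIONS  # no pinning at any resolution
    if model == "veo-3.1":
        # As described on this page: 1080p is pinned to 8s.
        return (8,) if resolution == "1080p" else (4, 6, 8)
    raise ValueError(f"unknown model: {model}")
```

For Seedance 2.0 the result is the same 12 durations at every resolution, so changing resolution never forces a duration change.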

Prompting Tips

Be specific about motion. Seedance responds well to camera-direction language: "slow dolly-in", "tracking left to right", "handheld with subtle shake".
Name your references. Use @[Image 1] / @[Video 2] mentions in the prompt to tell the model what role each reference plays.
Keep shots focused. One scene, one action. Multi-scene prompts produce weaker results than single continuous shots.
Pair the audio opt-in with audio-descriptive prompts. If you enable the Audio toggle, describe the sound you want ("rain on metal roof", "muffled restaurant chatter"). When audio is off, the output is silent.
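
As an illustration of the tips above, the sketch below assembles a prompt that names its references, gives one camera direction, and describes the sound. The mention helper is hypothetical; it only formats the @[Image 1] / @[Video 2] syntax shown above and assumes 1-based numbering. The scene itself is made up.

```python
def mention(kind, index):
    """Format a reference mention such as @[Image 1] (1-based index assumed)."""
    if kind not in ("Image", "Video"):
        raise ValueError(f"unsupported mention kind: {kind}")
    if index < 1:
        raise ValueError("mention index starts at 1")
    return f"@[{kind} {index}]"


# One scene, one action, explicit camera language, explicit sound.
prompt = (
    f"Slow dolly-in on the lighthouse from {mention('Image', 1)}, "
    f"matching the pacing of {mention('Video', 2)}. "
    "Sound: rain on metal roof."
)
```

The resulting prompt reads: Slow dolly-in on the lighthouse from @[Image 1], matching the pacing of @[Video 2]. Sound: rain on metal roof.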

Limitations

No video extension. Unlike Veo 3.1, each Seedance run is independent.
No negative prompt field.
Audio is opt-in per run and cannot be added to an already-generated silent video.
Strict frame mode is mutually exclusive with multimodal references — the backend validates this up front.
Batch / strip generation (generationCount > 1) is not supported; every Seedance run produces a single output.
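
A client could catch two of these limitations before submitting a run. The sketch below is a hypothetical pre-flight check, not the backend's validation: the request is modeled as a plain dictionary, generationCount is the name used above, and negativePrompt is an assumed field name. It covers only the batch and negative-prompt limitations.

```python
def preflight_problems(request):
    """Return a list of limitation violations for a Seedance 2.0 run request."""
    problems = []
    # Batch / strip generation is not supported: one output per run.
    if request.get("generationCount", 1) > 1:
        problems.append("generationCount > 1 is not supported")
    # There is no negative prompt field, so a non-empty value cannot be honored.
    if request.get("negativePrompt"):
        problems.append("negative prompts are not supported")
    return problems
```

An empty list means the request passes these two checks; it says nothing about reference limits or mode selection.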
