This guide explains the caller-facing JSON shape for vidkit compose. It is practical, not a complete implementation reference.
A compose spec describes output settings, optional default transition/audio behavior, and a list of scenes.
{
"size": "1280x720",
"fps": 30,
"transition": {"type": "fade", "duration": 0.25},
"audio": {"type": "silence"},
"scenes": []
}Common top-level fields:
size: output size asWIDTHxHEIGHTfps: frames per secondtransition: default transition between scenesaudio: optional output audio bedscenes: ordered scene list
Validate any spec before rendering:
vidkit validate path/to/spec.json
vidkit --validate-only path/to/spec.jsonThe JSON schema at vidkit-compose.schema.json is useful for editor hints and assistant-generated specs. The Python validator remains the source of truth for semantic checks.
Scene objects require a type. Most scene types also need duration.
Generated visual scenes: card, bars, particles, wave, grid, orbits, typewriter.
Media scenes: image and media.
Composition scenes: layered.
Utility scenes: beat.
Use layered when you need control over composition, timed elements, lower-thirds, fake UI, animated sprites, panels, or reusable visual treatments.
{
"type": "layered",
"duration": 4,
"background": "#111827",
"layers": [
{"type": "panel", "x": 80, "y": 80, "width": 520, "height": 260, "color": "#0f172a", "radius": 16},
{"type": "text", "text": "Readable title", "x": 116, "y": 120, "size": 42}
]
}Layer types:
media: image/video layer withsource,fit, sizing, placement, opacity, and optional maskssprite: media-style layer intended for animated movementpanel: rectangle/backing plate with color, opacity, radius, and border supporttext: ASS subtitle-backed text eventlower_third: readable lower-third/caption treatmentshape: reusable primitives such as arrows, windows, cursors, checkboxes, and progress barspreset: reusable UI/dialog/stamp/meme/terminal/form treatments
A layer can be visible for the whole scene or a timed span.
{"type": "text", "text": "Now", "start": 0.5, "end": 2.2, "x": 120, "y": 160}Use timed layers for captions, popups, beats, callouts, and staged UI changes. Keep scene duration long enough for all visible layer spans.
Position, opacity, and media/sprite scale can be keyframed.
{
"type": "sprite",
"source": "examples/assets/sample.ppm",
"width": 92,
"height": 68,
"keyframes": [
{"time": 0.0, "x": 36, "y": 218, "opacity": 0},
{"time": 0.4, "x": 90, "y": 188, "opacity": 1, "ease": "out_cubic"},
{"time": 3.0, "x": 494, "y": 198}
]
}Supported easing values include linear, in_quad, out_quad, in_out_quad, in_cubic, out_cubic, and in_out_cubic.
Motion paths are available for sprite-like movement:
{
"type": "sprite",
"source": "examples/assets/sample.ppm",
"path": {
"type": "points",
"ease": "in_out_cubic",
"points": [
{"time": 0.0, "x": 36, "y": 218},
{"time": 1.4, "x": 286, "y": 92, "ease": "out_cubic"},
{"time": 3.0, "x": 494, "y": 198}
]
}
}Use presets for common entrances/exits before hand-authoring keyframes.
Animation presets: fade, fade_in, fade_out, slide_left, slide_right, slide_up, slide_down, pop, none.
Sprite helper presets: blink, bounce, jitter, squash, pop, slap.
{"type": "panel", "x": 80, "y": 80, "width": 320, "height": 160, "animate": {"in": "pop", "out": "fade"}}Layered scenes can move the whole composition with a camera block.
{
"type": "layered",
"duration": 4,
"camera": {
"keyframes": [
{"time": 0, "x": -20, "y": 0, "scale": 1.04},
{"time": 4, "x": 24, "y": 12, "scale": 1.12}
],
"shake": {"start": 2.2, "end": 2.8, "amount": 8, "frequency": 18}
},
"layers": []
}Use camera motion for scene-wide polish. Use layer keyframes when only one object should move.
Shape layers currently include progress_bar, checkbox, arrow, cursor, speech_bubble, file_icon, and window.
Preset layers currently include error_dialog, stamp, meme_caption, file_label, terminal_prompt, form_field, and warning_banner.
Shapes and presets are expanded into ordinary panel/text/media layers. That keeps the public spec compact while preserving deterministic rendering.
Transitions include fade, wipeleft, wiperight, slideleft, slideright, circleopen, and circleclose.
Set a default transition at the top level, or override per scene when needed.
Generated audio includes silence, tone, noise, pulse, and sfx.
SFX presets are deterministic utility cues, not production samples: bonk, error_beep, whoosh, censor_beep, printer_panic, and meow_ish.
If sound is part of the output, run vidkit qa and check levels. A muxed audio stream can still be effectively silent.
Beat scenes expand into ordinary generated/layered scenes and are useful for quick comedic timing.
{"type": "beat", "preset": "bonk", "duration": 0.75, "text": "BONK"}Supported beat presets: hard_cut_card, bonk, censor_meow, zoom_punch, and error_flash.
- Start from
vidkit init <template> <out.json>when possible. - Keep public examples free of private paths and private media.
- Validate before rendering.
- Use lower-thirds or panels behind small text.
- Use
radius, borders, and opacity intentionally; avoid cluttered layer stacks. - Prefer preset motion first, then custom keyframes when the preset is not enough.
- Inspect frames/contact sheets before calling a visual change finished.