Skip to content

Latest commit

 

History

History
195 lines (139 loc) · 5.87 KB

File metadata and controls

195 lines (139 loc) · 5.87 KB

Vidkit Spec Guide

This guide explains the caller-facing JSON shape for vidkit compose. It is practical, not a complete implementation reference.

Top-Level Shape

A compose spec describes output settings, optional default transition/audio behavior, and a list of scenes.

{
  "size": "1280x720",
  "fps": 30,
  "transition": {"type": "fade", "duration": 0.25},
  "audio": {"type": "silence"},
  "scenes": []
}

Common top-level fields:

  • size: output size as WIDTHxHEIGHT
  • fps: frames per second
  • transition: default transition between scenes
  • audio: optional output audio bed
  • scenes: ordered scene list

Validate any spec before rendering:

vidkit validate path/to/spec.json
vidkit --validate-only path/to/spec.json

The JSON schema at vidkit-compose.schema.json is useful for editor hints and assistant-generated specs. The Python validator remains the source of truth for semantic checks.

Scenes

Scene objects require a type. Most scene types also need duration.

Generated visual scenes: card, bars, particles, wave, grid, orbits, typewriter.

Media scenes: image and media.

Composition scenes: layered.

Utility scenes: beat.

Layered Scenes

Use layered when you need control over composition, timed elements, lower-thirds, fake UI, animated sprites, panels, or reusable visual treatments.

{
  "type": "layered",
  "duration": 4,
  "background": "#111827",
  "layers": [
    {"type": "panel", "x": 80, "y": 80, "width": 520, "height": 260, "color": "#0f172a", "radius": 16},
    {"type": "text", "text": "Readable title", "x": 116, "y": 120, "size": 42}
  ]
}

Layer types:

  • media: image/video layer with source, fit, sizing, placement, opacity, and optional masks
  • sprite: media-style layer intended for animated movement
  • panel: rectangle/backing plate with color, opacity, radius, and border support
  • text: ASS subtitle-backed text event
  • lower_third: readable lower-third/caption treatment
  • shape: reusable primitives such as arrows, windows, cursors, checkboxes, and progress bars
  • preset: reusable UI/dialog/stamp/meme/terminal/form treatments

Timing

A layer can be visible for the whole scene or a timed span.

{"type": "text", "text": "Now", "start": 0.5, "end": 2.2, "x": 120, "y": 160}

Use timed layers for captions, popups, beats, callouts, and staged UI changes. Keep scene duration long enough for all visible layer spans.

Motion

Position, opacity, and media/sprite scale can be keyframed.

{
  "type": "sprite",
  "source": "examples/assets/sample.ppm",
  "width": 92,
  "height": 68,
  "keyframes": [
    {"time": 0.0, "x": 36, "y": 218, "opacity": 0},
    {"time": 0.4, "x": 90, "y": 188, "opacity": 1, "ease": "out_cubic"},
    {"time": 3.0, "x": 494, "y": 198}
  ]
}

Supported easing values include linear, in_quad, out_quad, in_out_quad, in_cubic, out_cubic, and in_out_cubic.

Motion paths are available for sprite-like movement:

{
  "type": "sprite",
  "source": "examples/assets/sample.ppm",
  "path": {
    "type": "points",
    "ease": "in_out_cubic",
    "points": [
      {"time": 0.0, "x": 36, "y": 218},
      {"time": 1.4, "x": 286, "y": 92, "ease": "out_cubic"},
      {"time": 3.0, "x": 494, "y": 198}
    ]
  }
}

Animation Presets

Use presets for common entrances/exits before hand-authoring keyframes.

Animation presets: fade, fade_in, fade_out, slide_left, slide_right, slide_up, slide_down, pop, none.

Sprite helper presets: blink, bounce, jitter, squash, pop, slap.

{"type": "panel", "x": 80, "y": 80, "width": 320, "height": 160, "animate": {"in": "pop", "out": "fade"}}

Camera

Layered scenes can move the whole composition with a camera block.

{
  "type": "layered",
  "duration": 4,
  "camera": {
    "keyframes": [
      {"time": 0, "x": -20, "y": 0, "scale": 1.04},
      {"time": 4, "x": 24, "y": 12, "scale": 1.12}
    ],
    "shake": {"start": 2.2, "end": 2.8, "amount": 8, "frequency": 18}
  },
  "layers": []
}

Use camera motion for scene-wide polish. Use layer keyframes when only one object should move.

Shapes And Presets

Shape layers currently include progress_bar, checkbox, arrow, cursor, speech_bubble, file_icon, and window.

Preset layers currently include error_dialog, stamp, meme_caption, file_label, terminal_prompt, form_field, and warning_banner.

Shapes and presets are expanded into ordinary panel/text/media layers. That keeps the public spec compact while preserving deterministic rendering.

Transitions

Transitions include fade, wipeleft, wiperight, slideleft, slideright, circleopen, and circleclose.

Set a default transition at the top level, or override per scene when needed.

Audio

Generated audio includes silence, tone, noise, pulse, and sfx.

SFX presets are deterministic utility cues, not production samples: bonk, error_beep, whoosh, censor_beep, printer_panic, and meow_ish.

If sound is part of the output, run vidkit qa and check levels. A muxed audio stream can still be effectively silent.

Beat Scenes

Beat scenes expand into ordinary generated/layered scenes and are useful for quick comedic timing.

{"type": "beat", "preset": "bonk", "duration": 0.75, "text": "BONK"}

Supported beat presets: hard_cut_card, bonk, censor_meow, zoom_punch, and error_flash.

Good Spec Habits

  • Start from vidkit init <template> <out.json> when possible.
  • Keep public examples free of private paths and private media.
  • Validate before rendering.
  • Use lower-thirds or panels behind small text.
  • Use radius, borders, and opacity intentionally; avoid cluttered layer stacks.
  • Prefer preset motion first, then custom keyframes when the preset is not enough.
  • Inspect frames/contact sheets before calling a visual change finished.