Auto-ducking

When speech is detected, the background track is automatically lowered, by 12 dB by default, so narration stays clear.
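
A decibel change maps to a linear amplitude factor of 10^(dB/20), so the default -12 dB reduction leaves the music at roughly a quarter of its original amplitude. A minimal Python sketch of that conversion (the helper names are illustrative only):

```python
import math

def db_to_gain(db: float) -> float:
    """Convert a gain change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier back to decibels."""
    return 20.0 * math.log10(gain)

# -12 dB of reduction keeps about 25% of the original amplitude.
print(f"{db_to_gain(-12):.3f}")  # 0.251
```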

What is ducking?

The term comes from radio, where the DJ's microphone lowers the music whenever they speak. Bom's editor uses whisper.cpp VAD (Voice Activity Detection) to find voice segments and a sidechain compressor to lower the music during them.

  • No manual keyframes required.
  • Multiple voice tracks are handled per-source.
  • Detection is visualised as a volume envelope on the track.
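
whisper.cpp's VAD is model-based, so it cannot be reproduced in a few lines. As a simplified stand-in, the sketch below marks a frame as voice when its RMS level reaches the Threshold (default -30 dBFS) and joins consecutive voiced frames into segments. The energy gate, the 20 ms frame size, and the function name are assumptions for illustration, not the editor's detector.

```python
import math

def detect_voice_segments(samples, sample_rate, threshold_db=-30.0, frame_ms=20):
    """Return (start_s, end_s) tuples where the frame RMS level >= threshold_db.

    Hypothetical energy-based stand-in for a model-based VAD.
    `samples` are floats in [-1.0, 1.0], so 0 dBFS corresponds to RMS 1.0.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        t = i / sample_rate
        if level_db >= threshold_db and start is None:
            start = t                    # voice begins
        elif level_db < threshold_db and start is not None:
            segments.append((start, t))  # voice ends
            start = None
    if start is not None:                # voice runs to the end of the clip
        segments.append((start, len(samples) / sample_rate))
    return segments

# 1 s of silence followed by 1 s of a quiet 440 Hz tone (about -23 dBFS RMS).
sr = 16_000
silence = [0.0] * sr
tone = [0.1 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
print(detect_voice_segments(silence + tone, sr))  # [(1.0, 2.0)]
```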

Enable / disable

  1. Select the background music track.
  2. Toggle "Auto-Ducking" in the right inspector.
  3. In the "Source" dropdown, pick which voice track to listen to (default: all voice tracks).
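
With the default Source (all voice tracks), the music should stay ducked whenever any selected voice track is speaking. One way to express that, assuming each track's detection yields (start, end) segments in seconds, is to take the union of those segments. The function and data layout below are hypothetical; picking a single track simply means passing only that track's entry.

```python
def union_segments(per_track):
    """Merge voice segments from several tracks into one sorted, non-overlapping list.

    per_track: mapping of track name -> list of (start_s, end_s) tuples.
    """
    all_segments = sorted(seg for segs in per_track.values() for seg in segs)
    merged = []
    for start, end in all_segments:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

voices = {
    "Host": [(0.0, 2.0), (5.0, 6.0)],
    "Guest": [(1.5, 3.0)],
}
print(union_segments(voices))  # [(0.0, 3.0), (5.0, 6.0)]
```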

Global default

Settings → Video Editor lets you set the default ON/OFF state for new projects.

Parameters

  • Threshold (default -30 dB): minimum level on the voice track that is treated as voice
  • Attack (default 50 ms): how quickly the music drops once voice begins
  • Release (default 250 ms): how quickly the music recovers after voice ends
  • Reduction (default -12 dB): how far the music is lowered
  • Hold (default 200 ms): gaps in speech shorter than this keep the music lowered
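
The parameters can be read as a simple gain computer. The sketch below assumes detection has already produced one voice/no-voice flag per 10 ms control frame, and it ramps the music gain linearly in dB (real compressors usually use exponential attack and release curves). `DuckingParams` and `ducking_envelope` are hypothetical names, not the editor's API.

```python
from dataclasses import dataclass

@dataclass
class DuckingParams:
    threshold_db: float = -30.0  # used by the detection stage, not by the envelope below
    attack_ms: float = 50.0      # time to reach full reduction
    release_ms: float = 250.0    # time to return to 0 dB
    reduction_db: float = -12.0  # how far the music is lowered
    hold_ms: float = 200.0       # gaps shorter than this keep the music ducked

def ducking_envelope(voice_flags, params, frame_ms=10.0):
    """Return the music gain in dB for each control frame.

    voice_flags: one bool per frame (True = voice detected in that frame).
    """
    depth = abs(params.reduction_db)
    attack_step = depth * frame_ms / params.attack_ms    # dB per frame while ducking
    release_step = depth * frame_ms / params.release_ms  # dB per frame while recovering
    hold_frames = int(params.hold_ms / frame_ms)
    gain_db = 0.0
    frames_since_voice = hold_frames + 1  # start in the "not held" state
    envelope = []
    for is_voice in voice_flags:
        frames_since_voice = 0 if is_voice else frames_since_voice + 1
        ducked = frames_since_voice <= hold_frames  # voice now, or still inside the hold window
        target = params.reduction_db if ducked else 0.0
        if gain_db > target:
            gain_db = max(target, gain_db - attack_step)   # move down toward the reduction
        elif gain_db < target:
            gain_db = min(target, gain_db + release_step)  # move back up toward 0 dB
        envelope.append(gain_db)
    return envelope

flags = [True] * 30 + [False] * 60  # 300 ms of voice, then 600 ms of silence
env = ducking_envelope(flags, DuckingParams())
```

With the defaults, the music reaches -12 dB 50 ms after voice starts, stays there for the 200 ms hold after voice stops, then takes 250 ms to return to 0 dB.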

Recommended settings

  • Narrated documentary: Reduction -10 dB, Release 400 ms
  • Fast-paced talk show: Reduction -15 dB, Release 150 ms
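
Both recommendations change only Reduction and Release, so a preset can be written as the defaults with those two fields overridden. A hypothetical sketch (the `DuckingParams` type and preset names are illustrative, not the editor's settings format):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DuckingParams:
    threshold_db: float = -30.0
    attack_ms: float = 50.0
    release_ms: float = 250.0
    reduction_db: float = -12.0
    hold_ms: float = 200.0

DEFAULTS = DuckingParams()

# Recommended starting points; every other field keeps its default value.
PRESETS = {
    "narrated_documentary": replace(DEFAULTS, reduction_db=-10.0, release_ms=400.0),
    "fast_talk_show": replace(DEFAULTS, reduction_db=-15.0, release_ms=150.0),
}
```

Overriding only the changed fields keeps each preset honest about what it actually alters.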

Manual override

To adjust a specific region or disable ducking in one section, right-click the track → "Convert to keyframes". The automatic envelope is converted into editable keyframes.
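
Conceptually, converting to keyframes means sampling the automatic envelope and keeping only the points needed to redraw it. A hypothetical sketch, assuming the envelope is a list of per-frame gains in dB at 10 ms frames; the editor's actual conversion may place keyframes differently.

```python
def envelope_to_keyframes(envelope_db, frame_ms=10.0, tol=1e-6):
    """Reduce a per-frame gain envelope to (time_s, gain_db) keyframes.

    A frame is kept only where the slope changes, so flat stretches and
    linear ramps collapse to their endpoints.
    """
    if not envelope_db:
        return []
    n = len(envelope_db)
    keep = [0]  # always keep the first frame
    for i in range(1, n - 1):
        slope_in = envelope_db[i] - envelope_db[i - 1]
        slope_out = envelope_db[i + 1] - envelope_db[i]
        if abs(slope_in - slope_out) > tol:
            keep.append(i)
    if n > 1:
        keep.append(n - 1)  # and the last frame
    return [(i * frame_ms / 1000.0, envelope_db[i]) for i in keep]

env = [0.0, 0.0, -6.0, -12.0, -12.0, -12.0, -6.0, 0.0, 0.0]
print(envelope_to_keyframes(env))
# [(0.0, 0.0), (0.01, 0.0), (0.03, -12.0), (0.05, -12.0), (0.07, 0.0), (0.08, 0.0)]
```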

Limits & caveats

  • Background noise (HVAC, traffic) may be mistaken for voice; if that happens, raise the Threshold.
  • Sung lyrics are also detected as voice. Disable auto-ducking when working with vocal music.
  • Pauses under 300 ms, such as breaths or fillers, are ignored, so the music stays lowered through them.
  • Detection accuracy drops for audio sampled below 16 kHz.
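
The last two caveats can be illustrated with a small post-processing step: voice segments separated by less than 300 ms are merged, and clips below 16 kHz are flagged. The constants come from the caveats above; the function names and the use of a Python warning are assumptions for illustration.

```python
import warnings

MIN_PAUSE_S = 0.3         # pauses shorter than this are ignored
MIN_SAMPLE_RATE = 16_000  # below this, detection accuracy drops

def bridge_short_pauses(segments, min_pause_s=MIN_PAUSE_S):
    """Merge consecutive voice segments separated by less than min_pause_s."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < min_pause_s:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # bridge the short pause
        else:
            merged.append((start, end))
    return merged

def check_sample_rate(sample_rate):
    """Return True if the rate is high enough; warn and return False otherwise."""
    if sample_rate < MIN_SAMPLE_RATE:
        warnings.warn(f"{sample_rate} Hz is below {MIN_SAMPLE_RATE} Hz; "
                      "voice detection may be less accurate.")
        return False
    return True

print(bridge_short_pauses([(0.0, 1.0), (1.2, 2.0), (2.5, 3.0)]))
# [(0.0, 2.0), (2.5, 3.0)]
```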