Auto-ducking
When speech is detected, the background track is automatically lowered by 12 dB (the default Reduction) so narration stays clear.
What is ducking?
The term comes from radio, where the DJ's mic lowers the music whenever they speak. Bom's editor uses whisper.cpp VAD (Voice Activity Detection) to find voice segments, then a sidechain compressor lowers the music during those segments.
- No manual keyframes required.
- Multiple voice tracks are handled per-source.
- Detection is visualised as a volume envelope on the track.
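To make "find voice segments" concrete, here is a minimal sketch of how per-frame analysis turns into start/end spans. It is not how Bom's editor detects speech: whisper.cpp VAD is model-based, and this sketch swaps in a much simpler RMS-energy detector driven only by a Threshold. The function names and the 20 ms frame size are hypothetical.

```python
import math
from typing import List, Tuple


def rms_db(frame: List[float]) -> float:
    """RMS level of a frame in dBFS (full scale = 1.0); silence maps to -inf."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def detect_voice_segments(samples: List[float], sample_rate: int,
                          threshold_db: float = -30.0,
                          frame_ms: int = 20) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where the frame level is >= threshold_db.

    Simple energy detector used only for illustration (hypothetical helper).
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments: List[Tuple[float, float]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        voiced = rms_db(samples[i:i + frame_len]) >= threshold_db
        t = i / sample_rate
        if voiced and start is None:
            start = t                      # voice begins at this frame
        elif not voiced and start is not None:
            segments.append((start, t))    # voice ended at this frame
            start = None
    if start is not None:                  # voice runs to the end of the clip
        segments.append((start, len(samples) / sample_rate))
    return segments


# 0.1 s silence, 0.1 s loud tone-like signal, 0.1 s silence at 1 kHz
clip = [0.0] * 100 + [0.5] * 100 + [0.0] * 100
print(detect_voice_segments(clip, 1000))
```

A real model-based VAD is far more robust to noise than an energy threshold, which is exactly why the Limits section below warns about HVAC and traffic only as an edge case.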
Enable / disable
- Select the background music track.
- Toggle "Auto-Ducking" in the right inspector.
- In the "Source" dropdown, pick which voice track to listen to (default: all voice tracks).
Global default
Settings → Video Editor lets you set the default ON/OFF state for new projects.
Parameters
- Threshold (default -30 dB): minimum level on the source track that counts as voice
- Attack (default 50 ms): how fast the music drops when voice begins
- Release (default 250 ms): how fast the music recovers after voice ends
- Reduction (default -12 dB): how far the music is lowered while voice is present
- Hold (default 200 ms): gaps in the voice shorter than this are ignored, so the music stays lowered
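One way these parameters could combine is sketched below: a per-frame gain curve (in dB) for the music track, built from per-frame voice flags. This assumes linear ramps in dB at a fixed 10 ms control rate; the compressor in Bom's editor may use different curves. Threshold is assumed to be applied earlier, by whatever produces the `voiced` flags. `duck_envelope` and its arguments are hypothetical names.

```python
from typing import List


def duck_envelope(voiced: List[bool], frame_ms: float = 10.0,
                  reduction_db: float = -12.0, attack_ms: float = 50.0,
                  release_ms: float = 250.0, hold_ms: float = 200.0) -> List[float]:
    """Per-frame music gain in dB: 0.0 = untouched, reduction_db = fully ducked."""
    depth = abs(reduction_db)
    # dB moved per frame; zero-length attack/release jumps in a single frame
    attack_step = depth * frame_ms / attack_ms if attack_ms > 0 else depth
    release_step = depth * frame_ms / release_ms if release_ms > 0 else depth
    hold_frames = int(hold_ms / frame_ms)

    gain = 0.0
    since_voice = hold_frames + 1  # start in the "not ducking" state
    out: List[float] = []
    for v in voiced:
        since_voice = 0 if v else since_voice + 1
        # Stay ducked through gaps no longer than the Hold time
        if since_voice <= hold_frames:
            gain = max(reduction_db, gain - attack_step)   # Attack ramp down
        else:
            gain = min(0.0, gain + release_step)           # Release ramp up
        out.append(gain)
    return out


# 100 ms voice, 50 ms breath, 100 ms voice, then 500 ms of music only
flags = [True] * 10 + [False] * 5 + [True] * 10 + [False] * 50
env = duck_envelope(flags)
print([round(g, 2) for g in env[:6]], round(env[-1], 2))
```

With the defaults, the 50 ms breath is shorter than the 200 ms Hold, so the music never comes back up between the two phrases; it only starts recovering 200 ms after the second phrase ends.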
Recommended
Documentary narration: Reduction -10 dB / Release 400 ms. Fast-paced talk show: Reduction -15 dB / Release 150 ms.
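To see what those Reduction values mean in practice, the sketch below converts each preset from dB to a linear amplitude multiplier using the standard 20·log10 convention. The `DuckPreset` class is a hypothetical illustration, not an editor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuckPreset:
    name: str
    reduction_db: float
    release_ms: float

    @property
    def gain_factor(self) -> float:
        """Linear amplitude multiplier for the reduction: 10 ** (dB / 20)."""
        return 10 ** (self.reduction_db / 20)


PRESETS = [
    DuckPreset("Documentary narration", -10.0, 400.0),
    DuckPreset("Fast-paced talk show", -15.0, 150.0),
]

for p in PRESETS:
    print(f"{p.name}: {p.reduction_db} dB -> x{p.gain_factor:.3f} amplitude, "
          f"release {p.release_ms:.0f} ms")
```

So -10 dB leaves the music at roughly 32% of its original amplitude, the default -12 dB at about 25%, and -15 dB at about 18%. The longer release on the documentary preset lets the music swell back gently between sentences.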
Manual override
To tweak a specific region or disable ducking in one section, right-click the track and choose "Convert to keyframes". The automatic envelope is converted into editable keyframes.
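Converting an envelope to keyframes usually means keeping only the points where the curve changes direction, so each flat stretch and each ramp collapses to its endpoints. The sketch below shows that idea on a per-frame gain list, assuming a 10 ms frame. It is an illustration of the general technique, not the editor's actual conversion; `envelope_to_keyframes` is hypothetical.

```python
from typing import List, Tuple


def envelope_to_keyframes(gains_db: List[float], frame_ms: float = 10.0,
                          tol: float = 1e-9) -> List[Tuple[float, float]]:
    """Reduce a per-frame gain curve to (time_ms, gain_db) keyframes.

    A frame becomes a keyframe when the slope into it differs from the slope
    out of it; the first and last frames are always kept.
    """
    n = len(gains_db)
    if n == 0:
        return []
    keys = [(0.0, gains_db[0])]
    for i in range(1, n - 1):
        slope_in = gains_db[i] - gains_db[i - 1]
        slope_out = gains_db[i + 1] - gains_db[i]
        if abs(slope_in - slope_out) > tol:   # tolerance absorbs float rounding
            keys.append((i * frame_ms, gains_db[i]))
    if n > 1:
        keys.append(((n - 1) * frame_ms, gains_db[-1]))
    return keys


curve = [0.0, 0.0, -6.0, -12.0, -12.0, -12.0, -6.0, 0.0, 0.0]
print(envelope_to_keyframes(curve))
```

Nine frames become six keyframes here; on a real multi-minute envelope the reduction is much larger, which keeps the converted track easy to edit by hand.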
Limits & caveats
- Background noise (HVAC, traffic) may be mistaken for voice. If this happens, raise the Threshold.
- Sung lyrics also count as voice. Disable auto-ducking when scoring over vocal music.
- Pauses under 300 ms (breaths, filler words) are ignored.
- Detection accuracy drops for audio with a sample rate below 16 kHz.
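The first and last caveats can be checked before enabling ducking. The sketch below flags a voice track whose sample rate is under 16 kHz and estimates its noise floor from the quietest 20 ms frames (roughly the 10th percentile), suggesting a Threshold about 10 dB above it. The percentile, the 10 dB margin, and the `preflight` helper are all assumptions for illustration, not editor behavior.

```python
import math
from typing import List

MIN_VAD_RATE_HZ = 16_000  # per the caveat above, accuracy drops below this


def frame_levels_db(samples: List[float], frame_len: int) -> List[float]:
    """RMS level in dBFS of each complete frame; digital silence floors at -120 dB."""
    levels = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20 * math.log10(rms) if rms > 0 else -120.0)
    return levels


def preflight(samples: List[float], sample_rate: int,
              threshold_db: float = -30.0, margin_db: float = 10.0) -> List[str]:
    """Return human-readable warnings for a voice track (hypothetical helper)."""
    warnings = []
    if sample_rate < MIN_VAD_RATE_HZ:
        warnings.append(f"sample rate {sample_rate} Hz is below {MIN_VAD_RATE_HZ} Hz; "
                        "detection may be less accurate")
    levels = sorted(frame_levels_db(samples, max(1, sample_rate // 50)))  # 20 ms frames
    if levels:
        noise_floor = levels[len(levels) // 10]  # approx. 10th-percentile frame level
        suggested = noise_floor + margin_db
        if suggested > threshold_db:
            warnings.append(f"noise floor ~{noise_floor:.1f} dB; consider raising "
                            f"Threshold to about {suggested:.0f} dB")
    return warnings


# One second of steady hiss at 8 kHz: triggers both warnings
for w in preflight([0.05] * 8000, 8000):
    print(w)
```

The noise-floor estimate only makes sense if the track contains some pauses; a recording that is speech from start to finish will report its quietest speech as "noise".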