Developer Documentation

Audio settings can be hard to navigate, especially if you are new to audio processing. Most options are self-explanatory, but a few need specific values to perform well.

In environments with high background noise, it is essential to keep that noise from being transmitted. Two settings address this:

Voice Activity Detection

An AI model analyzes a few milliseconds of audio at a time to determine whether the user is speaking. We recommend keeping this feature enabled.

The model outputs a probability between 0.0 and 1.0, where 0.0 means no likelihood of voice and 1.0 means absolute certainty. The Attack and Release settings define the probability thresholds at which transmission starts and stops, respectively.

Good starting values are generally 0.9 for Attack (transmission starts at 90% certainty of voice) and 0.8 for Release. Keep the Release value slightly lower than the Attack value, ideally by 0.1, so that a probability hovering near a single threshold does not switch transmission on and off rapidly.
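The interplay of the two thresholds can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the `VadGate` class, its `update` method, and the exact comparison operators (`>=` for Attack, `<` for Release) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VadGate:
    """Hypothetical two-threshold (hysteresis) gate on a voice probability."""
    attack: float = 0.9    # start transmitting at or above this probability
    release: float = 0.8   # stop transmitting once probability falls below this
    transmitting: bool = False

    def update(self, probability: float) -> bool:
        """Feed one model output (0.0 to 1.0) and return the transmit state."""
        if not self.transmitting and probability >= self.attack:
            self.transmitting = True
        elif self.transmitting and probability < self.release:
            self.transmitting = False
        return self.transmitting


gate = VadGate()
states = [gate.update(p) for p in [0.5, 0.92, 0.85, 0.79, 0.85]]
print(states)  # [False, True, True, False, False]
```

Note that the same probability, 0.85, keeps transmission running while the gate is open but is not enough to open it again once closed. That gap between the two thresholds is what the 0.1 offset provides.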

Voice Activity Detection is effective in many scenarios, but some situations, such as a bustling open-plan office, require additional measures. In these cases, the AI might confidently detect a voice that is not the voice of the intended speaker.

Here, the Volume Gate filter proves useful:

Volume Gate

The Volume Gate keeps the microphone muted while the input level is below a volume threshold and opens only when the detected sound exceeds that threshold. This helps distinguish actual speech from lower-volume background noise.

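The thresholds are expressed in dBFS (decibels relative to full scale), where 0 is the loudest level the system can represent and quieter levels are negative numbers. This can be confusing at first; the Wikipedia article on dBFS is a useful introduction.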
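For readers new to dBFS, the following sketch shows how a level could be computed from raw samples. It assumes floating-point samples normalized to the range -1.0 to 1.0 and a peak-based measurement; the function name `peak_dbfs` is hypothetical, and the product may measure level differently (for example, using RMS).

```python
import math


def peak_dbfs(samples):
    """Return the peak level of a block of samples in dBFS.

    Assumes samples are floats in [-1.0, 1.0], so full scale is 1.0.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(peak)


print(round(peak_dbfs([0.5, -1.0, 0.25]), 1))  # 0.0 (full scale)
print(round(peak_dbfs([0.1, -0.05]), 1))       # -20.0
print(peak_dbfs([0.0, 0.0]))                   # -inf
```

Every 20 dB step down corresponds to a tenfold drop in amplitude, so a signal peaking at one tenth of full scale sits at -20 dBFS.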

We generally recommend leaving this feature disabled at first. If you enable it, reasonable starting values are -40 for Release and -30 for Attack. Keep the Release value slightly lower than the Attack value, ideally by 10. For further assistance, please contact our support team.
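As a rough illustration of how the suggested values behave, the sketch below runs a sequence of measured levels through a gate that opens at the Attack level and closes below the Release level. The function `run_volume_gate`, its parameter names, and the assumption that levels arrive as one dBFS reading per audio block are all hypothetical.

```python
def run_volume_gate(levels_dbfs, attack_db=-30.0, release_db=-40.0):
    """Return the open/closed state of the gate after each level reading.

    Assumed behavior: the gate opens at or above attack_db and closes
    only once the level falls below release_db.
    """
    is_open = False
    states = []
    for level in levels_dbfs:
        if not is_open and level >= attack_db:
            is_open = True
        elif is_open and level < release_db:
            is_open = False
        states.append(is_open)
    return states


print(run_volume_gate([-50, -28, -35, -42, -35]))
# [False, True, True, False, False]
```

A reading of -35 keeps an open gate open, so quieter syllables are not cut off, but it does not open a closed gate, so background sound at that level stays muted.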