Make ODIN APM Settings

Creates an APM settings object that can be used to construct a room.

Inputs

| Name | Type | Description |
| --- | --- | --- |
| Enable Voice Activity Detection | boolean | When enabled, ODIN will analyze the audio input signal using a smart voice detection algorithm to determine the presence of speech. You can define the probabilities required to start and stop transmitting. |
| Attack Probability | float | Voice probability value at which the VAD should engage. |
| Release Probability | float | Voice probability value at which the VAD should disengage. |
| Enable Volume Gate | boolean | When enabled, the volume gate will measure the volume of the input audio signal to decide whether a user is speaking loudly enough to transmit voice data. You can define the root mean square power (dBFS) at which the gate should engage and disengage. |
| Attack Loudness (D BFS) | float | Root mean square power (dBFS) at which the volume gate should engage. |
| Release Loudness (D BFS) | float | Root mean square power (dBFS) at which the volume gate should disengage. |
| High Pass Filter | boolean | When enabled, the high-pass filter will remove low-frequency content from the input audio signal, making it sound cleaner and more focused. |
| Pre Amplifier | boolean | When enabled, the preamplifier will boost the signal of sensitive microphones by making very weak audio signals louder. |
| Noise Suppression | enum | When enabled, the noise suppressor will remove distracting background noise from the input audio signal. You can control the aggressiveness of the suppression. Increasing the level reduces the noise level at the expense of higher speech distortion. |
| Transient Suppression | boolean | When enabled, the transient suppressor will try to detect and attenuate keyboard clicks. |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| Return Value | APMSettings | The constructed APM settings object. |

Discussion

Some settings might be confusing if you are new to audio processing. Most are self-explanatory, but a few require you to enter numeric values.

If users are in a loud environment, they don’t want to send background noise to other team members. To prevent that, we provide two different settings:

Voice Activity Detection

This system captures a few milliseconds of audio data and analyzes it to determine whether the user is speaking. It uses an advanced AI model that detects voice reliably, so you should typically keep this setting enabled.

Like every AI model, it returns only a probability between 0.0 and 1.0. A value of 0.0 means the model sees zero probability that the recorded audio contains voice; 1.0 means it is one hundred percent sure it’s voice. Use the Attack Probability and Release Probability values to set the probabilities at which ODIN should start and stop transmitting.

Good values are 0.9 for attack (the AI is 90% sure that it’s voice) and 0.8 for release.
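
To make the attack/release behaviour concrete, here is a minimal Python sketch of threshold hysteresis. It is not ODIN's implementation: the function name `vad_decisions` is hypothetical, and the exact comparisons (engage at or above attack, disengage below release) are assumptions made for illustration.

```python
def vad_decisions(probabilities, attack=0.9, release=0.8):
    """Return one transmit flag per frame using attack/release hysteresis.

    Hypothetical helper for illustration only, not part of the ODIN SDK.
    """
    transmitting = False
    flags = []
    for p in probabilities:
        if not transmitting and p >= attack:
            transmitting = True   # probability reached the attack value: engage
        elif transmitting and p < release:
            transmitting = False  # probability fell below the release value: disengage
        flags.append(transmitting)
    return flags


# Per-frame voice probabilities as a VAD model might report them (made-up values).
frames = [0.2, 0.85, 0.95, 0.85, 0.82, 0.7, 0.85]
print(vad_decisions(frames))
# prints [False, False, True, True, True, False, False]
```

Because release is lower than attack, a probability of 0.85 keeps an ongoing transmission alive but is not enough to start a new one, which avoids rapid toggling around a single threshold.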

That works very well. However, there are special cases where additional filtering is needed. Consider an open-plan office with many people talking. The AI might be 100% sure that it hears voice, but it cannot tell whether that voice belongs to the person connected to the room.

For these cases, an additional filter, the Volume Gate, can be used:

Volume Gate

The basic idea of a volume gate is to keep the microphone disabled below a specific volume threshold. The AI may have detected voice, but if that voice is not very loud, it might come from the background. You can apply a loudness threshold that must be met before the microphone is enabled.

This topic is not easy to grasp, but a good starting point is the Wikipedia article on dBFS.
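
As a rough illustration of what a root mean square level in dBFS means, the following Python sketch computes it for a block of samples. It assumes floating-point samples normalized to the range [-1.0, 1.0] and the convention that an RMS of 1.0 corresponds to 0 dBFS; the function name `rms_dbfs` is hypothetical, and ODIN's internal measurement may differ.

```python
import math


def rms_dbfs(samples):
    """RMS level in dBFS for samples normalized to [-1.0, 1.0].

    Hypothetical helper for illustration only, not part of the ODIN SDK.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)  # identical to 20 * log10(rms)


# Two 440 Hz test tones at 48 kHz, 0.1 s each, with different amplitudes.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]

print(round(rms_dbfs(quiet), 1))  # about -43 dBFS
print(round(rms_dbfs(loud), 1))   # about -9 dBFS
```

Values are negative because 0 dBFS is the loudest level the format can represent; the closer a value is to 0, the louder the signal.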

Typically, you should leave this setting disabled. If you do enable it, a good starting point is -30 dBFS for attack and -40 dBFS for release. Please contact support if you need assistance with these values.
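
The open-plan office case can be sketched in Python as follows. This is an illustration under stated assumptions, not ODIN's actual logic: the `Hysteresis` class and `transmit_flags` function are hypothetical, and it is assumed that audio is transmitted only while both the voice activity detection and the volume gate are engaged.

```python
from dataclasses import dataclass


@dataclass
class Hysteresis:
    """Two-threshold switch; hypothetical helper, not part of the ODIN SDK."""
    attack: float
    release: float
    is_open: bool = False

    def update(self, value: float) -> bool:
        if not self.is_open and value >= self.attack:
            self.is_open = True   # engage at or above the attack value
        elif self.is_open and value < self.release:
            self.is_open = False  # disengage below the release value
        return self.is_open


def transmit_flags(frames):
    """frames: iterable of (voice_probability, loudness_dbfs) pairs."""
    vad = Hysteresis(attack=0.9, release=0.8)
    gate = Hysteresis(attack=-30.0, release=-40.0)
    flags = []
    for probability, loudness in frames:
        # Update both switches on every frame so neither state goes stale.
        vad_open = vad.update(probability)
        gate_open = gate.update(loudness)
        flags.append(vad_open and gate_open)  # assumption: both must be open
    return flags


frames = [
    (0.97, -48.0),  # clear speech but quiet, e.g. a colleague in the background
    (0.95, -25.0),  # clear speech and loud, e.g. the local user
    (0.92, -35.0),  # user trails off, still above the release loudness
    (0.96, -45.0),  # below the release loudness: the gate closes
]
print(transmit_flags(frames))
# prints [False, True, True, False]
```

The first frame shows why the gate helps: the voice probability alone would have started a transmission, but the low loudness keeps the microphone closed.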