In the previous posts of this series we discussed the challenges of echo cancellation algorithms with respect to:
In this post we will discuss a common implementation issue we have seen many times – audio stream synchronization.
It takes two to Tango
In order to perform echo cancellation, the AEC algorithm needs to constantly receive two streams of audio:
- The stream of audio that is about to be rendered to the loudspeaker and will later return as an echo. The algorithm uses this stream as the reference signal.
- The stream of audio that was captured from the microphone. This stream of audio contains the voice of the person and the echo. The algorithm cleans the echo from this captured signal using the reference signal.
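To make the roles of the two streams concrete, here is a minimal sketch of an echo canceller built on a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a textbook technique used here purely for illustration; it is not a description of any particular commercial algorithm. The names `nlms_cancel` and `make_echo`, and the parameters `taps`, `mu` and `eps`, are assumptions made for this example.

```python
import random


def make_echo(reference, delay, gain=0.5):
    """Simulate a captured signal that contains only a delayed, attenuated echo."""
    kept = reference[:len(reference) - delay]
    return [0.0] * delay + [gain * x for x in kept]


def nlms_cancel(reference, captured, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the captured signal.

    reference: samples about to be rendered to the loudspeaker
    captured:  samples recorded from the microphone (voice plus echo)
    Returns the cleaned signal, one sample per input pair.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for ref_sample, mic_sample in zip(reference, captured):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate          # what is left after cancellation
        norm = sum(x * x for x in history) + eps    # normalize by reference energy
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    captured = make_echo(reference, delay=2)
    cleaned = nlms_cancel(reference, captured)
    print("echo energy (last 200 samples):    ", sum(x * x for x in captured[-200:]))
    print("residual energy (last 200 samples):", sum(x * x for x in cleaned[-200:]))
```

Note that the filter only looks at current and past reference samples, so it depends on the reference reaching the algorithm no later than the echo it describes.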
There are two common mistakes by developers when providing the above two streams of audio.
Audio appears first in the captured signal
In the real world, the audio appears first in the reference signal and only afterwards re-appears (as echo) in the captured signal. Unfortunately, due to programming errors and wrong prioritization when processing the two streams, we have seen many cases where the algorithm first receives the echo in the captured signal and only afterwards receives the relevant reference signal. When this happens, the algorithm will not cancel the echo: with no reference indicating that it is an echo, the algorithm treats it as a legitimate part of the call.
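One way to catch this ordering problem during integration is to estimate the lag between the two streams exactly as they are handed to the AEC. The sketch below uses a brute-force cross-correlation. The function name `estimate_echo_lag` and the `max_lag` parameter are assumptions for this example, and the check is only meaningful on a recording where the echo dominates the captured signal (for instance, when nobody is talking on the near end).

```python
def estimate_echo_lag(reference, captured, max_lag):
    """Return the lag (in samples) at which `captured` best matches `reference`.

    Positive lag: the echo reaches the algorithm after its reference (expected).
    Negative lag: the echo reaches the algorithm before its reference (ordering bug).
    """
    n = min(len(reference), len(captured))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += reference[i] * captured[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    reference = [0.0] * 10 + [1.0, -0.5, 0.25] + [0.0] * 20
    # Same burst, attenuated, but delivered three samples too early.
    captured = [0.0] * 7 + [0.5, -0.25, 0.125] + [0.0] * 23
    lag = estimate_echo_lag(reference, captured, max_lag=8)
    if lag < 0:
        print("captured stream leads the reference by", -lag, "samples")
```

A negative result on such a recording suggests that the capture path is being fed to the algorithm ahead of the render path and that the buffering or thread priorities need another look.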
Usually the two streams of audio are not fully synchronized. At any given point in time the AEC algorithm might not have an equal number of frames from both streams. This is the common scenario in non-embedded applications, and professional AEC algorithms should be equipped with mechanisms to handle it. However, there are extreme cases where the non-synchronized signals can impact performance if not handled correctly, for example when there is an ongoing, increasing drift between the streams. Many things can cause this effect, such as VAD (voice activity detection) in the system. In any such case, the system behavior should be analyzed and the AEC algorithm configured and optimized accordingly.
You need to keep track of the synchronization and its behavior over time. You need to configure the AEC to match your system behavior in order to achieve best results.
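As a rough illustration of tracking synchronization over time, the sketch below logs the cumulative number of frames delivered from each stream and fits a straight line to the difference. The function name `drift_per_second` and the `(time, reference_frames, captured_frames)` sample layout are assumptions for this example; a real system would take these counters from its own audio callbacks.

```python
def drift_per_second(samples):
    """Estimate how fast the two streams drift apart, in frames per second.

    samples: list of (time_s, reference_frames_total, captured_frames_total)
    Returns the least-squares slope of (reference - captured) over time.
    A value near zero means the offset is stable; a persistent non-zero
    value means the offset keeps growing.
    """
    if len(samples) < 2:
        return 0.0
    times = [t for t, _, _ in samples]
    offsets = [ref - cap for _, ref, cap in samples]
    mean_t = sum(times) / len(times)
    mean_o = sum(offsets) / len(offsets)
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        return 0.0
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    return cov / var_t


if __name__ == "__main__":
    # Hypothetical log: the render side delivers 100 frames per second,
    # the capture side only 99, so the offset grows by one frame per second.
    log = [(t, 100 * t, 99 * t) for t in range(11)]
    print("drift (frames/s):", drift_per_second(log))
```

A constant offset between the streams is the normal case described above; it is a slope that stays away from zero that points to the increasing drift worth analyzing.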
For monitoring and recommendation purposes, SoliCall’s acoustic echo cancellation software includes a debug version that provides a simple way to monitor and automatically analyze key synchronization parameters.
[EDITED] Additional information on this subject can be found here.