FFT Speech Denoising

Series: bsc

Refining My Audio Noise Removal Algorithm

In my last post, I described an algorithm for removing background noise from an audio clip containing speech. It transforms the clip into the frequency domain, removes the frequency components above a set threshold, then converts the result back into the time domain. This filtered out the unwanted noise while retaining the speech.

The core techniques are the Discrete Fourier Transform (DFT) and its inverse (IDFT). The DFT converts the audio waveform into the frequency domain, where I can view and manipulate the individual frequency components of the signal. Once the high-frequency noise peaks have been removed, the IDFT converts the spectrum back into a cleaned time-domain waveform.
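The transform, filter, and inverse-transform steps can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: it assumes NumPy, uses the real-input FFT routines, and the function name `lowpass_fft` and the synthetic test signal are made up for the example.

```python
import numpy as np

def lowpass_fft(signal, sample_rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz and return the filtered signal."""
    spectrum = np.fft.rfft(signal)                           # time -> frequency
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0                        # drop high-frequency bins
    return np.fft.irfft(spectrum, n=len(signal))             # frequency -> time

# Synthetic stand-in for a recording: a 440 Hz tone plus a 3500 Hz "hiss".
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
hiss = 0.5 * np.sin(2 * np.pi * 3500 * t)
cleaned = lowpass_fft(tone + hiss, sample_rate, cutoff_hz=2663)
```

With a real recording, `signal` would be the array of samples read from the audio file, and `sample_rate` its sampling frequency in Hz.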

I also applied folding and unfolding procedures before and after filtering to rearrange the frequency bins for easier processing. The key parameter change was settling on a final threshold of 2663 Hz, chosen to eliminate noise frequencies while retaining as much speech information as possible.
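The post does not spell out what folding and unfolding involve. One common reading, assumed here, is the shift that moves the zero-frequency bin to the centre of the spectrum so that the bins run from the most negative to the most positive frequency. In NumPy that corresponds to `fftshift` and `ifftshift`. The sketch below follows that assumption, and the function name `lowpass_centered` is hypothetical.

```python
import numpy as np

def lowpass_centered(signal, sample_rate, cutoff_hz):
    """Low-pass filter using a centred ("folded") spectrum."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)

    # "Fold": reorder bins so frequencies increase monotonically, 0 Hz in the middle.
    folded = np.fft.fftshift(spectrum)
    folded_freqs = np.fft.fftshift(freqs)

    # Zero bins beyond the cutoff on both the negative and positive side.
    folded[np.abs(folded_freqs) > cutoff_hz] = 0.0

    # "Unfold": restore the ordering the inverse transform expects.
    unfolded = np.fft.ifftshift(folded)
    return np.fft.ifft(unfolded).real

# Synthetic check: a 440 Hz tone should survive, a 3500 Hz component should not.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
noisy = tone + 0.5 * np.sin(2 * np.pi * 3500 * t)
filtered = lowpass_centered(noisy, sample_rate, cutoff_hz=2663)
```

Centring the spectrum makes the mask symmetric about 0 Hz, which is easier to reason about and to plot than the default ordering where negative frequencies sit in the second half of the array.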

The overall process demonstrates how transforms like the DFT make possible signal manipulations that would be difficult in the raw time domain. Converting between domains enables targeted noise removal. Fine-tuning the threshold is a trade-off between stronger filtering and speech quality, and it took careful listening tests alongside inspection of frequency spectrum plots at each iteration.
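A simple numerical check can sit alongside listening tests and spectrum plots when comparing candidate thresholds: how much of the spectral power each one keeps. The sketch below is illustrative only. The function name `power_retained`, the candidate cutoffs other than 2663 Hz, and the synthetic signal are assumptions, not values from the report.

```python
import numpy as np

def power_retained(signal, sample_rate, cutoff_hz):
    """Fraction of one-sided spectral power at or below cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return power[freqs <= cutoff_hz].sum() / power.sum()

# Synthetic stand-in for a recording: a 440 Hz tone plus a weaker 3500 Hz component.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
noisy = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3500 * t)

# Compare a few candidate cutoffs (only 2663 Hz comes from the post).
for cutoff in (1000, 2663, 3800):
    print(cutoff, round(power_retained(noisy, sample_rate, cutoff), 3))
```

A number like this says nothing about how the result sounds, so it can only narrow the range of thresholds worth listening to.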

There is still room for improvement, for example with more advanced adaptive methods. Even so, the core techniques of frequency-domain analysis, selective filtering based on a threshold, and inverse transformation back to time-domain audio gave very good background noise removal. My code can serve as an easily adaptable template for others implementing similar processing algorithms.

To see more details, check out the paper

Disclaimer: This project was completed as part of my BSc in Mathematics at Manchester Metropolitan University. The project was supervised by Dr. Jon Borresen. This blog post is an LLM generated text, based upon the hand-written report.