André Monforte

Background Noise Classifier

defined.ai · 2023
Machine Learning · Audio Processing · XGBoost

Overview

The DC Noise Predictor is a machine learning service I developed to classify background noise in audio files, with a particular focus on dialogue data. It improves on the traditional Signal-to-Noise Ratio (SNR) metric by combining SNR with other acoustic features, which gives better results on dialogue recordings.
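For readers unfamiliar with the baseline metric, here is a minimal sketch of the textbook SNR definition in decibels. It is not the service's own implementation: the function names are hypothetical, and it assumes the speech and noise samples have already been separated, which is the hard part in practice.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal_samples, noise_samples):
    """Textbook SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal_samples) / power(noise_samples))


# Hypothetical toy data: speech at amplitude 1.0 over noise at amplitude 0.1.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
print(round(snr_db(speech, noise), 1))  # prints 20.0
```

A single number like this says nothing about how the noise is distributed over time or frequency, which is one reason to feed it to a classifier alongside other features instead of thresholding it directly.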

Problem Statement

In audio processing, particularly for dialogue recordings, accurately identifying background noise levels is crucial for quality assessment and downstream processing. SNR alone is insufficient for dialogue data, whose characteristics differ from those of other audio types.

The challenge was to create a more sophisticated system that could:

Solution Architecture

I designed and implemented the service as a Nuclio serverless function deployed to a Kubernetes cluster. This architecture provides scalability, reliability, and efficient resource utilization. The service exposes an HTTP endpoint that accepts audio URLs and returns detailed noise classification results.
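As a rough illustration of what a Nuclio HTTP entry point of this kind can look like, here is a minimal sketch. The `handler(context, event)` signature and `context.Response` come from the Nuclio Python runtime. Everything else is an assumption, including the `classify_audio` placeholder and the error format, and none of it is the service's actual code.

```python
import json


def classify_audio(audio_url):
    """Hypothetical placeholder for download, feature extraction and inference.

    Returns a fixed result so the sketch runs without a model.
    """
    return {
        "channel_1": {
            "predictedBackgroundNoise": "noisy",
            "noisyProbability": 0.875,
            "silentProbability": 0.125,
        }
    }


def handler(context, event):
    """Nuclio entry point: validate the request, classify, return JSON."""
    body = event.body
    # Depending on the content type, the body may arrive as raw bytes/str
    # or as an already-parsed dict, so handle both.
    if isinstance(body, (bytes, str)):
        try:
            body = json.loads(body)
        except ValueError:
            body = None

    inputs = body.get("inputs") if isinstance(body, dict) else None
    audio_url = inputs.get("audioUrl") if isinstance(inputs, dict) else None

    if not audio_url:
        # Assumed error shape; the real service may report errors differently.
        return context.Response(
            body=json.dumps({"error": "inputs.audioUrl is required"}),
            headers={},
            content_type="application/json",
            status_code=400,
        )

    return context.Response(
        body=json.dumps(classify_audio(audio_url)),
        headers={},
        content_type="application/json",
        status_code=200,
    )
```

Keeping the handler thin and pushing the real work into a separate function makes the classification logic testable without a running Nuclio instance.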

The system consists of four main components:

Technical Implementation

The service was implemented using Python with several specialized libraries:

API Design

The API accepts a JSON request containing an audio URL and returns a predicted label and class probabilities for each audio channel:

Request Format

{
  "inputs": {
    "audioUrl": "https://example.com/audio-file.wav"
  }
}

Response Format

{
  "channel_1": {
    "predictedBackgroundNoise": "noisy",
    "noisyProbability": 0.875,
    "silentProbability": 0.125
  },
  "channel_2": {
    "predictedBackgroundNoise": "silent",
    "noisyProbability": 0.123,
    "silentProbability": 0.877
  }
}
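The request and response shapes above could be handled on the client side roughly as follows. This is a sketch using only the standard library. The helper names are hypothetical, and sending the request over HTTP is left out because the endpoint URL is not given here.

```python
import json


def build_request(audio_url):
    """Serialize the request payload in the format shown above."""
    return json.dumps({"inputs": {"audioUrl": audio_url}})


def summarize_response(raw_body):
    """Map each channel to its (label, noisy probability) pair."""
    result = json.loads(raw_body)
    return {
        channel: (info["predictedBackgroundNoise"], info["noisyProbability"])
        for channel, info in result.items()
    }


# The example response from above, used as sample input.
sample_response = """
{
  "channel_1": {"predictedBackgroundNoise": "noisy",
                "noisyProbability": 0.875, "silentProbability": 0.125},
  "channel_2": {"predictedBackgroundNoise": "silent",
                "noisyProbability": 0.123, "silentProbability": 0.877}
}
"""

print(build_request("https://example.com/audio-file.wav"))
print(summarize_response(sample_response))
```

Returning the probability alongside the label lets a caller apply its own, stricter threshold when a borderline "silent" prediction is not good enough.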

Model Training

I trained the model using a dataset of labeled audio files with known background noise characteristics. The training process involved:

The final model achieved impressive performance metrics:

Deployment

The service was deployed as a Nuclio function in a Kubernetes environment. The deployment configuration defined in the function.yaml file specified:

This serverless approach allowed for efficient scaling based on demand while minimizing resource usage during idle periods.

Challenges and Solutions

During development, I encountered several challenges:

Results and Impact

The DC Noise Predictor service significantly improved the accuracy of background noise classification compared to traditional SNR-only approaches. This enhanced classification enabled:

Limitations and Future Work

While the service performs well for its intended purpose, there are some limitations and areas for future improvement:

Conclusion

The DC Noise Predictor service demonstrates how combining traditional audio metrics with machine learning can create more accurate and useful audio classification systems. By focusing specifically on the unique characteristics of dialogue recordings, the service provides valuable insights that traditional metrics alone cannot capture.