André Monforte


Speech Performance Research (2021)

FEP U. Porto – Economics and Management, 2021

Cooperation with Defined.ai

Impact of Vocal Traits Distribution on Speech Applications' Performance and Bias

Machine learning models often inherit biases from their training data, producing systematic errors that disadvantage certain social groups. In speech applications, these biases particularly affect women, elderly users, and underrepresented ethnicities. My research explores a novel approach to this problem: balancing training data by vocal traits rather than by demographic categories.

Gender-based approach showing distribution of male and female speakers

Traditional approach: Balancing datasets using binary gender categories

The Problem with Gender-Based Balancing

The conventional approach relies on balancing datasets by gender, but this method has significant limitations. Gender proxies don't capture the full spectrum of vocal diversity, are difficult to validate in crowdsourced data, and can perpetuate social stereotypes about what voices "should" sound like.

A Better Solution: Vocal Trait Balancing

Instead of using social categories, I proposed focusing on actual vocal characteristics like pitch and spectral centroid to drive the data collection process. This approach proved to be more verifiable, effective, and ethical.
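To make the two traits named above concrete, here is a minimal sketch of how pitch and spectral centroid can be measured from a waveform. It is not the extraction pipeline used in the thesis: the function names, the autocorrelation pitch estimator, the 60 to 400 Hz search range, and the synthetic test tone are all assumptions chosen for illustration. It uses only the standard library, so it is slow on long recordings; a real pipeline would use a dedicated audio library.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz), using a naive DFT."""
    n = len(samples)
    weighted = total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Rough fundamental frequency (Hz) from the strongest autocorrelation lag.

    fmin and fmax are assumed bounds for the search, not values from the thesis.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)

    def autocorr(lag):
        return sum(centered[i] * centered[i + lag] for i in range(n - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return sample_rate / best_lag


# Synthetic "voice": a 200 Hz fundamental plus a weaker 400 Hz harmonic.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 400 * i / SAMPLE_RATE)
        for i in range(800)]

pitch_hz = estimate_pitch(tone, SAMPLE_RATE)
centroid_hz = spectral_centroid(tone, SAMPLE_RATE)
print(f"pitch: {pitch_hz:.1f} Hz, spectral centroid: {centroid_hz:.1f} Hz")
```

The centroid sits above the pitch because the harmonic pulls the spectral "centre of mass" upward. That is why the two traits carry different information about a voice. Both can be recomputed from the audio itself, whereas a self-reported gender label cannot.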

Pitch-balanced approach showing distribution based on vocal traits

Proposed approach: Balancing datasets using measurable vocal traits
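As an illustration of what balancing on a measurable trait can look like, the sketch below groups speakers into pitch bins and draws the same number from each bin. It is only a sketch of the general idea: the `balance_by_pitch` helper, the bin edges, and the speaker pool are hypothetical, and the selection procedure used in the thesis may differ.

```python
import bisect
import random
from collections import defaultdict


def balance_by_pitch(speakers, bin_edges, per_bin, seed=0):
    """Select up to `per_bin` speakers from each pitch bin.

    speakers:  list of (speaker_id, pitch_hz) pairs
    bin_edges: sorted pitch boundaries in Hz (hypothetical values here)
    Returns a dict mapping bin index to the selected speaker ids.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    bins = defaultdict(list)
    for speaker_id, pitch_hz in speakers:
        # Bin index = number of edges at or below this pitch.
        bins[bisect.bisect_right(bin_edges, pitch_hz)].append(speaker_id)
    return {
        index: rng.sample(members, min(per_bin, len(members)))
        for index, members in sorted(bins.items())
    }


# A deliberately skewed pool: many low-pitched voices, few mid and high.
pool = [
    ("s01", 95.0), ("s02", 110.0), ("s03", 118.0), ("s04", 125.0),
    ("s05", 132.0), ("s06", 140.0),
    ("s07", 170.0), ("s08", 195.0),
    ("s09", 225.0), ("s10", 240.0), ("s11", 260.0),
]

balanced = balance_by_pitch(pool, bin_edges=[150.0, 220.0], per_bin=2)
for index, ids in balanced.items():
    print(f"pitch bin {index}: {ids}")
```

Each bin contributes the same number of speakers regardless of how the raw pool is distributed, and bin membership can be checked by re-measuring the audio.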

Key Findings

When compared to traditional gender-based methods, the vocal trait approach:

  • Improved performance by two percentage points across the board
  • Reduced bias across both gender and age groups
  • Provided a more ethical foundation for speech technology using fact-based representations

The complete thesis and my presentation are available below: