Synthesizing the Vocal Textures of Female Pop Singers

Dec 2012


C++ Synthesis Tool Kit (STK), MATLAB


  • To digitally synthesize the vocal fry (pulse) register, a vocal technique frequently exploited in pop music and sometimes called “creaky voice.”
  • To understand the acoustics of the singing voice, particularly its colorful textures, phonations, registers and styles.


  • The Liljencrants-Fant (LF) model was adapted to a “best guess” estimate of the derivative glottal excitation waveform that drives the vocal tract during fry. This glottal excitation wave was modeled in MATLAB and reformatted for input to the STK objects.
  • The STK class VoicForm (which calls SingWave) creates a train of glottal excitations from the input wave.
  • Outputs were taken for the 6 phonemes aaa (“hay”), ahh (“wand”), eee (“reed”), ihh (“bit”), ohh (“show”), and ooo (“too”).
  • The synthesized outputs were compared against self-recorded renditions of vocal fry and analyzed in MATLAB.
  • This short study permitted an investigation into the acoustic and perceptual aspects of vocal fry. However, it was only an introduction to physical modeling of the most advanced musical instrument, the human voice.
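
The glottal-excitation step above can be illustrated with a minimal sketch of a Liljencrants-Fant style pulse. It is written in Python rather than the MATLAB used in the study, and it is not the study's code: the timing values (tp, te, ta), the 50 Hz pitch, the 44100 Hz sample rate, the open_fraction parameter, and the function names lf_pulse and fry_train are all illustrative assumptions. Time is normalized so that each pulse spans 0 to 1, and the long closed phase associated with fry is approximated by zero-padding each period.

```python
import math

def lf_pulse(n_samples, tp=0.45, te=0.6, ta=0.02, ee=1.0):
    """One LF-style derivative glottal flow pulse on normalized time 0..1.

    Assumed (hypothetical) defaults: tp = flow peak, te = main excitation
    instant, ta = return-phase time constant, ee = excitation strength.
    Requires tp < te < 2*tp and te + ta < 1.
    """
    wg = math.pi / tp          # angular frequency of the open-phase sinusoid
    d = 1.0 - te               # duration of the return phase

    # Solve eps*ta = 1 - exp(-eps*d) by fixed-point iteration.
    eps = 1.0 / ta
    for _ in range(50):
        eps = (1.0 - math.exp(-eps * d)) / ta

    # Closed-form area under the return phase (always negative).
    tail = math.exp(-eps * d)
    ret_area = -(ee / (eps * ta)) * ((1.0 - tail) / eps - d * tail)

    def e0_for(alpha):
        # Scale the open phase so the waveform equals -ee at t = te.
        return -ee / (math.exp(alpha * te) * math.sin(wg * te))

    def net_area(alpha):
        # Closed-form area of the open phase plus the return phase.
        grown = math.exp(alpha * te)
        num = grown * (alpha * math.sin(wg * te) - wg * math.cos(wg * te)) + wg
        return e0_for(alpha) * num / (alpha ** 2 + wg ** 2) + ret_area

    # Bisection for the growth rate alpha that gives zero net area,
    # i.e. the glottal flow returns to zero at the end of the pulse.
    lo, hi = -20.0 / te, 20.0 / te
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if net_area(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    e0 = e0_for(alpha)

    pulse = []
    for i in range(n_samples):
        t = i / n_samples
        if t <= te:   # open phase: exponentially growing sinusoid
            pulse.append(e0 * math.exp(alpha * t) * math.sin(wg * t))
        else:         # return phase: exponential recovery toward zero
            pulse.append(-(ee / (eps * ta)) * (math.exp(-eps * (t - te)) - tail))
    return pulse

def fry_train(f0_hz=50.0, sample_rate=44100, n_periods=3, open_fraction=0.4):
    """Repeat the pulse at a low pitch, zero-padding a long closed phase."""
    period = round(sample_rate / f0_hz)
    pulse_len = int(period * open_fraction)
    one_period = lf_pulse(pulse_len) + [0.0] * (period - pulse_len)
    return one_period * n_periods
```

Calling fry_train() returns a plain list of samples that could be written to a file and auditioned or plotted. A perfectly periodic train like this one is a simplification, since it has no cycle-to-cycle variation in timing or amplitude.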

Future Work

  • More realistic sound – the VoicForm object does not sound particularly natural. Hui Ling Lu’s parametric source-filter model sounds much better to the ear.
  • Parameterized model with GUI – akin to Perry Cook’s SPASM model, a vocal tract model for users to explore phonations, registers and register hysteresis (transitions).
  • A new computer music instrument – this project drew its inspiration from a popular sample-based vocal synthesizer, Yamaha’s Vocaloid. The Vocaloid is an electronic keyboard of pitches, with additional controls for timbre and phoneme.

Final Report

