220A Final Project

CCRMA, Department of Music, Stanford University

Outline

1. Analyze a husky voice and a normal voice using Praat.

2. Extract 12-dimensional chromagrams from the husky voice samples and the normal voice samples using MATLAB.

3. Train a SOM (Self-Organizing Map) on the extracted chromagrams.

Motivation

Several years ago, I read an article about the relationship between tone deafness and the voices of a person's parents. It claimed that a child is more likely to become tone-deaf if one of the parents has a husky voice. This motivated me to compare a husky voice with a normal voice.

Tools used

1. Praat, a phonetics tool for speech analysis, synthesis, and manipulation.

2. MATLAB

3. ESOM (Emergent Self-Organizing Maps)

What I did

The first task was to analyze a normal voice and a husky voice using Praat. From each voice, I selected a short segment (an "Ah" sound) with a stable pitch of D. The spectrograms of the two voices are shown below.

As the spectrograms show, the normal voice has a clear harmonic structure, whereas the husky voice shows a kind of randomness in the frequency domain. This phenomenon is called diplophonia, a condition in which the vocal cords produce more than one pitch at the same time.

Secondly, I computed the chromagram of each voice in MATLAB. To detect the chroma (pitch class), I used Fujishima's Pitch Class Profile (PCP) algorithm. For example, a C3 tone contains the harmonics of C3.

In this example, there are four C keys: C3, C4, C5, and C6. PCP sums the intensities of these four keys. In this way, for a given sound source, PCP calculates the intensity of each of the 12 chromas from C to B. The result is called a chromagram. I expected the chromagram of the husky voice to differ from that of the normal voice. Running the voice files through the MATLAB PCP program produced a chromagram for each voice.
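The octave-folding step described above can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB code used in the project: it assumes A4 = 440 Hz tuning, uses a naive DFT so that it needs only the standard library, and normalizes the profile to sum to 1 for convenience. The names `pitch_class`, `dft_power`, `pcp`, and the `f_min`/`f_max` limits are hypothetical choices made for this example.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Assumed tuning: A4 = 440 Hz, so C0 lies 57 semitones below A4 (about 16.35 Hz).
C0_HZ = 440.0 * 2 ** (-57 / 12)


def pitch_class(freq_hz):
    """Return 0..11 (C..B) for the semitone nearest to freq_hz (must be > 0)."""
    return round(12 * math.log2(freq_hz / C0_HZ)) % 12


def dft_power(signal):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        power.append(abs(acc) ** 2)
    return power


def pcp(signal, sample_rate, f_min=60.0, f_max=2000.0):
    """Fold spectral power into 12 pitch classes; f_min must be positive."""
    n = len(signal)
    profile = [0.0] * 12
    for k, p in enumerate(dft_power(signal)):
        freq = k * sample_rate / n
        if f_min <= freq <= f_max:
            # Every octave of the same note lands in the same chroma bin.
            profile[pitch_class(freq)] += p
    total = sum(profile)
    # Normalization is a convenience added in this sketch.
    return [v / total for v in profile] if total > 0 else profile
```

Feeding this function a frame that contains tones near C4, C5, and C6 puts almost all of the energy in the C bin, which is the summation over octaves that the text describes. A real implementation would use an FFT and a window function instead of the naive DFT.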

These charts show the difference between the normal voice and the husky voice. The normal voice shows a harmonic relationship: A is the 3rd harmonic of D, so there is a peak at A in the chromagram. In the chromagram of the husky voice, however, this relationship does not appear because of diplophonia; the intensity is fairly flat except at the original key, D.
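The relationship between D and A can be checked by mapping the first few harmonics of a D to their nearest pitch classes. The sketch below is illustrative only: it assumes equal temperament with A4 = 440 Hz and takes D3 at about 146.83 Hz as the fundamental, and the function name `harmonic_pitch_classes` is made up for this example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Assumed tuning: A4 = 440 Hz, so C0 lies 57 semitones below A4.
C0_HZ = 440.0 * 2 ** (-57 / 12)


def harmonic_pitch_classes(f0_hz, n_harmonics=6):
    """Name the nearest pitch class of each harmonic h * f0_hz, h = 1..n."""
    names = []
    for h in range(1, n_harmonics + 1):
        semitones = round(12 * math.log2(h * f0_hz / C0_HZ))
        names.append(NOTE_NAMES[semitones % 12])
    return names


# D3 is about 146.83 Hz under the assumed tuning.
print(harmonic_pitch_classes(146.83))
# -> ['D', 'D', 'A', 'D', 'F#', 'A']
```

The 3rd and 6th harmonics both fall on A, which is consistent with the peak at A in the chromagram of the normal voice.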

Thirdly, I trained the ESOM on these 12-dimensional chromagrams (C to B).

Before the experiment, I expected to see a perfect circle of fifths with the normal voice, but the result was somewhat different. As shown below, there are two circles of fifths.

With the husky voice, no circle of fifths appears. From these results, I could not find any relationship between a husky voice and tone deafness. However, I think this result can be a starting point, and I plan to study this subject further.
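To give a rough idea of what training a map on 12-dimensional chroma vectors involves, here is a minimal self-organizing map in plain Python. It is a generic SOM on a small rectangular grid, not the ESOM tool used in the project. The grid size, the learning-rate and neighborhood schedules, and the function names are all assumptions made for this sketch.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(samples, rows=4, cols=4, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols map; returns weights[row][col] as vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
        for x in rng.sample(samples, len(samples)):  # shuffled pass over data
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * sigma * sigma))
                    w = weights[r][c]
                    for i in range(dim):
                        # Pull the unit toward the sample, weighted by grid distance.
                        w[i] += lr * h * (x[i] - w[i])
    return weights


def quantization_error(weights, samples):
    """Mean Euclidean distance from each sample to its best matching unit."""
    total = 0.0
    for x in samples:
        r, c = best_matching_unit(weights, x)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(weights[r][c], x)))
    return total / len(samples)
```

After training, each chroma vector can be placed on the grid with `best_matching_unit`, and vectors that end up on nearby units are the ones the map treats as similar. Whether neighboring units follow the circle of fifths depends on the input data.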

Sound samples that I used

Original husky voice

"Ah" sound clip from the husky voice

"Ah" sound clip from the normal voice

MATLAB files for Pitch Class Profile