2 January 2017

The missing pieces of English pronunciation lessons

I’ve been thinking about my English pronunciation issue ever since I moved to the Bay Area (from Thailand) in 2012. I can’t seem to speak English properly.

There were many moments that made me feel bad about myself.

I can’t pronounce the word guitar. I don’t get it, and people don’t understand me when I pronounce this word. I once said guitar to a lady at the post office. She didn’t get it. I pronounced it for her several times with different stress patterns. She was very patient… but she still didn’t get it. Eventually, I spelled it out for her. She got it and pronounced guitar back to me. I felt like that was exactly how I had pronounced it.

Nil is another word. One of my friends at Twitter said that there was no L in my pronunciation. I didn’t understand what he said at the time. But now I understand. Nil is more like Ni-ler, and ler is pronounced in nanoseconds (even shorter than a schwa).

It’s not just pronunciation. Rhythm is another important area. It’s quite hard to get it right.

Well, I’ve been taking a one-and-a-half-hour class every week for the last 3 months. There were a few mind-blowing lessons that I never learned in my English classes in Thailand. So, I want to share all of them. Here they are:

What a stressed syllable really means

Stressing a syllable simply means that its pitch is higher, that it takes longer, or that its pitch change from the previous syllable is the largest.

I didn’t actually know this before. I thought stressing only meant higher pitch.

I still can’t recognise the stressed syllable though. For example, the verb invite is stressed on the second syllable. From what I hear, both syllables have the same pitch and the same length. But invite is a good example to learn from because the noun invite is stressed on the first syllable. For the noun invite, I actually can hear that the first syllable has a slightly higher pitch.

Reduced syllables

One what-the-hell rule is that it doesn’t matter what the spelling is. A reduced syllable always disappears or is reduced to a schwa. For example:

The reason we say ba-live (and not bi-live) for believe is that it’s easier to pronounce ba in a shorter period of time.

An explanation for reduced syllables is that English has been spoken for a long time, and people are lazy. If a syllable can be reduced or dropped without causing ambiguity, it’ll likely be dropped.

Even though I know the theory and think I pronounce it right, my English teacher keeps saying I need to reduce my reduced sounds even more.


Another what-the-hell rule is off-gliding. When off-gliding happens, the first syllable of the second word is pronounced differently. For example:

An explanation for off-gliding is that it’s easier to pronounce with off-gliding considering the tongue position at the end of the first word. And it doesn’t create ambiguity. So, people are naturally inclined to do off-gliding.

In general, I think, speaking will keep evolving to be easier for our tongue movement (as long as it doesn’t increase ambiguity).

Difficult sounds

V and Z are called voiced sounds, and they are very difficult to produce because we need to vibrate our vocal cords. When I was told I needed to vibrate my vocal cords, I was like “what does that even mean?”

TH needs the correct tongue/mouth position with correct tongue/mouth movement.

I am getting better with these sounds. If I really focus, I can produce them correctly. Saying them in a sentence is still pretty difficult for me.

Grammar is important to rhythm

My English teacher also keeps saying that grammar is important to get the rhythm right. Speaking is like singing. There’s a beat to it. If we skip a word, and, thus, speak a grammatically incorrect sentence, the rhythm of the sentence will sound strange.

The words in a sentence shouldn’t all have the same pitch

My English teacher keeps saying that the pitch should change on every word, and that focused words should have the highest pitch. Focused words are the critical words in the sentence (e.g. verbs or negatives); that kind of makes sense. Here’s one template for a sentence:

We[2] jump[1] up[5] and[4] step[3] down[2] again[1]

The numbers represent the pitch level (whatever that means). I am not exactly sure what pitches are. It’s hard for me to discern the difference between different pitches.
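To make the bracket notation a bit more concrete for myself, here is a minimal sketch in Python that reads a template like the one above and shows where the pitch moves up or down. The function names (parse_pitch_template, contour, steps) are made up for this illustration, and the sketch assumes that a bigger number means a higher pitch; it says nothing about what the levels are in real acoustic terms.

```python
import re

# A token looks like word[level], e.g. "up[5]". This pattern is an assumption
# about the notation, not an official format.
TOKEN = re.compile(r"([A-Za-z']+)\[(\d+)\]")

def parse_pitch_template(template):
    """Turn 'We[2] jump[1] ...' into [('We', 2), ('jump', 1), ...]."""
    return [(word, int(level)) for word, level in TOKEN.findall(template)]

def contour(pairs):
    """Draw one row per token; a longer bar means a higher pitch level."""
    return "\n".join(f"{word:<6} {'#' * level}" for word, level in pairs)

def steps(pairs):
    """Label each move between neighbouring tokens as up, down, or flat."""
    labels = []
    for (_, a), (_, b) in zip(pairs, pairs[1:]):
        labels.append("up" if b > a else "down" if b < a else "flat")
    return labels

sentence = parse_pitch_template(
    "We[2] jump[1] up[5] and[4] step[3] down[2] again[1]"
)
print(contour(sentence))
print(steps(sentence))
```

For the template sentence, steps reports one "up" (from jump to up) and "down" everywhere else, which matches the words themselves: jump up, then step down again.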

Almost always pitch downward

One of my quirks is that I like to pitch upward. For example, my okay sounds like o[1]-kay[2]. My English teacher says that it sounds strange, and that I should pronounce it o[2]-kay[1].

Again, it’s hard for me to discern the difference between different pitches.


The gist of my problem is that my tongue and jaw muscles are not developed enough to switch between sounds quickly, and it takes a huge effort for me to pronounce some sounds properly (e.g. v, th, l). My ears and brain are also not trained to recognise the nuances in pronunciation (e.g. different pitches).

Also, I don’t understand English culturally. I’d say one will never fully understand others’ cultures. An example of cultural understanding: if we give a totally novel word to different native speakers (from the same culture), it’s likely that they will come up with the same stress pattern. But I would come up with a different stress pattern.

Now if you are like me, you believe that a skill is only as good as the amount of time spent practicing it. In order to match the fluency of a native English speaker, we might need ~20 years of speaking English and being immersed in one culture, you know, kinda like how a 20-year-old native English speaker has spent his/her first 20 years speaking English and being immersed in his/her own culture. So, I don’t think my speaking skill will ever match the level of a native speaker.