What Words Should I Learn First When Learning a Language?

** UPDATE **

I have added several languages to the chart as well as increased the base word list.  This chart is far from comprehensive.  It was originally just something done for myself, wanting to take an available resource and exploit it for my own learning benefit.  Now, I am opening it up to anyone who is qualified to help correct the erroneous Google Translations for each language so that this can become a valuable resource to everyone.  Please email me at info@stujay.com if you would like to be a contributing editor. The complete language list now is: English (en), Thai (th), Lao (lo), Khmer (km), Myanmar (my), Hindi (hi), Urdu (ur), Chinese Trad (zh-tw), Chinese Simp (zh-CN), Chinese Pinyin, Cantonese Jyutping, Vietnamese (vi), Japanese (ja), Korean (ko), Latin (la), Spanish (es), Portuguese (pt), Italian (it), French (fr), Russian (ru), Polish (pl), Hungarian (hu), Finnish (fi), Indonesian (id), Javanese (jw), Sundanese (su), Tagalog (tl), Danish (da), Norwegian (no), Swedish (sv), Farsi (fa), Arabic (ar).

How Should You Start Learning A Language?

When you learn a language, should you be learning individual words from graded word lists, or should you be learning full sentences with words in context? I would say – both … and more.

I have been enjoying reading Gabriel Wyner’s book ‘Fluency Forever’. Such titles always raise a certain degree of skepticism; however, as I read through the book I was happy to see that Wyner provides some very sound advice for language learners and makes clear his definition of ‘fluency’:

“I would confidently describe myself as fluent in German. I’ve lived in Austria for six years and will happily discuss anything with anyone, but I certainly needed to dance around a few missing words to get out of a €200 fine for my rental car’s broken gas cap.”

Wyner continues throughout the book to provide common-sense ideas and tools that anyone can use to make the goal of becoming ‘fluent’ in a language much more realistic. He covers the power of the mind and mnemonics, different memory and flash card systems both digital and manual, and the mechanics of pronunciation and transcription systems like the International Phonetic Alphabet (IPA), which put a smile on my face.


Wyner’s ‘First 625 Words’ Word List

Before you go any further, check out this table that I have created in Google Sheets –


There is one section of the book, however, that I found to be a particularly great resource. I have built on it a little and would like to share it with you here. Wyner has compiled a list of 625 words and provides them both in the book and on his website for free, in both alphabetical order and in thematic groups. Wyner says that he culled the list from two main sources – WordFrequency.info and the General Service List.

When I was young, my grandfather introduced me to the 850 words of Basic English compiled by linguist Charles Kay Ogden in 1930. Ogden’s idea was that with these 850 words anything could be communicated in English, even if it meant having to explain around something to get to the final meaning. The list has some valuable words and concepts in it, but the words are broken down into very broad categories – operations (100 words), general words (400 words), things (200 words), qualities (100 words) and qualities – opposites (50 words).

The thing that I liked about Wyner’s list is that it broke the words into relatively short lists of thematic groups that are easy to attach visual and emotional links to. The main thing for me is that the terms will be relevant, functional and in groups that are easy to tag and remember.

Wyner goes into detail in his book about how the word lists should be used – how to create memorable mnemonics, how to make flash cards both digitally and manually, how to handle languages with gender and rules of declension / conjugation, as well as how to find these words in context.

There are many words that I would still say need to be added to the list. As this will be an ongoing project, I’m sure words will be added and changed over time. This whole exercise, however, is an example of how to take a learning resource that’s available and exploit it to one’s own language learning advantage.

Why Learn Individual Words Out of Context?

I believe that unless you learn to speak in full phrases and use words in context, you will never develop a facility in a new language with prosody that sounds natural to a native speaker of that language. You might be able to string words together to create a specific meaning; however, even though you might produce something that is grammatically correct, native speakers may be pushed out of their comfort zones trying to decipher what you mean. That doesn’t mean, however, that learning word lists is completely useless. Especially in the beginning stages, getting a foundation of ‘meaning building blocks’, as I refer to them in my book, will enhance the chances of comprehensible input when you are exposed to native speaker language, whether that be through spaced repetition systems like Glossika, reading a newspaper, watching television or just listening to a conversation between two native speakers on a train.

For me other benefits of learning word lists include:

  • The ability to compare sound shifts between related languages
  • The ability to identify patterns – sound patterns, spelling patterns, semantic patterns etc.
  • The ability to link words under a similar thematic topic to one ‘mother’ image or mnemonic
  • The ability to identify words that you might have thought meant one thing but actually mean something else
  • The ability to identify words that have multiple meanings in different contexts
  • Learn a high volume of semantic building blocks in one blast

From here on in, I am going to take you on MY journey of how I go about making the most of such lists. Come along for the ride. We all have different goals and different backgrounds. Hopefully some of the techniques that I use will resonate with you and help you get closer to your language learning goals.

Using Google as a Vocabulary Learning Tool

I have taken Wyner’s list of 625 words and extended them a little to add some terms that from my experience are relevant to languages of Southeast Asia. Some of these include ‘wet season’ and ‘dry season’ rather than the standard four seasons of Summer, Autumn, Winter and Spring.

I wanted to create a list of these words in languages that would be relevant to me.

It looks like over the next year, I will be doing a lot more training and facilitation in Laos, Vietnam, Cambodia and Burma. I originally put this table together for myself as I want to really ramp up my fluency in Burmese, Khmer and Vietnamese and bring them up to par with my Thai. Learning Burmese, Khmer and Vietnamese has come quite easily to me so far mainly because of all the links between languages that I already know like Thai, Indonesian, Mandarin, Cantonese, Middle / Old Chinese, Sanskrit / Pali and other Indian languages.  I rarely find a word that doesn’t have a link with one of those languages, and when there is a real difference, the fact that it’s different works as a memory peg for that meaning.

I then figured if I had Hindi there, then I should have Urdu too… and then if I included Urdu I should include Farsi. With Farsi in there, Arabic would be useful too, as it is a language that I’m extremely weak in, although I have a good foundation of Arabic roots from Farsi, Urdu and Indonesian. Spanish also has many Arabic cognates, so I added Spanish to the mix. I couldn’t add Spanish and leave out Italian and French. Now that I had many Latin languages covered, I figured that I should add other European languages that I speak, like Danish. With Danish there I couldn’t leave out Norwegian and Swedish. I could have kept on going but I had to draw the line somewhere. Below is a list of all the languages I have included in this spreadsheet. I have included the GOOGLE 2-letter language code for you too, as you can use it to call Google Translate functions yourself, whether in Google Sheets, from the command line through their Command Line Interface (CLI) or through other Google Translate API based tools. Note that the Google 2-letter codes and the standard ISO 2-letter codes sometimes differ.
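To illustrate how the two sets of codes can differ, here is a minimal Python sketch. It covers only the differences relevant to this list that I am confident of (Javanese, and the region-tagged Chinese codes); it is not an exhaustive mapping, and the names `GOOGLE_TO_ISO` and `to_iso` are hypothetical, made up for this example.

```python
# Illustrative only: Google Translate codes from this list that differ from
# the two-letter ISO 639-1 code. This is an assumption-labelled sample,
# not a complete mapping.
GOOGLE_TO_ISO = {
    "jw": "jv",     # Javanese: Google uses "jw", ISO 639-1 uses "jv"
    "zh-cn": "zh",  # Chinese (Simplified): ISO 639-1 has no region suffix
    "zh-tw": "zh",  # Chinese (Traditional)
}


def to_iso(google_code: str) -> str:
    """Return the ISO 639-1 code for a Google code, or the code unchanged."""
    return GOOGLE_TO_ISO.get(google_code.lower(), google_code)


if __name__ == "__main__":
    for code in ["th", "jw", "zh-TW"]:
        print(code, "->", to_iso(code))
```

Codes that are the same in both systems, such as `th` or `km`, simply pass through unchanged.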

I have included an explanation of why I included each language in there and what I would be looking out for with that language. Even if you aren’t learning certain languages in the list, read my remarks, as some of the connections that I make here might help you with the language that you ARE learning – and who knows … you might even be inspired to go on to learn a language that you hadn’t previously thought about learning.


Language Google Lang Code Explanation
English en This is the base language for all the translations.
Thai th Thai is one of my very strong languages that I will use as an alternative comparison stick against Khmer and even Burmese.
Lao lo I’ve included Lao because it’s always nice to see the sound shifts and meaning shifts between the standard two languages.  The differences are often like the differences between ‘lorry’ and ‘truck’ in English.  Lao will often use a traditional ‘Tai’ word for certain terms that will resemble terms in Burmese and even Vietnamese where Thai has taken the Sanskrit term.
Khmer km One of my goals is to build fluency in Khmer.  Having the Khmer very close to Thai in the table lets me draw the many cognates between the two.  I can already read the Khmer script, so I haven’t included any phonetic transcription.  Also, I tried to create this chart as painlessly as possible and use as many automated Google Functions that I could without having to manually enter any data.  
Myanmar (Burmese) my One of my goals is to build fluency in Burmese.  I had it close to Thai and Hindi in the spreadsheet so that I could recognise cognates from Sanskrit. I can already read Burmese.  Google’s Burmese functionality is very rudimentary, but it’s better than nothing.  It was only a few months ago that I couldn’t even view Burmese fonts on my phone without rooting it. In the digital age, a hybrid character encoding system called Zawgyi became popular, which was only partially compatible with Unicode.  There is a big push now to move everything into Unicode, and Google will hopefully be a big player in that push, providing language tools and support in browsers and mobile devices.
Hindi hi I included Hindi as Google doesn’t have a ‘Sanskrit’ option and Hindi is one language that has many Sanskrit root words that keep their original spelling in Devanagari which could be used as a comparison against Thai, Burmese and Khmer.
Urdu ur I included Urdu because my reading of Urdu in the Nastaliq script isn’t as fast as my reading of Hindi in the Devanagari script. I am sometimes confused when speaking Urdu which term to use – as the line between Hindi and Urdu is more political than linguistic (please don’t shoot me).  I also included Urdu as I would like to see just how many terms have come in from Farsi (Persian) and to see how many of those terms are actually originally Arabic terms.
Mandarin (Traditional) zh-tw I have included Mandarin here for a number of reasons.  Google Translate does not have Cantonese.  Ideally, I would like to include a Southern dialect like Cantonese or even Hokkien as the phonetics and vocabulary used would much more resemble the words in languages like Vietnamese, Korean and Japanese that have sinitic influence / roots.
Even so, having the Mandarin terms there gives some good hints as to where certain Vietnamese terms have come from – especially if you understand some of the basic sound shifts that have taken place.  E.g. palatal sounds like ‘c’ in Mandarin often become velar in Cantonese and Vietnamese.
Mandarin (Simplified) zh-cn This is useful for people who know either the traditional or simplified version of Chinese characters and want to master the other form.
Mandarin Pinyin *Pinyin I have included a Pinyin column here because it was easy to do.  Google Sheets has a function that allows Pinyin pronunciation to be automated. That function is used like this: =HANYUPINYIN_TONEMARKS_ISO(Chinese Character)
Cantonese Jyutping *Jyutping Even though Google doesn’t yet have Cantonese, there exists an add-on for Google sheets that will give you the Cantonese pronunciation in ‘Jyutping’ form.  This is useful as even though Cantonese will generally use different words or different order of words, people wanting to learn the Cantonese pronunciation of words that they already know in Mandarin can use this as a guide.  It’s also useful to me as it highlights similarities between Vietnamese and Chinese that I might have missed thinking about the Chinese characters in Mandarin.
Vietnamese vi Vietnamese is a fascinating language.  I find that with a solid grounding in Chinese – and especially with Cantonese and having studied Middle and Old Chinese, I recognise many words in Vietnamese.  Vietnamese traditionally used Chinese characters to write the language.  For words that didn’t have a direct Chinese link, a set of ‘special’ characters was developed especially for so called ‘Vietnamese’ words.  These characters are called 𡨸喃 Chữ Nôm.
For anyone wanting to learn Vietnamese who already has a foundation in Chinese characters whether through Chinese or Japanese Kanji or Korean Hanja, learning Vietnamese with the aid of Chữ Nôm characters can really accelerate your learning.  You get to see the meaning and the approximate sound of the Vietnamese word in each character rather than trying to figure it out through a more ambiguous romanised script – chữ Quốc ngữ.
I bought a Chữ Nôm dictionary a few years back in Hanoi and at the time it was like finding the Rosetta stone for Vietnamese.  I noticed that certain words in Vietnamese that had their own special Chữ Nôm character were actually cognates from old or middle Chinese that may have sounded a little ‘too’ different so were considered ‘not Chinese’ and warranted a new character.  In some cases there are two words with slightly different pronunciations for the same meaning.  One will be the official original ‘Chinese’ character introduced from Chinese and one will be the official ‘Vietnamese’ word with a Chữ Nôm character.  If you really look at them however, you can see that they have both actually come from the same original word in Old or Middle Chinese.
Some are the other way around.  They will use the same Chinese character but have a Chinese and so-called ‘Vietnamese’ pronunciation.  The Vietnamese pronunciation is in many cases a version of the same original Chinese character that probably entered Vietnamese through a different time and / or place.
To give you an idea of how Chữ Nôm can help Chinese speakers learn Vietnamese, take a look at these:
Welcome – Hoan nghênh 歡迎 (Mandarin – Huānyíng)
‘You’re Welcome’ – Được tiếp đãi ân cần 得接待恩勤 (Cantonese – dak1 zip3 doi6 jan1 kan4)
The different phonetic systems may make the Vietnamese and Cantonese still look quite different.  Watch what happens when I turn both of these into standard International Phonetic Alphabet (IPA):
tɯak tiep tai an kan (Vietnamese)
tak tʃip toi jan kan (Cantonese)
Through understanding a few rules about sound shifts, having a basic understanding of Chinese characters and a good imagination, creating mnemonics and memory pegs for Vietnamese becomes very easy.  As for the tones, there are also standard shifts that have occurred and patterns can be found.
Spanish es I have a decent grounding in Spanish and want to use this to help me learn both French, and also act as a memory tool for Arabic.
Italian it I have included Italian just to refresh my Italian vocabulary as well as highlight sound shifts between Italian, Spanish and French – and Sanskrit for that matter.
French fr French has always been my Achilles heel.  I have never spent the time to really become proficient in it.  Having this in the mix here might serve as some new motivation and inspiration.
Indonesian id Indonesian is one of my stronger languages and has a mixture of Malay / Austronesian words and Sanskrit.  The links to Sanskrit help with Thai, Lao, Burmese, Khmer, as well as other Indic languages.
The other words help no end in acting as a base point for languages like Javanese, Sundanese, other Indonesian ‘Bahasa Daerah’ or ‘Regional Languages’, as well as other related languages like Tagalog, other languages of the Philippines and even native languages of Taiwan and Polynesia.
Javanese jw Javanese is a beautiful and complex language – and Google should be commended for including it in its languages for Google Translate.  Sadly there is no Javanese script and no real function to distinguish the 5 different levels of speech.  The Javanese translations are still very sketchy too.  It’s still better than nothing and helps in comparative analyses of Austronesian languages.
Sundanese su Ditto for Sundanese.
Tagalog tl Tagalog has some interesting grammatical structures that are quite different from Malay / Indonesian languages, however by using a bit of imagination, you can see the links in many vocab items.  I would like to become much more proficient in Tagalog – so this is also in the mix to inspire me.
Danish da Danish is my strongest of the Scandinavian languages.   I would like to become more fluent in Swedish and Norwegian so have included Danish here as a base.  It’s also interesting to notice that many of the sound shifts between Germanic / Scandinavian languages are the same shifts that happen between Indic languages, Sinitic languages – and languages all over the globe.  
Norwegian no Included so that I can improve my Norwegian – using Danish as the base tool.
Swedish sv Included so that I can improve my Swedish using Danish as the base tool.  Note that the slight (or extreme) differences between words in each language, both in pronunciation and word selection, all become memory pegs for each language respectively.
Farsi fa I have included Farsi to improve my Farsi as well as draw more links between Farsi and Urdu / Hindi.
Arabic ar I have included Arabic to help improve my Arabic, drawing on the links between loaned Arabic words into Farsi, Urdu / Hindi, Indonesian and Spanish.


How to Use the List

You will see that my sample list contains, as expected, many errors. I created it using dirty, nasty, crude automated Google Translate ‘first responder’ translations via the Google Sheets function =GOOGLETRANSLATE("text", "[FROM LANG CODE]", "[TO LANG CODE]"). This is good.
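If you would like to set up a similar sheet yourself, the formulas can be generated rather than typed. The following is a minimal Python sketch, not the way my own sheet was built: it assumes the English words sit in column A from row 2 down, and the function names `build_formula_grid` and `to_csv` are hypothetical. Whether Google Sheets evaluates the cells as formulas when you import the CSV may depend on your import settings.

```python
import csv
import io


def build_formula_grid(words, lang_codes, source="en"):
    """Build a header row plus one row of GOOGLETRANSLATE formulas per word.

    Assumes the source words end up in column A, starting at row 2.
    """
    rows = [[source] + list(lang_codes)]
    for row_number, word in enumerate(words, start=2):  # row 1 is the header
        formulas = [
            f'=GOOGLETRANSLATE($A{row_number}, "{source}", "{code}")'
            for code in lang_codes
        ]
        rows.append([word] + formulas)
    return rows


def to_csv(rows):
    """Serialise the grid as CSV text, ready to import into a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    grid = build_formula_grid(["wet season", "dry season"], ["th", "km", "vi"])
    print(to_csv(grid))
```

Each formula points back at the word in column A, so correcting or replacing a base word updates the whole row.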

I spend a day or so going over all the terms.  I have a foundation in all the languages in the spreadsheet, so with that I can start to identify where terms might be erroneous.  I will then go through each term and correct it either from my own knowledge, or try and find it in context on the internet, in a book or through a better dictionary – or better still, through speaking with native speakers.

After the list is refined, I will go through and try and find sentences with these words in context and I will start to create spreadsheets with these sentence pairs between English – or other related languages.  Once I do that, I will create audio files and flash cards.  I have written an article on how to do this here – “How to Build the Ultimate Vocabulary Building Tool”.
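As a rough illustration of the sentence-pair step, here is a minimal Python sketch that turns pairs into tab-separated text, a format that many flash card programs can import. It is only one possible approach and not the method described in the linked article; the function name `pairs_to_tsv` is hypothetical, and the sample pair is taken from the Vietnamese example earlier in this post.

```python
import csv
import io


def pairs_to_tsv(pairs):
    """Turn (front, back) sentence pairs into tab-separated flash card text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for front, back in pairs:
        # Strip stray whitespace so the cards stay tidy after import.
        writer.writerow([front.strip(), back.strip()])
    return buffer.getvalue()


if __name__ == "__main__":
    sample_pairs = [("Welcome", "Hoan nghênh")]
    print(pairs_to_tsv(sample_pairs), end="")
```

One line per card keeps the file easy to check by eye before you import it.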

I also highly recommend using other spaced repetition systems like Glossika for getting real life language into your system to help you develop natural reactions in language and natural sounding prosody.

Would you Like to Help?

I will keep the original Google Sheet open as a continuing ‘Work In Progress’.  If you are a native speaker – or very fluent speaker of any of the languages in the list and would like to help in the refining process of the translations, please send me a message at info@stujay.com and I will give you editing access to a sheet that contains those terms and you can be part of the editing team.  I will make the list available to anyone – and I’m happy to add any other languages in upon request.

So what are you waiting for? It’s time to GET CRACKING!

Here is an embedded version of the table:


Stuart Jay Raj is a polyglot who specializes in the languages and dialects spoken in South East Asia and China. His talents have allowed him to earn a professional living as a simultaneous interpreter in Thai, Mandarin, Cantonese, and Indonesian, among others, providing language and cultural training for multinational companies in the region and hosting his own TV programme on Thailand's Channel 5. He holds a degree in Cognitive and Applied Linguistics from Griffith University and has become an expert in the field of language acquisition with a strong track record of success. Stuart's background knowledge of Sanskrit, Khmer, Lao and various Chinese dialects and minority languages enables him to present a fascinating and unique perspective on the Thai language which makes everything fall logically into place.
  • This is great Stu! Loving the course so far. ขอบคุณมากครับ

  • Wei Chung Sim

    Mouse is 老鼠 in both Traditional Chinese and Simplified Chinese. 鼠标 is the term for mouse (pc component) in Mainland Chinese and generally known as 滑鼠 in Southeast Asian Chinese.

    • …and in English the plural of 老鼠 is mice, while two 鼠标 are two mouses on both the IBM and MIT axes.

      Mrs Grundys are everywhere, but ignore them.