Can computers generate language learning exercises?

Lionel Nicolas

25 October 2023

In an era where chatbots like ChatGPT can hold conversations or generate programming code much as humans do, one would think that today's technologies should be capable of generating language learning exercises. Surprisingly enough, we are not there yet.

Computers can indeed generate texts that resemble well-known language learning exercises nowadays. In many cases, however, such texts will not have the same quality as regular, carefully curated exercises. In the context of language learning, as in most kinds of learning, it is paramount that the knowledge imparted be free of errors. For language learning exercises, this means that the expected solutions should be flawless, with mistakes remaining the rare exception. Because of this quality requirement, current technologies still fall short of automatically generating many types of exercises. Allow me to explain why.

On an abstract level, this is because all languages are ambiguous in many ways. A good illustration is the many misunderstandings that arise when people talk to each other, even though humans are used to constantly dealing with all kinds of ambiguity. Conversely, there is no ambiguity in the way computers function or in the tasks they perform. Computers are unambiguous machines by design and are thus poorly equipped to handle ambiguous tasks. Technologies such as automatic translation systems and conversational bots have recently improved greatly on that front thanks to a method called “neural networks”, which is directly inspired by how the human brain works. Put differently, recent progress in dealing with ambiguity was achieved by mimicking how humans think. If computers had been capable of dealing with ambiguous tasks from the start, satisfying solutions for generating language learning exercises, and for many other tasks such as automatic translation, would have been around for decades.

On a more concrete level, let us divide exercises into two categories. The first and smaller category consists of exercises for which algorithms already do an acceptable job. These exercises are often found on dedicated websites and apps; examples include exercises asking learners to fill in a gap in a sentence, or crossword puzzles, where the ambiguity of language is more easily contained. The second and far bigger category consists of the types of exercises you will find in textbooks but more rarely on websites or in dedicated apps. Examples include vocabulary exercises in which learners must know the senses of words (e.g., one asking “Yellow is to banana what green is to...”, where several answers could be defensible). In many cases, such exercises could indeed be generated with existing technologies. Nonetheless, just as translation systems sometimes output incorrect translations and chatbots hallucinate and invent facts, the close-to-perfect quality of the generated exercises cannot be guaranteed. Since this criterion is mandatory for language learning, this second category is not covered yet, even though solutions are being explored.
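To make the first category a little more concrete, here is a minimal Python sketch of how a gap-fill exercise might be produced from a sentence and a chosen target word. The function name `make_gap_fill` and the `____` gap marker are illustrative assumptions, not an existing tool. Note what the sketch does not do: it never checks whether other words (say, "lies" instead of "sleeps") would also fit the gap, which is exactly where ambiguity creeps back in.

```python
import re
from typing import Tuple


def make_gap_fill(sentence: str, target: str, gap: str = "____") -> Tuple[str, str]:
    """Blank out the first whole-word occurrence of `target` in `sentence`.

    Hypothetical helper for illustration only. Returns (exercise, solution).
    It does NOT verify that the solution is the only acceptable answer.
    """
    # \b ensures we match "cat" as a word, not inside "category".
    pattern = re.compile(rf"\b{re.escape(target)}\b")
    match = pattern.search(sentence)
    if match is None:
        raise ValueError(f"{target!r} not found as a word in the sentence")
    exercise = sentence[: match.start()] + gap + sentence[match.end():]
    return exercise, match.group(0)


if __name__ == "__main__":
    exercise, solution = make_gap_fill("The cat sleeps on the sofa.", "sleeps")
    print(exercise)   # The cat ____ on the sofa.
    print(solution)   # sleeps
```

Producing the blank is trivial; the hard part, which this sketch leaves untouched, is ensuring that the stored solution is the only correct answer a learner could give.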

One such solution (which I am very interested in) relies on crowdsourcing, an approach in which large amounts of information are collected from a crowd of people. This approach is rather sophisticated (and somewhat counter-intuitive), so it would be better discussed in another post. For now, I’ll just mention its core idea: recording and combining learners' answers to deduce (linguistic) information, allowing computers to gradually gain a better understanding of the languages being taught. Duolingo did exactly that in the past to collect translations. Ironically, with such an approach, computers would actually be learning from the learners themselves.
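As a rough illustration of the "combining answers" idea, the following Python sketch aggregates several learners' answers to the same item with a simple majority vote and only accepts an answer once enough learners agree. This is not how Duolingo or any specific system works; the function name `aggregate_answers` and the thresholds `min_votes` and `min_share` are hypothetical choices made for the example.

```python
from collections import Counter
from typing import List, Optional


def aggregate_answers(
    answers: List[str], min_votes: int = 3, min_share: float = 0.7
) -> Optional[str]:
    """Return a consensus answer if enough learners agree, else None.

    Hypothetical sketch: a plain majority vote with two assumed thresholds.
    """
    # Normalize trivial differences (case, surrounding spaces); drop empty answers.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough evidence yet
    answer, count = Counter(normalized).most_common(1)[0]
    # Accept only if the top answer has enough votes AND a large enough share.
    if count >= min_votes and count / len(normalized) >= min_share:
        return answer
    return None


if __name__ == "__main__":
    print(aggregate_answers(["sleeps", "Sleeps", "sleeps ", "lies"]))  # sleeps
    print(aggregate_answers(["sits", "lies"]))                        # None
```

Returning `None` instead of a best guess mirrors the quality requirement discussed above: when learners disagree, the item simply stays unresolved until more answers come in.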

Anyway, to sum up my position on the question “Can computers generate language learning exercises?”, my answer is the German word “Yein”, which blends “ja” (yes) and “nein” (no) to mean “yes and no” at the same time, and is a perfect example of the linguistic ambiguity that we humans have no trouble dealing with!