Can BERT be a fussy proofreader?
It’s 11 p.m. There’s nothing outside but a placid wind and some faint traffic noise. You are a teacher, and on the left side of your messy study desk lies a pile of student essays to correct. It has been a rough day; you are tired, and you wish you could go to bed or watch some trashy TV series to switch off your brain. But duty calls and there are deadlines to be met.
Who wouldn’t want a hand in this all-too-familiar situation?
AI systems are increasingly impacting our lives in a multitude of domains such as education, mechanical and repetitive tasks, arts and entertainment. Their immense potential extends beyond these fields, particularly into research, where statistical and computational power can be applied to numerous areas. If you recognize yourself in the classic “unlucky teacher” scenario, well… maybe some help from this automatic wizardry is what you’re looking for.
Our Language Technologies team has conducted a study exploring how AI can capture language features in very specific contexts, such as high school education. Inspired by pedagogical challenges, the core question behind our work is straightforward: can an AI system learn to distinguish a correctly written and coherent text from an artificially flawed one?
Before diving into our work and trying to answer this question, it is essential to briefly introduce two key tools:
- BERT: One of the most renowned language models, BERT learns to predict features of language from the text examples we provide (typically labeled as correct or incorrect). Like GPT, BERT belongs to the Transformer family: not the robots, but highly sophisticated statistical models designed for language representation and generation.
- Data Augmentation: This computational technique creates modified copies of existing data. In our study, we use it to intentionally spoil texts in random, often nonsensical ways.
We applied these tools to the ITACA corpus, a collection of real-world student essays built to investigate coherence in the writing of South Tyrolean high school students. The idea was to automatically classify these texts as coherent or non-coherent based on their similarity to a set of carefully spoiled texts. As the example below shows, even a minute modification can be enough for a text to be labeled non-coherent.
Original: “Non sono favorevole alla didattica a distanza, perché per quanto possa sembrare più "semplice" e meno stressante per gli studenti italiani, creando meno ansia e agevolando gli studenti nello svolgimento di prove e interrogazioni orali, penso che ormai per gli studenti che hanno vissuto questi ultimi anni con la didattica a distanza sarebbe estremamente svantaggioso ritornare alla didattica in presenza”
Modified: “Non sono favorevole alla didattica a distanza, e e quanto possa sembrare più "semplice" e meno stressante e gli studenti italiani, creando meno ansia e agevolando gli studenti nello svolgimento di prove e interrogazioni orali, penso che ormai e gli studenti che hanno vissuto questi ultimi anni con la didattica a distanza sarebbe estremamente svantaggioso ritornare alla didattica in presenza”
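The substitution in the example above, where the connectives "perché" ("because") and "per" ("for") are each replaced by "e" ("and"), can be sketched in a few lines of Python. This is only a minimal illustration, not the exact procedure used in the study: the function name `corrupt_connectives`, the list of target words, and the `prob` and `seed` parameters are all hypothetical.

```python
import random
import re

# Hypothetical target set, inferred from the example above.
TARGETS = {"perché", "per"}


def corrupt_connectives(text, targets=TARGETS, replacement="e", prob=1.0, seed=0):
    """Replace each target word with `replacement` with probability `prob`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible

    def swap(match):
        word = match.group(0)
        # Only target words are candidates; everything else is kept as-is.
        if word.lower() in targets and rng.random() < prob:
            return replacement
        return word

    # \w+ is Unicode-aware for Python strings, so "perché" is a single word.
    return re.sub(r"\w+", swap, text)


original = "Non sono favorevole alla didattica a distanza, perché per quanto possa sembrare"
print(corrupt_connectives(original))
```

With `prob=1.0` every target word is replaced, as in the example. Lowering `prob` leaves more of the original words in place, which gives sparser and subtler corruptions.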
While AI models like BERT are sensitive to explicit language modifications, their performance drops as the modifications become subtler. Now, imagine a poorly written essay with numerous syntactical and stylistic errors. BERT can efficiently identify these glaring mistakes. However, as we begin to correct these errors incrementally, distinguishing between correct and slightly flawed texts becomes increasingly challenging for the AI, especially when we must deal with the large variability of natural texts.
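One simple way to put a number on how subtle a modification is would be to measure the share of words that differ between the original and the modified text. The sketch below does this with Python's standard `difflib` module. The function name `changed_token_ratio` and the choice of this measure are illustrative assumptions, not a metric taken from the study.

```python
from difflib import SequenceMatcher


def changed_token_ratio(original, modified):
    """Share of whitespace-separated tokens that differ between two texts."""
    a, b = original.split(), modified.split()
    if not a and not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Count the tokens covered by every non-matching span of the alignment.
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return changed / max(len(a), len(b))


original = "penso che per gli studenti sarebbe svantaggioso"
modified = "penso che e gli studenti sarebbe svantaggioso"
print(changed_token_ratio(original, modified))
```

A ratio close to 0 means the modified text is almost identical to the original, which is the regime where telling the two apart becomes hardest.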
Despite these limitations, AI systems hold promise as effective error-detecting tools, especially for identifying highly specific language phenomena. With further improvements, these systems could become invaluable in educational settings, assisting teachers and students by pinpointing and correcting nuanced language errors.
In conclusion, our study highlights the current limitations of AI in fine-grained error detection when exposed to the variability of natural texts, but it also underscores the potential for growth in this area. The model's performance varies with the language feature considered, how sporadic the errors are, and the amount of knowledge provided to the model. In simpler terms, under average conditions some features are easy to detect, while others remain hard. By continuing to refine these systems, we can harness AI to enhance language learning and educational outcomes, and teachers may finally get their coveted daily break.
This content is licensed under a Creative Commons Attribution 4.0 International license.