Writing is a major part of human communication and, by extension, an integral part of how students learn. However, students who are studying English as a second language (English Language Learners, or ELLs) often do not get the chance to polish their writing skills, because schools may not assign writing-centered work frequently enough. This creates a bias against English Language Learners, since existing tools do not incorporate English language proficiency when calculating feedback. Sensitizing automated graders to the language levels of such learners would help instructors assign work more fairly and give students feedback that better reflects their knowledge, free of the biases placed on them because of their origin or prior access to English.
The main question we wish to answer is how we can integrate language competency into deep learning models to accommodate differences among English Language Learners. This would help educators be more confident in relying on automatic graders, knowing that they will evaluate a student's work fairly, rewarding knowledge and competent arguments regardless of language proficiency. Our deep learning model will take inspiration from the work of Yang et al. (2020), in which scoring performance was enhanced by fine-tuning a pre-trained BERT model with a combination of regression and ranking [1]. We will attempt to identify areas where material context can be incorporated to prevent language bias. We will also potentially visualize the self-attention weights to see which words impact prediction and whether certain trends skew it against ELLs.
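As a rough sketch of the kind of model we have in mind, the snippet below fine-tunes a pre-trained BERT encoder with a six-output regression head using the Hugging Face transformers library. The checkpoint name, example essay, and target scores are placeholders, and for brevity the sketch shows only the regression objective rather than the combined regression-and-ranking loss of Yang et al. [1].

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; any pre-trained BERT-style encoder could be swapped in.
MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Six regression outputs, one per measure: cohesion, syntax, vocabulary,
# phraseology, grammar, conventions.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=6,
    problem_type="regression",  # Hugging Face applies an MSE loss in this setting
)

# Toy forward pass on one essay; real training would iterate over the Kaggle data
# and, following Yang et al. [1], could add a ranking objective alongside regression.
essay = "Technology has changed the way students learn in many ways."
inputs = tokenizer(essay, truncation=True, max_length=512, return_tensors="pt")
targets = torch.tensor([[3.0, 3.5, 3.0, 2.5, 3.0, 3.5]])  # made-up scores for illustration

outputs = model(**inputs, labels=targets)
print(outputs.loss)    # MSE regression loss
print(outputs.logits)  # predicted scores for the six measures
```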
The dataset we will use is from a Kaggle competition [2] and consists of English essays written by English Language Learners in grades 8 through 12. In the training set, each essay is graded on six measures that we will have to account for: cohesion, syntax, vocabulary, phraseology, grammar, and conventions.
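To make these targets concrete, the sketch below loads the competition's training file and scores a trivial baseline with mean columnwise root mean squared error (MCRMSE), the competition's evaluation metric. The file path and column names follow the Kaggle data description and should be adjusted if the local layout differs.

```python
import numpy as np
import pandas as pd

# The six graded measures from the training data.
TARGETS = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

train = pd.read_csv("train.csv")   # assumed columns: text_id, full_text, plus the six measures
print(train[TARGETS].describe())   # score distribution per measure


def mcrmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean columnwise root mean squared error over the six measures."""
    col_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return float(np.mean(col_rmse))


# Example: score a naive baseline that always predicts each column's mean.
baseline = np.tile(train[TARGETS].mean().to_numpy(), (len(train), 1))
print(mcrmse(train[TARGETS].to_numpy(), baseline))
```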
Automated essay scoring (AES) is a computer-based assessment approach that automatically scores or grades student responses by considering appropriate features. AES research started in 1966 with the Project Essay Grader (PEG) by Ajay et al. (1973). PEG evaluates writing characteristics such as grammar, diction, and construction to grade the essay. A modified version of PEG by Shermis et al. (2001) was released that focuses on grammar checking and reports a correlation between human evaluators and the system. Foltz et al. (1999) introduced the Intelligent Essay Assessor (IEA), which evaluates content using latent semantic analysis to produce an overall score. E-rater (Powers et al., 2002), IntelliMetric (Rudner et al., 2006), and the Bayesian Essay Test Scoring sYstem (BETSY; Rudner and Liang, 2002) use natural language processing (NLP) techniques that focus on style and content to obtain an essay's score. The vast majority of essay scoring systems in the 1990s followed traditional approaches such as pattern matching and statistical methods. Over the last decade, essay grading systems have shifted toward regression-based and natural language processing techniques. AES systems developed from 2014 onward, such as that of Dong et al. (2017), use deep learning techniques that induce syntactic and semantic features, yielding better results than earlier systems.
Klebanov et al. (2020) reviewed 50 years of AES systems, listing and categorizing the essential features that need to be extracted from essays.
[1] Yang, R., Cao, J., Wen, Z., Wu, Y., & He, X. (2020). Enhancing automated essay scoring performance via fine-tuning pre-trained language models with combination of regression and ranking. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 1560-1569).
[2] Feedback Prize - English Language Learning. Kaggle. Retrieved September 29, 2022, from https://www.kaggle.com/competitions/feedback-prize-english-language-learning/overview/description
[3] Kumar, V., & Boulanger, D. (2020). Explainable automated essay scoring: Deep learning really has pedagogical value. Frontiers in Education. Retrieved September 29, 2022, from https://www.frontiersin.org/articles/10.3389/feduc.2020.572367/full
[4] Bonthu, S., Rama Sree, S., & Krishna Prasad, M. H. M. (2021). Automated short answer grading using deep learning: A survey. Springer. Retrieved September 29, 2022, from https://link.springer.com/chapter/10.1007/978-3-030-84060-0_5
[5] Ramesh, D., & Sanampudi, S. K. (2021). An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review. Retrieved September 29, 2022, from https://link.springer.com/article/10.1007/s10462-021-10068-2
1. Tanmay Khokle
2. Neha Yadav