Informatica 39 (2015) 383–395

Advances in the Field of Automated Essay Evaluation

Kaja Zupanc and Zoran Bosnić
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
E-mail: {kaja.zupanc, zoran.bosnic}@fri.uni-lj.si

Overview paper

Keywords: automated essay evaluation, automated scoring, natural language processing

Received: June 16, 2015

Automated essay evaluation represents a practical solution to the time-consuming activity of manually grading students' essays. During the last 50 years, many challenges have arisen in the field, including seeking ways to evaluate semantic content, providing automated feedback, and determining the validity and reliability of grades, among others. In this paper we compare 21 state-of-the-art approaches for automated essay evaluation and highlight their weaknesses and the open challenges in the field. We conclude with the finding that the field has developed to the point where the systems provide meaningful feedback on students' writing and represent a useful complement (not a replacement) to human scoring.

1 Introduction

An essay is a short literary composition on a particular theme or subject, usually in prose and generally analytic, speculative, or interpretative. Researchers consider essays the most useful tool for assessing learning outcomes. Essays give students an opportunity to demonstrate their range of skills and knowledge, including higher-order thinking skills such as synthesis and analysis [62]. However, grading students' essays is a time-consuming, labor-intensive and expensive activity for educational institutions. Because teachers are burdened with hours of grading written assignments, they assign fewer essays, thereby limiting the practice students need to reach their writing goals. This contradicts the aim of making students better writers, for which they need to rehearse their skill by writing as much as possible [44].

A practical solution to many problems associated with manual grading is an automated system for essay evaluation. Shermis and Burstein [53] define the automated essay evaluation (AEE) task as the process of evaluating and scoring written prose via computer programs. AEE is a multi-disciplinary field that incorporates research from computer science, cognitive psychology, educational measurement, linguistics, and writing research [54].
Researchers from all of these fields contribute to its development: computer scientists develop attributes and implement AEE systems, writing scientists and teachers provide constructive criticism, and cognitive psychologists' expert opinion is considered when modeling the attributes. Psychometric evaluations also provide crucial information about the reliability and validity of the systems.

In Figure 1 we illustrate the procedure of automated essay evaluation. As shown in the figure, most existing systems use a substantially large set of prompt-specific essays (i.e. a set of essays on the same topic). Expert human graders score these essays, e.g. from 1 to 6, to construct the learning set. This set is used to develop the scoring model of the AEE system and to tune it. Using this scoring model (shown as the black box in Figure 1), the AEE system assigns scores to new, ungraded essays. The performance of the scoring model is typically validated by calculating how well it "replicates" the scores assigned by the human expert graders [18].

Figure 1: Illustration of automated essay evaluation: a set of essays is pre-scored by human graders and used to develop the scoring model; this scoring model is then used to assign scores to new, ungraded essays.
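To make this prompt-specific procedure concrete, the following minimal Python sketch trains a regression scoring model on a few crude surface attributes of human-scored essays and applies it to an ungraded one. It is an illustration only, not the implementation of any system reviewed below; the essays, scores, attribute set and the choice of ridge regression are all invented for the example.

```python
# Minimal sketch of the prompt-specific AEE procedure in Figure 1
# (illustrative only; real systems use far richer attribute sets).
import re
from sklearn.linear_model import Ridge

def attributes(essay):
    """Extract a few crude surface attributes from an essay."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return [
        len(words),                                  # essay length
        len(set(w.lower() for w in words)),          # vocabulary size
        len(words) / max(len(sentences), 1),         # average sentence length
    ]

# Hypothetical learning set: essays pre-scored by human graders (scores 1-6).
graded_essays = [
    ("First sample essay text about the assigned topic ...", 3),
    ("A longer, more elaborate sample essay on the same topic ...", 5),
    ("Short answer.", 1),
]

X = [attributes(text) for text, _ in graded_essays]
y = [score for _, score in graded_essays]

model = Ridge().fit(X, y)          # the "scoring model" (the black box in Figure 1)

new_essay = "An ungraded essay submitted by a student ..."
predicted = model.predict([attributes(new_essay)])[0]
print(round(predicted, 1))
```

Real systems replace the toy attribute extractor above with the much richer linguistic, content and discourse attributes discussed in Section 4.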
Automated essay evaluation has been a real and viable alternative, as well as a complement, to human scoring for the last 50 years. The widespread development of the Internet, word processing software, and natural language processing (NLP) stimulated the later development of AEE systems. Motivation for research in automated evaluation was initially focused on time and cost savings, but in recent years the focus has moved to the development of attributes addressing the writing construct (i.e. various aspects of writing describing "what" and "how" the students are writing). Researchers are also focusing on providing comprehensive feedback to students, evaluating semantic content, developing AEE systems for languages other than English, and increasing the validity and reliability of AEE systems.

In this survey we give a comprehensive overview of the latest developments in the field. In Section 2 we first describe the reasons for and progress of the field's development over the last 50 years. We then present advantages and disadvantages of AEE systems and provide an overview of open problems in the field in Section 3. Section 4 briefly describes the field of NLP and then overviews the existing commercial and publicly available AEE systems, followed by a comparison of those approaches. Section 5 concludes the paper.

2 History

Throughout the development of the field, several different names have been used for it interchangeably. The names automated essay scoring (AES) and automated essay grading (AEG) were slowly replaced by the terms automated writing evaluation (AWE) and automated essay evaluation (AEE). The term evaluation within the name (AEE) came into use because the systems are also expected to provide feedback about linguistic properties related to writing quality, interaction, and altogether a wider range of possibilities for the software.

The first AEE system was proposed almost 50 years ago. In 1966, the former high school English teacher E. Page [44] proposed machine scoring technology and initiated the development of the field. In 1973 [1] he had enough hardware and software at his disposal to implement the first AEE system under the name Project Essay Grade. The first results were characterized as remarkable, as the system's scores correlated with human graders more consistently than the scores of two trained human graders correlated with each other. Despite its impressive success at predicting teachers' essay ratings, the early version of the system received only limited acceptance in the writing and education community.

By the 1990s, with the spread of the Internet, natural language processing tools, e-learning systems, and statistical methods, AEE became a support technology in education. Nowadays, AEE systems are used in combination with human graders in various high-stakes assessments such as the Graduate Record Examination (GRE), Test of English as a Foreign Language (TOEFL), Graduate Management Admission Test (GMAT), SAT, American College Testing (ACT), Test of English for International Communication (TOEIC), Analytic Writing Assessment (AWA), No Child Left Behind (NCLB) and the Pearson Test of English (PTE). Furthermore, some of them also act as the sole grader.

E-rater was the first system to be deployed in a high-stakes assessment, in 1999 [49]. It provided one of two scores for essays on the writing section of the Graduate Management Admission Test (GMAT); the second score for each essay was provided by an expert human grader. The term "blended" scoring model [35, 36], for the use of both human and machine scoring in a single assessment program, came into use at that time.

3 Challenges in the field of automated essay evaluation

In addition to saving time and money, AEE systems provide a higher degree of feedback tractability and score logic for each specific response. Feedback for a specific response provides information on the quality of different aspects of writing, both as partial scores and as descriptive feedback. Their constant availability for scoring gives students the possibility to practice their writing repeatedly at any time. AEE systems are reliable and consistent, as they predict the same score for a single essay each time that essay is input to the system. [...] The systems can recognize certain types of errors and offer automated feedback on correcting these errors. In addition, the systems can also provide global feedback on content and development. Automated feedback reduces teachers' load and helps students become more autonomous. In addition to a numerical score, such feedback provides a meaningful explanation by suggesting improvements. Systems with feedback can be an aid to, not a replacement for, classroom instruction. Advantages of automated feedback are its anonymity, instantaneousness, and encouragement of repetitive improvement by giving students more practice in writing essays [63].

The current limitation of the feedback is that its content is limited to the completeness or correctness of the syntactic aspect of the essay. Some attempts have been made [6, 19] to include semantic evaluation as well, but these approaches are not automatic and work only partially.

4 Automated essay evaluation systems

This section provides an overview of state-of-the-art AEE systems. First we briefly describe the field of NLP, which has influenced the development of AEE systems the most over the last 20 years.
This is followed by a presentation of proprietary AEE systems developed by commercial organizations, as well as two publicly available systems and approaches proposed by the academic community. We conclude the section with a comparison of the described systems.

4.1 Natural language processing

Natural language processing is a computer-based approach to analyzing language in text. In [34] it is defined as "a range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of task applications". This complex definition can be broken down for better understanding: "the range of computational techniques" refers to the numerous approaches and methods used for each type of language analysis, and "naturally occurring texts" describes the diversity of texts, i.e. different languages, genres, etc. The primary requirement of all NLP approaches is that the text is in a human-understandable language.

Research in the field started in the 1940s [27]. Like many other fields of computer science, NLP began growing rapidly in the 1990s along with the increased availability of electronic text, computers with high speed and large memory capacity, and the Internet. New statistical and rule-based methods allowed researchers to carry out various types of language analysis, including analyses of syntax (sentence structure), morphology (word structure), and semantics (meaning) [11]. State-of-the-art applications include automated grammatical error detection in word processing software, Internet search engines, machine translation, automated summarization, and sentiment analysis.

As already mentioned, NLP methods have played a crucial role in the development of AEE technologies, including part-of-speech (POS) tagging, syntactic parsing, sentence fragmentation, discourse segmentation, named entity recognition, and content vector analysis (CVA).
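As a small, hedged illustration of the kind of low-level analysis these methods perform (the paper does not prescribe any particular toolkit; NLTK is used here purely as an assumed, commonly available library), the following snippet tokenizes a sentence and tags each token with its part of speech.

```python
# Tokenization and POS tagging with NLTK (an assumed toolkit choice).
import nltk

# The tokenizer and tagger models are fetched on first use; adjust if the
# resources are already installed locally.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The essay argues that automated scoring complements human graders."
tokens = nltk.word_tokenize(sentence)   # tokenization: split the text into word tokens
tagged = nltk.pos_tag(tokens)           # POS tagging: label each token with its part of speech
print(tagged)                           # e.g. [('The', 'DT'), ('essay', 'NN'), ('argues', 'VBZ'), ...]
```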
4.2 AEE systems

Until recently, one of the main obstacles to progress in the field of AEE has been the lack of open-source AEE systems that would provide insight into their grading methodology. Most AEE research has been conducted by commercial organizations that have protected their investments by restricting access to technological details. In the last couple of years there have been several attempts to make the field more "exposed", including the recently published Handbook of Automated Essay Evaluation [56].

In this section we describe the majority of systems and approaches. We overview the systems that are predominant in this field and that are consequently more complex and have attracted greater publicity. All systems work by extracting a set of (system-specific) attributes and using some machine learning algorithm to model and predict the final score.

– Project Essay Grade (PEG). PEG is a proprietary AES system developed at Measurement Inc. [44]. It was first proposed in 1966, and in 1998 a web interface was added [58]. The system scores essays by measuring trins and proxes. A trin is defined as an intrinsic higher-level variable, such as punctuation, fluency, diction or grammar, which cannot be measured directly and has to be approximated by means of other measures, called proxes. For example, the trin punctuation is measured through the proxes number of punctuation errors and number of different punctuation marks used. The system uses regression analysis to score new essays based on a training set of 100 to 400 essays [45].

– e-rater. E-rater is a proprietary automated essay evaluation and scoring system developed at the Educational Testing Service (ETS) in 1998 [10]. E-rater identifies and extracts several attribute classes using statistical and rule-based NLP methods. Each attribute class may represent an aggregate of multiple attributes. The attribute classes include the following [4, 9]: (1) grammatical errors (e.g. subject-verb agreement errors), (2) word usage errors (e.g. their versus there), (3) errors in writing mechanics (e.g. spelling), (4) presence of essay-based discourse elements (e.g. thesis statement, main points, supporting details, and conclusions), (5) development of essay-based discourse elements, (6) style weaknesses (e.g. overly repetitious words), (7) two content vector analysis (CVA)-based attributes that evaluate topical word usage, (8) an alternative, differential word use content measure based on the relative frequency of a word in high-scoring versus low-scoring essays, (9) two attributes that assess the relative sophistication and register of essay words, and (10) an attribute that considers correct usage of prepositions and collocations (e.g. powerful computer vs. strong computer) and variety in sentence structure formation. The ten attribute classes represent positive attributes rather than counts of errors. The system uses regression modeling to assign a final score to the essay [11]. E-rater also includes detection of essay similarity and advisories that point out whether an essay is off topic, has problems with discourse structure, or includes a large number of grammatical errors [23].

– Intelligent Essay Assessor (IEA). In 1998 Pearson Knowledge Technologies (PKT) developed the Intelligent Essay Assessor (IEA). The system is based on Latent Semantic Analysis (LSA), a machine-learning method that acquires and represents knowledge about the meaning of words and documents by analyzing large bodies of natural text [32] (a minimal illustration of this idea is sketched after this list of systems). IEA uses LSA to derive attributes describing the content, organization, and development-based aspects of writing. Along with LSA, IEA also uses NLP-based measures to extract attributes measuring lexical sophistication as well as grammatical, mechanical, stylistic, and organizational aspects of essays. The system uses approximately 60 attributes to measure these aspects within essays: content (e.g. LSA essay semantic similarity, vector length), lexical sophistication (e.g. word maturity, word variety, and confusable words), grammar (e.g. n-gram attributes, grammatical errors, and grammar error types), mechanics (e.g. spelling, capitalization, and punctuation), style, organization, and development (e.g. sentence-to-sentence coherence, overall essay coherence, and topic development). IEA requires training with a representative sample (between 200 and 500) of human-scored essays.

– IntelliMetric. IntelliMetric was designed and first released in 1999 by Vantage Learning as a proprietary system for scoring essay-type, constructed-response questions [51]. The system analyzes more than 400 semantic-, syntactic-, and discourse-level attributes to form a composite sense of meaning. These attributes can be divided into two major categories: content (discourse/rhetorical and content/concept attributes) and structure (syntactic/structural and mechanics attributes).
The content attributes evaluate the topic covered, the breadth of content, support for advanced concepts, cohesiveness and consistency in purpose and main idea, and logic of discourse, whereas the structure attributes evaluate grammar, spelling, capitalization, sentence completeness, punctuation, syntactic variety, sentence complexity, usage, readability, and subject-verb agreement [51]. The system uses multiple predictions (called judgements) based on multiple mathematical models, including linear analysis, Bayesian methods, and LSA, and combines them into a single final essay score [49]. Training IntelliMetric requires a sample of at least 300 human-scored essays. IntelliMetric uses Legitimatch technology to identify responses that appear off topic, are too short, do not conform to the expectations for edited American English, or are otherwise inappropriate [51].

– Bookette. Bookette [48] was designed by the California Testing Bureau (CTB) and became operational in classroom settings in 2005 and in large-scale testing settings in 2009. Bookette uses NLP to derive about 90 attributes describing student-produced text. Combinations of these attributes describe traits of effective writing: organization, development, sentence structure, word choice/grammar usage, and mechanics. The system uses neural networks to model expert human grader scores. Bookette can build prompt-specific models as well as generic models, which can be very useful in classrooms for formative purposes. Training Bookette requires a set of 250 to 500 human-scored essays. Bookette is used in CTB's solution Writing Roadmap 2.0, in West Virginia's summative writing assessment known as the Online Writing Assessment (OWA) program, and in their formative writing assessment West Virginia Writes. The system provides feedback on students' writing performance that includes both holistic feedback and feedback at the trait level, including comments on grammar, spelling, and writing conventions at the sentence level [48].

– CRASE. Pacific Metrics' proprietary automated scoring engine, CRASE [35], moves through three phases of the scoring process: identifying inappropriate attempts, attribute extraction, and scoring. The attribute extraction step is organized around the following traits of writing: ideas, sentence fluency, organization, voice, word choice, conventions, and written presentation. The system analyzes a sample of already-scored student responses to produce a model of the graders' scoring behaviour. CRASE is a Java-based application that runs as a web service. The system is customizable with respect to the configurations used to build machine learning models as well as the blending of human and machine scoring (i.e. deriving hybrid models) [35]. The application also produces text-based and numeric feedback that can be used to improve the essays.

– AutoScore. AutoScore is a proprietary AEE system designed by the American Institutes for Research (AIR). The system analyzes measures based on concepts that discriminate between high- and low-scored papers, measures that indicate the coherence of concepts within and across paragraphs, and a range of word-use and syntactic measures. Details about the system were never published; however, the system was evaluated in [57].
– Lexile Writing Analyzer. The Lexile Writing Analyzer is part of the Lexile Framework for Writing [59] developed by MetaMetrics. The system is score-, genre-, prompt-, and punctuation-independent and utilizes the Lexile writer measure, an estimate of a student's ability to express language in writing based on factors related to semantic complexity (the level of words used) and syntactic sophistication (how the words are combined into sentences). The system uses a small number of attributes that serve as approximations of writing ability. Lexile treats writing ability as an underlying individual trait. A training phase is not needed, since a vertical scale is employed to measure student essays [60].

– SAGrader. SAGrader is an online proprietary AEE system developed by IdeaWorks, Inc. [7]. The system was first known under the name Qualrus. SAGrader blends a number of linguistic, statistical, and artificial intelligence approaches to automatically score the essay. SAGrader operates as follows: the instructor first specifies a task in a prompt. Then the instructor creates a rubric, using a semantic network (SN), that identifies the "desired features": key elements of knowledge (a set of facts) that should be included in a good response, along with relationships among those elements. Fuzzy logic (FL) allows the program to detect the features in the students' essays and compare them to the desired ones. Finally, an expert system scores student essays based on the similarities between the desired and observed features [6]. Students receive immediate feedback indicating their scores, along with detailed comments indicating what they did well and what needs further work.

– OBIE-based AEE system. The AEE system proposed by Gutierrez et al. [20, 21] provides both scores and meaningful feedback using ontology-based information extraction (OBIE). The system uses logic reasoning to detect errors in statements from an essay. It first transforms text into a set of logic clauses using open information extraction (OIE) methodology and incorporates them into a domain ontology. The system then determines whether these statements contradict the ontology and consequently the domain knowledge; this method treats incorrectness as inconsistency with respect to the domain. The logic reasoning is based on description logic (DL) and ontology debugging [19].

– Bayesian Essay Test Scoring sYstem (BETSY). The first scoring engine to be made publicly available was Rudner's Bayesian Essay Test Scoring sYstem (BETSY) [50]. BETSY uses multinomial or Bernoulli Naïve Bayes models to classify texts into different classes (e.g. pass/fail, scores A-F) based on content and style attributes such as word unigrams and bigrams, sentence length, number of verbs, noun-verb pairs, etc. Classification is based on the assumption that each attribute is independent of the others. Conditional probabilities are updated after examining each attribute. BETSY worked well only as a demonstration tool for a Bayesian approach to scoring essays. It remained a preliminary investigation, as the authors never continued their work.

– LightSIDE. Mayfield and Rosé released LightSIDE [38], an easy-to-use automated evaluation engine. LightSIDE made a very important contribution to the field of AEE by publicly providing both compiled and source code. The program is designed as a tool that lets non-experts quickly apply text mining technology to a variety of purposes, including essay assessment. It allows choosing the set of attributes best suited to represent the text. LightSIDE offers a number of algorithms to learn mappings between attributes and the final score (e.g. linear regression, Naïve Bayes, linear support vector machines) [39].

– Semantic Automated Grader for Essays (SAGE). SAGE, proposed by Zupanc and Bosnić [67], evaluates the coherence of student essays. The system extracts linguistic attributes, using statistical and rule-based NLP methods, as well as content attributes. The novelty of the system is a set of semantic coherence attributes measuring changes between sequential essay parts from three different perspectives: semantic distance (e.g. distance between consecutive parts of an essay, maximum distance between any two parts), central spatial tendency/dispersion, and spatial autocorrelation in semantic space. These attributes allow better evaluation of local and global essay coherence. Using random forests and extremely randomized trees, the system builds regression models and grades unseen essays. The system achieves better prediction accuracy than the 9 state-of-the-art systems evaluated in [57].

– Use of syntactic and shallow semantic tree kernels for AEE. Chali and Hasan [13] exposed the major limitation of LSA: it only retains the frequency of words, disregarding the word sequence and the syntactic and semantic structure of texts. They proposed the use of [...]
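As a rough, hedged illustration of two ideas that recur in the descriptions above, and not the actual IEA or SAGE implementations, the following sketch builds a tiny LSA-style semantic space (truncated SVD over TF-IDF vectors, using scikit-learn as an assumed toolkit) and uses it both to compare a new essay with training essays and to measure semantic distance between consecutive parts of an essay. All texts and parameters are invented for the example.

```python
# Sketch of LSA-style content and coherence attributes (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

training_essays = [
    "Essay about photosynthesis and the role of sunlight in plants ...",
    "Essay describing how plants convert light energy into chemical energy ...",
    "Off-topic essay about an unrelated holiday trip ...",
]

# Build a small LSA space: TF-IDF followed by truncated SVD.
vectorizer = TfidfVectorizer()
svd = TruncatedSVD(n_components=2, random_state=0)
lsa_train = svd.fit_transform(vectorizer.fit_transform(training_essays))

# Content-style attribute: similarity of a new essay to the training essays.
new_essay = "Plants use sunlight, water and carbon dioxide to produce glucose ..."
new_vec = svd.transform(vectorizer.transform([new_essay]))
content_similarity = cosine_similarity(new_vec, lsa_train)[0]
print("similarity to training essays:", content_similarity.round(2))

# Coherence-style attribute: semantic distance between consecutive essay parts.
parts = [
    "Plants capture sunlight with chlorophyll.",
    "This light energy drives the production of glucose.",
    "My favourite football team won yesterday.",
]
part_vecs = svd.transform(vectorizer.transform(parts))
consecutive = [1 - cosine_similarity(part_vecs[i:i + 1], part_vecs[i + 1:i + 2])[0, 0]
               for i in range(len(parts) - 1)]
print("consecutive-part distances:", [round(d, 2) for d in consecutive])
```

A large jump in the consecutive-part distance (as for the third, off-topic sentence above) is the kind of signal that coherence-oriented attributes are meant to capture.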
Shermis and Hammer [57] reported that the agreement between two human scores (as measured by quadratic weighted kappa) ranged from 0.61 to 0.85, and that machine scores ranged from 0.60 to 0.84 in their agreement with the human scores. Results of the study for each specific system can be seen in Table 1. Two other systems [15, 67] also used the same data set and reported their prediction accuracy. Unfortunately, we were not able to test the rest of the systems on the same data set, or to use an independent data set to compare all of the systems, since the majority of the systems are proprietary and not publicly available.
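Quadratic weighted kappa, the agreement statistic used in the study above, can be computed directly; a minimal sketch with made-up human and machine scores follows (scikit-learn's cohen_kappa_score is used here as an assumed implementation choice).

```python
# Quadratic weighted kappa between human and machine scores (toy data).
from sklearn.metrics import cohen_kappa_score

human_scores   = [4, 3, 5, 2, 4, 3, 5, 1]   # scores from an expert human grader
machine_scores = [4, 3, 4, 2, 5, 3, 5, 2]   # scores predicted by an AEE system

qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {qwk:.2f}")
```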
5 Conclusion

The development of automated essay evaluation is important because it enables teachers and educational institutions to save time and money. Moreover, it allows students to practice their writing skills and gives them an opportunity to become better writers. From the beginning of the field's development, the unstandardized evaluation process and the lack of attributes describing the writing construct have been emphasized as disadvantages. In recent years, progress in the field has accelerated, with a rising number of papers describing publicly available systems that achieve results comparable to other state-of-the-art systems [38].

In this survey we gave an overview of the field of automated essay evaluation. One of the current challenges concerns the meaningful feedback that instructional applications offer to students. AEE systems can recognize certain types of errors, including syntactic errors, provide global feedback on content and development, and offer automated feedback on correcting these errors. Researchers are currently trying to provide meaningful feedback also about the completeness and correctness of the semantics of the essay. This is closely related to the evaluation of the semantic content of student essays, more specifically to the analysis of the correctness of the statements in the essays. Another problem concerning the AEE community is the unification of evaluation methodology.

The fact that more and more classical educational approaches have been automated using computers raises concerns. Massive open online courses (MOOCs) have become part of educational systems, are replacing the traditional teacher-student relation, and call into question the educational process in the classroom. While computer grading of multiple-choice tests has been used for years, computer scoring of more subjective material such as essays is now moving into the academic world. Automated essay evaluation plays one of the key roles in the current development of automated educational systems, including MOOCs. All this leaves many open questions regarding the replacement of human teachers with computers, which should be taken into consideration in the future and answered with the further development of the field.

As a summary of our review, we would like to encourage all researchers in the field to publish their work as open-source resources, thereby allowing others to compare results. This would contribute to faster development of the field and would consequently lead to novel solutions to the challenges described above.

References

[1] H. B. Ajay, P. I. Tillet, and E. B. Page, "Analysis of essays by computer (AEC-II)," U.S. Department of Health, Education, and Welfare, Office of Education, National Center for Educational Research and Development, Washington, D.C., Tech. Rep., 1973.

[2] Y. Attali, "A Differential Word Use Measure for Content Analysis in Automated Essay Scoring," ETS Research Report Series, vol. 36, 2011.

[3] Y. Attali, "Validity and Reliability of Automated Essay Scoring," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. C. Burstein, Eds. New York: Routledge, 2013, ch. 11, pp. 181–198.

[4] Y. Attali and J. Burstein, "Automated Essay Scoring With e-rater V.2," The Journal of Technology, Learning and Assessment, vol. 4, no. 3, pp. 3–29, 2006.

[5] L. Bin and Y. Jian-Min, "Automated Essay Scoring Using Multi-classifier Fusion," Communications in Computer and Information Science, vol. 233, pp. 151–157, 2011.

[6] E. Brent, C. Atkisson, and N. Green, "Time-shifted Collaboration: Creating Teachable Moments through Automated Grading," in Monitoring and Assessment in Online Collaborative Environments: Emergent Computational Technologies for E-learning Support, A. Juan, T. Daradournis, and S. Caballe, Eds. IGI Global, 2010, pp. 55–73.

[7] E. Brent and M. Townsend, "Automated essay grading in the sociology classroom," in Machine Scoring of Student Essays: Truth and Consequences?, P. Freitag Ericsson and R. H. Haswell, Eds. Utah State University Press, 2006, ch. 13, pp. 177–198.

[8] B. Bridgeman, "Human Ratings and Automated Essay Evaluation," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. C. Burstein, Eds. New York: Routledge, 2013, ch. 13, pp. 221–232.

[9] J. Burstein, M. Chodorow, and C. Leacock, "Automated Essay Evaluation: The Criterion Online Writing Service," AI Magazine, vol. 25, no. 3, pp. 27–36, 2004.

[10] J. Burstein, K. Kukich, S. Wolff, C. Lu, and M. Chodorow, "Computer Analysis of Essays," in Proceedings of the NCME Symposium on Automated Scoring, Montreal, 1998, pp. 1–13.
[11] J. Burstein, J. Tetreault, and N. Madnani, "The E-rater® Automated Essay Scoring System," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. Burstein, Eds. New York: Routledge, 2013, ch. 4, pp. 55–67.

[12] D. Castro-Castro, R. Lannes-Losada, M. Maritxalar, I. Niebla, C. Pérez-Marqués, N. C. Álamo Suárez, and A. Pons-Porrata, "A multilingual application for automated essay scoring," in Advances in Artificial Intelligence – 11th Ibero-American Conference on AI. Lisbon, Portugal: Springer, 2008, pp. 243–251.

[13] Y. Chali and S. A. Hasan, "On the Effectiveness of Using Syntactic and Shallow Semantic Tree Kernels for Automatic Assessment of Essays," in Proceedings of the International Joint Conference on Natural Language Processing, Nagoya, Japan, 2013, pp. 767–773.

[14] T. H. Chang, C. H. Lee, P. Y. Tsai, and H. P. Tam, "Automated essay scoring using a set of literary sememes," in Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2008. Beijing, China: IEEE, 2008, pp. 1–5.

[15] H. Chen, B. He, T. Luo, and B. Li, "A Ranked-Based Learning Approach to Automated Essay Scoring," in Proceedings of the Second International Conference on Cloud and Green Computing. IEEE, Nov. 2012, pp. 448–455.

[16] Y. Chen, C. Liu, C. Lee, and T. Chang, "An Unsupervised Automated Essay-Scoring System," IEEE Intelligent Systems, vol. 25, no. 5, pp. 61–67, 2010.

[17] J. R. Christie, "Automated Essay Marking – for both Style and Content," in Proceedings of the Third Annual Computer Assisted Assessment Conference, 1999.

[18] A. Fazal, T. Dillon, and E. Chang, "Noise Reduction in Essay Datasets for Automated Essay Grading," Lecture Notes in Computer Science, vol. 7046, pp. 484–493, 2011.

[19] F. Gutierrez, D. Dou, S. Fickas, and G. Griffiths, "Online Reasoning for Ontology-Based Error Detection in Text," On the Move to Meaningful Internet Systems: OTM 2014 Conferences, Lecture Notes in Computer Science, vol. 8841, pp. 562–579, 2014.

[20] F. Gutierrez, D. Dou, S. Fickas, and G. Griffiths, "Providing grades and feedback for student summaries by ontology-based information extraction," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), 2012, pp. 1722–1726.

[21] F. Gutierrez, D. Dou, A. Martini, S. Fickas, and H. Zong, "Hybrid Ontology-based Information Extraction for Automated Text Grading," in Proceedings of the 12th International Conference on Machine Learning and Applications, 2013, pp. 359–364.

[22] A. Herrington, "Writing to a Machine is Not Writing At All," in Writing Assessment in the 21st Century: Essays in Honor of Edward M. White, N. Elliot and L. Perelman, Eds. New York: Hampton Press, 2012, pp. 219–232.

[23] D. Higgins, J. Burstein, and Y. Attali, "Identifying off-topic student essays without topic-specific training data," Natural Language Engineering, vol. 12, no. 2, pp. 145–159, May 2006.

[24] T. Ishioka and M. Kameda, "Automated Japanese essay scoring system: Jess," in Proceedings of the 15th International Workshop on Database and Expert Systems Applications, 2004, pp. 4–8.

[25] T. Ishioka, "Automated Japanese Essay Scoring System based on Articles Written by Experts," in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, Sydney, 2006, pp. 233–240.
[26] M. M. Islam and A. S. M. L. Hoque, "Automated essay scoring using Generalized Latent Semantic Analysis," Journal of Computers, vol. 7, no. 3, pp. 616–626, 2012.

[27] K. S. Jones, "Natural language processing: a historical review," Linguistica Computazionale, vol. 9, pp. 3–16, 1994.

[28] T. Kakkonen, N. Myller, E. Sutinen, and J. Timonen, "Comparison of Dimension Reduction Methods for Automated Essay Grading," Educational Technology & Society, vol. 11, no. 3, pp. 275–288, 2008.

[29] T. Kakkonen, N. Myller, J. Timonen, and E. Sutinen, "Automatic Essay Grading with Probabilistic Latent Semantic Analysis," in Proceedings of the Second Workshop on Building Educational Applications Using NLP, 2005, pp. 29–36.

[30] M. T. Kane, "Validation," in Educational Measurement, 4th ed., R. L. Brennan, Ed. Westport, CT: Praeger Publishers, 2006, pp. 17–64.

[31] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, Jan. 1998.

[32] T. K. Landauer, D. Laham, and P. W. Foltz, "The Intelligent Essay Assessor," IEEE Intelligent Systems, vol. 15, no. 5, pp. 27–31, 2000.

[33] B. Lemaire and P. Dessus, "A System to Assess the Semantic Content of Student Essays," Journal of Educational Computing Research, vol. 24, no. 3, pp. 305–320, 2001.

[34] E. D. Liddy, "Natural Language Processing," in Encyclopedia of Library and Information Science, 2nd ed., M. Decker, Ed. Taylor & Francis, 2001.

[35] S. M. Lottridge, E. M. Schulz, and H. C. Mitzel, "Using Automated Scoring to Monitor Reader Performance and Detect Reader Drift in Essay Scoring," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. Burstein, Eds. New York: Routledge, 2013, ch. 14, pp. 233–250.

[36] S. M. Lottridge, H. C. Mitzel, and F. Chou, "Blending machine scoring and hand scoring for constructed responses," paper presented at the CCSSO National Conference on Student Assessment, Los Angeles, California, 2009.

[37] O. Mason and I. Grove-Stephenson, "Automated free text marking with paperless school," in Proceedings of the Sixth International Computer Assisted Assessment Conference, 2002, pp. 213–219.

[38] E. Mayfield and C. Penstein-Rosé, "An Interactive Tool for Supporting Error Analysis for Text Mining," in Proceedings of the NAACL HLT 2010 Demonstration Session, Los Angeles, CA, 2010, pp. 25–28.

[39] E. Mayfield and C. Rosé, "LightSIDE: Open Source Machine Learning for Text," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. Burstein, Eds. New York: Routledge, 2013, ch. 8, pp. 124–135.

[40] D. McCurry, "Can machine scoring deal with broad and open writing tests as well as human readers?" Assessing Writing, vol. 15, no. 2, pp. 118–129, 2010.

[41] T. McGee, "Taking a Spin on the Intelligent Essay Assessor," in Machine Scoring of Student Essays: Truth and Consequences?, P. Freitag Ericsson and R. H. Haswell, Eds. Logan, UT: Utah State University Press, 2006, ch. 5, pp. 79–92.

[42] K. M. Nahar and I. M. Alsmadi, "The Automatic Grading for Online Exams in Arabic with Essay Questions Using Statistical and Computational Linguistics Techniques," MASAUM Journal of Computing, vol. 1, no. 2, 2009.

[43] R. Östling, A. Smolentzov, and E. Höglin, "Automated Essay Scoring for Swedish," in Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications,
Atlanta, Georgia, US, 2013, pp. 42–47.

[44] E. B. Page, "The Imminence of... Grading Essays by Computer," Phi Delta Kappan, vol. 47, no. 5, pp. 238–243, 1966.

[45] E. B. Page, "Computer Grading of Student Prose, Using Modern Concepts and Software," Journal of Experimental Education, vol. 62, no. 2, pp. 127–142, 1994.

[46] D. E. Powers, J. C. Burstein, M. Chodorow, M. E. Fowles, and K. Kukich, "Stumping e-rater: challenging the validity of automated essay scoring," Computers in Human Behavior, vol. 18, no. 2, pp. 103–134, Mar. 2002.

[47] C. Ramineni and D. M. Williamson, "Automated essay scoring: Psychometric guidelines and practices," Assessing Writing, vol. 18, no. 1, pp. 25–39, 2013.

[48] C. S. Rich, M. C. Schneider, and J. M. D'Brot, "Applications of Automated Essay Evaluation in West Virginia," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. Burstein, Eds. New York: Routledge, 2013, ch. 7, pp. 99–123.

[49] L. M. Rudner, V. Garcia, and C. Welch, "An Evaluation of the IntelliMetric Essay Scoring System," The Journal of Technology, Learning and Assessment, vol. 4, no. 4, pp. 3–20, 2006.

[50] L. M. Rudner and T. Liang, "Automated Essay Scoring Using Bayes' Theorem," The Journal of Technology, Learning and Assessment, vol. 1, no. 2, pp. 3–21, 2002.

[51] M. T. Schultz, "The IntelliMetric Automated Essay Scoring Engine: A Review and an Application to Chinese Essay Scoring," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis and J. C. Burstein, Eds. New York: Routledge, 2013, ch. 6, pp. 89–98.

[52] M. D. Shermis, "State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration," Assessing Writing, vol. 20, pp. 53–76, 2014.

[53] M. D. Shermis and J. Burstein, "Introduction," in Automated Essay Scoring: A Cross-disciplinary Perspective, M. D. Shermis and J. Burstein, Eds. Mahwah, NJ: Lawrence Erlbaum Associates, 2003, pp. xiii–xvi.

[54] M. D. Shermis, J. Burstein, and S. A. Bursky, "Introduction to Automated Essay Evaluation," in Handbook of Automated Essay Evaluation: Current Applications and New Directions, M. D. Shermis,