AI-Powered Essay Scoring Solutions Still Have Room For Improvements

Summary: Written responses are the Achilles heel of eLearning testing. How close is AI to solving the problem?

AI-Powered Essay Scoring Solutions Are Still Improving

In all corners of the education world, there's an ever-growing need to streamline and optimize the work that goes into teaching students what they need to know. For decades, technology of all types has played a major role in those efforts. In fact, one of the most well-known and widely used examples is the ubiquitous Scantron machine that teachers have relied on for multiple choice testing for the last forty years.

In the eLearning industry, efficiency is the whole point: digital learning tools let a single educator teach and manage a much larger group of students at once, and many of the day-to-day tasks that occupy a teacher's time have been automated to accommodate that scale. Modern LMS platforms can take care of things like course recommendations, invitations to scheduled learning events, and even course completion and certification notifications. One place they've historically fallen short, however, is in automated grading, particularly for free response formats.

Recently, the growth of Artificial Intelligence (AI) and Natural Language Processing (NLP) technologies has brought the possibility of grading free responses like short answer and essay questions closer than it has ever been before. Here's a look at where the technology stands in one of the last great frontiers of eLearning automation, and when it might start to see mass adoption.

Rudimentary Essay Scoring

For a number of years, some LMS solutions have included basic written answer scoring tools, aimed at providing a guide to the human administrator who would then issue a final score. For the most part, the current generation of such tools relies on a rubric of expected keywords and derivative terms. In short, the course designer must provide the scoring system with a weighted list of terms that should appear in a correct answer to a given prompt, and assign a point value to each term. It's a system that can handle basic, short written responses that require factual answers, but little else. In practice, that makes such systems of limited value to eLearning platforms, since the simplicity of the required answer format means that most short answer prompts could be replaced with multiple choice questions without sacrificing much.
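As a rough illustration of how a keyword-rubric scorer of this kind works, here is a minimal Python sketch. The prompt, terms, and weights are invented for the example rather than taken from any particular LMS.

```python
# Minimal sketch of keyword-based rubric scoring (hypothetical terms and weights).
# Each expected term carries a point value; the response earns the points for
# every rubric term it mentions, regardless of how the answer is actually argued.

import re

rubric = {
    "photosynthesis": 2.0,   # core concept, weighted more heavily
    "chlorophyll": 1.0,
    "sunlight": 1.0,
    "carbon dioxide": 1.0,
    "oxygen": 1.0,
}

def score_response(response: str, rubric: dict[str, float]) -> float:
    """Award the weight of each rubric term found in the response."""
    text = response.lower()
    score = 0.0
    for term, weight in rubric.items():
        # Match the term as a whole word or phrase, ignoring case.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            score += weight
    return score

answer = ("Plants use photosynthesis to turn sunlight and carbon dioxide "
          "into sugars, releasing oxygen in the process.")
print(score_response(answer, rubric))  # 5.0 of a possible 6.0 (no mention of chlorophyll)
```

The limitation is visible in the code itself: an answer that lists the right words in a nonsensical order scores exactly as well as a coherent explanation.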

AI And NLP Enhance Functionality

To be of any real use to an eLearning platform, an automated essay scoring system must be able to do more than look for keywords. It must also be capable of understanding the grammatical structure, intent, and tone of a written response. Over the last few years, quite a bit of progress toward that requirement has occurred in the world of AI. Most of the current development in the area is focused on an NLP technique called latent semantic analysis, a machine learning approach that allows an AI to assess written responses by comparing them to a large number of known, human-scored responses. In essence, the technology assigns grades to written answers based upon how structurally and contextually similar they are to previously scored text. In that way, the latest systems, such as the one introduced by EdX in 2013, can sort answers into predetermined scoring buckets that match the teacher's prior grading standards.
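To make the comparison-based approach more concrete, the sketch below applies latent semantic analysis in the generic scikit-learn sense: a handful of hypothetical human-scored answers are projected into a reduced semantic space, and a new response receives the score of the sample it most closely resembles. This illustrates the general technique only; it is not the EdX system or any vendor's actual pipeline, and the sample answers and scores are invented.

```python
# Sketch of LSA-style scoring: compare a new response to human-scored samples
# in a reduced semantic space (illustrative data, not a production grader).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Previously graded responses and the scores a human assigned to them.
scored_samples = [
    ("The treaty ended the war by redrawing borders and imposing reparations.", 5),
    ("The treaty ended the war.", 3),
    ("It was about a war, I think.", 1),
]
texts = [t for t, _ in scored_samples]
scores = [s for _, s in scored_samples]

# Build a term-document matrix, then reduce it to a low-dimensional "semantic" space.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(texts)
svd = TruncatedSVD(n_components=2, random_state=0)
lsa_space = svd.fit_transform(tfidf)

def score(new_response: str) -> int:
    """Give the new response the score of the most similar graded sample."""
    vec = svd.transform(vectorizer.transform([new_response]))
    sims = cosine_similarity(vec, lsa_space)[0]
    return scores[sims.argmax()]

print(score("Borders were redrawn and reparations imposed, which ended the fighting."))
```

Note that the grade comes entirely from resemblance to prior examples; nothing in the pipeline checks whether the new answer is factually or logically sound on its own terms, which is exactly the weakness discussed next.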

Where Technology Falls Short

Although the latest AI essay scoring systems are far more advanced and useful than previous keyword-based approaches, they still have a long way to go. That's because they still rely on a system that doesn't actually comprehend written answers but instead compares them to known samples. The problem with that approach is that it presupposes the examples the AI uses as guidelines are the only possible correct or incorrect responses. That leaves little room for the varied stylistic and creative approaches students may take, and it sidesteps any objective judgment of a response's merit. In addition, the methodology has proven vulnerable to manipulation by those who can spot the logical flaws in the system.

The Bottom Line

As of now, the reality for eLearning professionals is that even the latest automated written response scoring systems aren't ready for prime time. While they are improving, they're still only suitable for narrow use cases. For example, a history course might be a perfect fit for the current technology, but a course teaching the finer points of how to write an essay conclusion would give even the most sophisticated AI system fits. For now, eLearning operators will have to continue scoring written responses the old-fashioned way or avoid them altogether. At the current pace of development, however, that reality may change in the very near future, so it's a topic that will be worth revisiting in the coming years.