How to Get AI to 'Accurately' Grade Your Exams

#llms #ielts #question

Something I'm seeing a lot is people uploading their writing samples to AI tools like ChatGPT or Claude, asking for an evaluation, then complaining that the grading is not consistent or fair. I won't get into the nature of LLMs in this post, but two things are worth knowing:

a.) you need to prompt the tool correctly to get a more accurate result

b.) LLMs are not in a place right now where you should be fully dependent on them for your prep

Let's take this sample IELTS question/response:

Question: Some people believe that children should be taught to compete in school, while others think they should be taught to cooperate. Discuss both views and give your own opinion.

Response: Nowadays, there is a debate about whether children should learn to compete or cooperate in school. Both approaches have advantages and I will discuss them in this essay...[full question/answer I used for this post in comments].

Prompting the Tool Correctly

If you just copy and paste this into ChatGPT and ask it for your IELTS band, you are prompting the tool incorrectly! You will get high-variance, and likely inflated, scores. Your prompt should look something like this:

"https://takeielts.britishcouncil.org/sites/default/files/ielts_writing_band_descriptors.pdf

You are an expert IELTS examiner. Read the attached rubric and then accurately and fairly grade the following IELTS writing task:

[insert your question/answer pair here]"
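If you plan to reuse this prompt across several chats, it can help to assemble it the same way every time. Here is a minimal Python sketch of that idea; the function name `build_grading_prompt` and the constant `RUBRIC_URL` are made up for illustration, and the wording simply mirrors the prompt above. It only builds the text; you still paste the result into whichever tool you use.

```python
# Hypothetical helper: builds the rubric-first prompt as a plain string.
# Nothing here calls an AI tool; it only keeps the wording identical between runs.

RUBRIC_URL = (
    "https://takeielts.britishcouncil.org/sites/default/files/"
    "ielts_writing_band_descriptors.pdf"
)


def build_grading_prompt(question: str, response: str, rubric: str = RUBRIC_URL) -> str:
    """Return the grading prompt: rubric first, then the role, then the task."""
    return (
        f"{rubric}\n\n"
        "You are an expert IELTS examiner. Read the attached rubric and then "
        "accurately and fairly grade the following IELTS writing task:\n\n"
        f"Question: {question.strip()}\n\n"
        f"Response: {response.strip()}"
    )
```

Keeping the prompt byte-for-byte identical matters if you later compare scores across chats, since any difference in wording becomes one more source of variation.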

You will see lower variance and lower band inflation. You can test this yourself by opening multiple chats in incognito mode across multiple LLMs and giving each one the exact same prompt.
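To put a number on "variance", you can note the band each chat gives you and summarize the results. The sketch below is one way to do that with Python's standard library; `summarize_bands` is a hypothetical helper, and you supply the scores yourself by copying them out of each chat.

```python
from statistics import mean, pstdev


def summarize_bands(scores):
    """Summarize band scores collected by hand from repeated runs of one prompt."""
    if not scores:
        raise ValueError("need at least one band score")
    return {
        "n": len(scores),                      # number of runs
        "mean": mean(scores),                  # average band across runs
        "spread": max(scores) - min(scores),   # gap between highest and lowest band
        "stdev": pstdev(scores),               # population standard deviation
    }
```

Run it once on the scores from the bare prompt and once on the scores from the rubric prompt. A smaller `spread` and `stdev` for the rubric prompt means the grading was more consistent, and a lower `mean` suggests less band inflation. It does not tell you which score is closer to what a real examiner would give.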

Why? By prompting with the rubric, you are steering the "virtual examiner" to check specific words and phrases in your response against the guidelines in the rubric. This tends to give you a more "accurate" result.

LLMs are Not Examiners

I always put "accurate" in quotes because of how LLMs work. They are essentially prediction algorithms based on what they've seen before. Human examiners have sat through oral, written, and virtual training sessions in which a professional trainer of trainers adds context to the rubric. As a result, they can grade novel question/answer pairs more easily and consistently.

While there is also variance among examiners, decades of training program development mean that a review by an expert examiner is likely to align more closely with your test-day results.

I'm a big fan of not spending a single $ on your test prep up until you reach a certain point. PDFs, free online tests, chatting with a buddy, AI tools, etc. should be exhausted first. However, once you've maxed out your free resources, it's time to close the loop with an expert examiner. You can find one on lottalingo.com.

Good luck studying!