Image Source : https://www.pexels.com/@rdne/

Ranking and testing LLMs and other AIs

Like taking them to school!

You might have seen AI leaderboards here, here, here and there and, like most busy people, probably nodded, made a mental note of the top AI dogs and went on with your day. This article is for those who want to go a little deeper into the details of how AIs are ranked and tested, either because you need to know 🤌 as a decision maker or because you are just curious about the subject.

So how do you test an AI?

Turns out, exactly like you would a human being (with some caveats). Let’s say your AI is tasked with identifying shapes. You then make a test with shapes and labels and let your AI try to identify as many as it can. You can even make a benchmark where x number of shapes need to be correctly identified for it to be considered a “Shapeologist AI”:

This AI has to go back to school and train harder on pointy shapes!
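
If you like to see ideas as code, here is a minimal sketch of that shape test in Python. Everything in it is made up for illustration: the test set, the `toy_shape_ai` stand-in model (deliberately bad at triangles), the `run_benchmark` helper and the passing score of 4 are all assumptions, not a real benchmark or library.

```python
from collections import Counter

# Hypothetical test set: (shape, correct label) pairs.
TEST_SET = [
    ({"sides": 0}, "circle"),
    ({"sides": 3}, "triangle"),
    ({"sides": 4}, "square"),
    ({"sides": 5}, "pentagon"),
    ({"sides": 3}, "triangle"),
]


def toy_shape_ai(shape):
    """Stand-in for a real model; it mistakes triangles for squares."""
    guesses = {0: "circle", 3: "square", 4: "square", 5: "pentagon"}
    return guesses.get(shape["sides"], "unknown")


def run_benchmark(model, test_set, passing_score):
    """Count correct answers and tally the misses per true label."""
    correct = 0
    missed = Counter()
    for shape, label in test_set:
        if model(shape) == label:
            correct += 1
        else:
            missed[label] += 1
    return {
        "correct": correct,
        "total": len(test_set),
        "passed": correct >= passing_score,  # the "x amount of shapes" bar
        "missed": dict(missed),
    }


report = run_benchmark(toy_shape_ai, TEST_SET, passing_score=4)
print(report)
```

The per-label tally of misses is what tells you which shapes the AI should study harder; a single overall score would hide that.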

The caveat here, which can be a big one, is that you are testing a machine (at least until my side project of sentient AIs bears fruit!), which means you are testing a system that is itself the result of testing. For instance, a shape-detecting AI is built by providing pairs of shapes and labels as a dataset to train on, and…

Written by Keno Leon