Elote: a python package of rating systems

Recently I’ve been interested in rating systems.  Around here the application most front of mind for those is college football rankings. In general, imagine any case where you have a large population of things you want to rank, and only a limited set of head-to-head matchups between those things to use for building your ratings. Without a direct comparison point between each possible pair, you’ve got to try to be clever.

Another classic example of a rating system is chess rankings, and it’s in that domain that Arpad Elo developed his rating system, the Elo rating system.  Since then, Elo ratings have been used in many other domains, and both before and after Elo there have been tons of other rating systems.  So with that in mind, I wanted to build a nice python package with a good, digestible API that implements lots of these systems. I find it much easier to grok things like this when they can all be run side by side on the same dataset anyway.

So here’s elote: a python package for rating systems. So far I’ve only implemented Elo and the first version of the Glicko rating system, but I think I’ve got the structure in a way that makes sense.  Elote is broken down into a few main concepts:

  • Competitors: a competitor is a “thing” which you would like to rank. If you’re ranking college football teams, you’d have one competitor per team. The different rating systems are implemented at the competitor level, but as you’ll see soon, that’s largely abstracted away. From your side, all a competitor needs is a hashable python object that you can use to identify the “thing”, usually just a string label.
  • Bouts: a bout is a head-to-head matchup between two competitors, generally defined by some lambda function that takes in two competitors and returns True if the first wins, False if the second wins, or None if it’s a draw.
  • Arenas: an arena is the part that ties it all together, a central object that creates the competitors, takes in the lambda function and a list of bouts, then evaluates everything. State can be saved from arenas, so you can do 10 bouts, save, then do 10 more later if you want.

Now, a simple example:

from elote import LambdaArena
import json
import random


# sample bout function which just compares the two inputs
def func(a, b):
    return a > b

matchups = [(random.randint(1, 10), random.randint(1, 10)) for _ in range(1000)]

arena = LambdaArena(func)
arena.tournament(matchups)

print(json.dumps(arena.leaderboard(), indent=4))

So here we are using the lambda arena to evaluate a bunch of bouts where the competitor labels are just random integers, and the lambda is just evaluating greater than. Since the competitors are numbers and larger numbers win, we’d expect to end up with a ranking that looks a lot like a sorted list from 1 to 10.  Notice that we don’t do anything with competitors here: the arena creates them for us and manages them fully, here using the default EloCompetitor for Elo rating.

Finally, we pass the bouts to the arena using arena.tournament() and dump the leaderboard, yielding something like this:

[
    {
        "rating": 560.0,
        "competitor": 1
    },
    {
        "rating": 803.3256886926524,
        "competitor": 2
    },
    {
        "rating": 994.1660057704563,
        "competitor": 3
    },
    {
        "rating": 1096.0912814220258,
        "competitor": 4
    },
    {
        "rating": 1221.000354671287,
        "competitor": 5
    },
    {
        "rating": 1351.4243548137367,
        "competitor": 6
    },
    {
        "rating": 1401.770230395329,
        "competitor": 7
    },
    {
        "rating": 1558.934907485894,
        "competitor": 8
    },
    {
        "rating": 1607.6971796462033,
        "competitor": 9
    },
    {
        "rating": 1708.3786662956998,
        "competitor": 10
    }
]

And there we have it, a very slow sort function!
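
If you’re curious what an arena is doing conceptually, here is a rough pure-python sketch of the same experiment using the textbook Elo update. To be clear, this is not elote’s code: run_tournament, the starting rating of 1000, the K-factor of 32 and the choice to skip a number matched against itself are all assumptions I made up for illustration.

```python
import random


def expected_score(rating_a, rating_b):
    """Textbook Elo expectation that A beats B (not elote's internals)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(bout_func, matchups, initial_rating=1000.0, k_factor=32.0):
    """Hypothetical stand-in for an arena: one Elo update per bout, in order."""
    ratings = {}
    for a, b in matchups:
        if a == b:
            continue  # assumption: a competitor cannot play itself
        rating_a = ratings.setdefault(a, initial_rating)
        rating_b = ratings.setdefault(b, initial_rating)

        # Bout convention from above: True = a wins, False = b wins, None = draw
        result = bout_func(a, b)
        score_a = 0.5 if result is None else (1.0 if result else 0.0)

        # Whatever a gains, b loses, so the total rating pool stays constant
        delta = k_factor * (score_a - expected_score(rating_a, rating_b))
        ratings[a] = rating_a + delta
        ratings[b] = rating_b - delta
    return ratings


random.seed(0)
matchups = [(random.randint(1, 10), random.randint(1, 10)) for _ in range(1000)]
ratings = run_tournament(lambda a, b: a > b, matchups)

# Print the leaderboard from lowest to highest rating
for competitor, rating in sorted(ratings.items(), key=lambda item: item[1]):
    print(competitor, round(rating, 1))
```

The exact numbers won’t match the leaderboard above, since the starting rating and other defaults here are guesses, but the shape of the loop is the point: look up two ratings, score the bout, nudge both ratings.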

There’s a bunch more that we can do in elote, though. For the full list, check out the examples here in the repo. But while we are here, we can also skip the arena and interact with competitors directly, even using them to predict the likelihood of future matchups:

from elote import EloCompetitor

good = EloCompetitor(initial_rating=400)
better = EloCompetitor(initial_rating=500)

print('probability of better beating good: %5.2f%%' % (better.expected_score(good) * 100, ))
print('probability of good beating better: %5.2f%%' % (good.expected_score(better) * 100, ))

good.beat(better)

print('probability of better beating good: %5.2f%%' % (better.expected_score(good) * 100, ))
print('probability of good beating better: %5.2f%%' % (good.expected_score(better) * 100, ))
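
For reference, the standard Elo formula for that kind of prediction is easy to write out by hand. This is a sketch of the textbook formula with the usual 400-point scale, assuming nothing about how elote implements expected_score internally; the standalone function here is just for illustration.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo: probability that A beats B, on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Same ratings as the competitors above: good = 400, better = 500
p_better = expected_score(500, 400)
p_good = expected_score(400, 500)

print('probability of better beating good: %5.2f%%' % (p_better * 100))
print('probability of good beating better: %5.2f%%' % (p_good * 100))
```

With a 100 point gap the favorite comes out at roughly 64%, and the two probabilities always sum to one.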

We can save the state from an arena and re-instantiate a new one pretty simply (here using GlickoCompetitor instead of Elo):

from elote import GlickoCompetitor

saved_state = arena.export_state()
arena = LambdaArena(func, base_competitor=GlickoCompetitor, initial_state=saved_state)

And we can change the variables used in specific rating systems across the entire set of competitors in an arena:

arena = LambdaArena(func)
arena.set_competitor_class_var('_k_factor', 50)
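
To get a feel for what a knob like the K-factor does, here is a small sketch using the textbook Elo update, where K scales how far ratings move after each bout. I’m assuming _k_factor plays that role in elote; the elo_update helper below is hypothetical and not part of the package.

```python
def elo_update(winner_rating, loser_rating, k_factor):
    """Textbook Elo update after a decisive bout (hypothetical helper)."""
    expected_win = 1.0 / (1.0 + 10 ** ((loser_rating - winner_rating) / 400.0))
    delta = k_factor * (1.0 - expected_win)
    return winner_rating + delta, loser_rating - delta


# An upset: the 400-rated competitor beats the 500-rated one
for k in (16, 32, 50):
    new_winner, new_loser = elo_update(400.0, 500.0, k)
    print(k, round(new_winner, 2), round(new_loser, 2))
```

A bigger K means ratings react faster to each result, at the cost of being noisier, which is why it’s handy to be able to set it once for every competitor in an arena.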

And that’s about it for now. It’s still very early days, so if you’re interested in rating systems and want to help out, find me on GitHub or here and let’s make some stuff.