If you’re interested in science/maths/that sort of thing, and you don’t already know about brilliant.org, this needs to be fixed!
It aims to nurture thinkers all over the world by gamifying maths and physics problems.
I love puzzles, and I actually really miss solving the kind of bite-sized problems that seemed incessant at school but don’t feature so much in the real world. Brilliant.org fills a puzzle-shaped hole in my heart that’s been empty since I got bored of sudoku.
I recently completed an online Game Theory course (which I highly recommend) on another online education favourite of mine, Coursera. So when I found out brilliant.org were running a game-theory-algorithm-writing programming competition, I got a little excited.
Each entrant had to write a Python script that makes decisions on how to interact with every other entrant, based on a given scenario and some key data. Brilliant.org then played everyone’s algorithms against each other to see who would survive!
The details of the scenario and game rules can be found here, but here’s a summary.
You’re a member of a tribe. Everyone begins with a certain amount of food. Each round, you go through every other member of the tribe and choose to either hunt or slack with them (I shall call this an ‘interaction’). How much food you receive from each interaction depends on both what you decide to do and what your interaction partner decides to do. You each decide independently — you don’t discuss it with your partner.
Hunting and slacking use up different amounts of food units, and also return different amounts. Read the rules for full details, but here’s the outcome of what you win/lose depending on what you and your partner do:
Both slack: Both lose 2 food units
Both hunt: Neither gain nor lose
One slacks, one hunts: Slacker wins 1 food unit, hunter loses 3 food units
The amount of food you receive from the round is the sum of your winnings from each interaction. If your total food falls to 0 or below at the end of a round, you starve and die and are no longer in the game.
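Those outcomes are easy to capture in a small helper. This is just a sketch in my own notation (‘h’ for hunt, ‘s’ for slack) rather than anything from the official rules:

```python
# Per-interaction payoffs from my perspective, using the outcomes above.
# 'h' = hunt, 's' = slack (shorthand of my own choosing).

def payoff(my_choice, their_choice):
    """Return the food units I gain (or lose) from one interaction."""
    if my_choice == 's' and their_choice == 's':
        return -2   # both slack: both lose 2
    if my_choice == 'h' and their_choice == 'h':
        return 0    # both hunt: neither gains nor loses
    if my_choice == 's' and their_choice == 'h':
        return 1    # I slack while they hunt: I win 1
    return -3       # I hunt while they slack: I lose 3

def round_total(my_choices, their_choices):
    """My food change for the round: the sum over all interactions."""
    return sum(payoff(m, t) for m, t in zip(my_choices, their_choices))
```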
To complicate matters, there is the chance for everyone to benefit from a ‘public good’ incentive. A random threshold called ‘m’ is chosen, and if the total number of hunts across the tribe reaches this threshold, EVERYONE (even the slackers) gets a food bonus.
How the game ends
Each round, there is a small random probability the game will end. The game would also end after all but one player has starved, if that happens sooner.
Everyone in the tribe has a reputation between 0 and 1, which indicates how likely they are to hunt based on their past decisions.
Each round, your script is provided with a list of the reputations of everyone left in the tribe. The order of this list changes each round, so you can’t simply track what a player did last time. You’re also provided with the value of ‘m’, plus some key data such as how much food you have. Based on this information, your script must return a list of ‘hunt’ or ‘slack’ decisions, one for each player.
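As a rough sketch of the shape this takes, here’s what such a decision function might look like. The parameter names and the ‘h’/‘s’ return values are my guesses at the interface based on the data described above, not the official specification:

```python
# Hypothetical decision function: names and return format are assumptions,
# not the competition's official API.

def hunt_choices(round_number, current_food, current_reputation, m,
                 player_reputations):
    """Return one 'h' (hunt) or 's' (slack) decision per remaining player."""
    decisions = []
    for rep in player_reputations:
        # Placeholder policy: only hunt with middling reputations.
        decisions.append('h' if 0.2 <= rep <= 0.9 else 's')
    return decisions
```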
So how do you decide what to do?
Let’s look at the beloved game theory ‘payoff table’ for this game, built from the outcomes above (each cell shows your payoff / your partner’s payoff):

|               | Partner hunts | Partner slacks |
|---------------|---------------|----------------|
| **You hunt**  | 0 / 0         | −3 / +1        |
| **You slack** | +1 / −3       | −2 / −2        |
You may spot that regardless of what your partner does, you are better off slacking. If they hunt, you can take advantage of this and gain a unit by slacking. If they slack, losing 2 units is better than losing 3, so again you are better off slacking. This makes slacking the best response in all cases and therefore a dominant strategy.
Unfortunately for you, your partner can probably recognise this as well, so you’ll both end up slacking — resulting in a Nash equilibrium. Annoyingly, this means you’ll both be worse off than if you both hunted!
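You can verify the dominance argument mechanically from the outcomes listed earlier:

```python
# Payoffs from my perspective, keyed by (my_choice, their_choice),
# taken straight from the outcomes described above.
payoff = {('s', 's'): -2, ('h', 'h'): 0, ('s', 'h'): 1, ('h', 's'): -3}

# Whatever my partner does, slacking strictly beats hunting:
for their_choice in ('h', 's'):
    assert payoff[('s', their_choice)] > payoff[('h', their_choice)]
```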
You may recognise this as the famous ‘prisoner’s dilemma’. Indeed our tribal scenario is simply a variation on this classic.
So how do we encourage people to put aside rational self-interest for the good of all? This is pretty much what society is built on — people adhere to rules that may inconvenience them personally, but overall this makes a better society from which everyone benefits.
This works because we don’t expect the world to end any time soon, so we all want to make it a nice place to live. Repeated games encourage cooperation, because we know we will reap the benefits of everyone working together. They also allow players to build up a reputation, so we know who to trust.
In our tribal game, we know there is only a tiny chance of the game ending each round, so we should view it as a repeated game. It is therefore worth abandoning the dominant strategy (to an extent) and developing a more cooperative strategy, so that we might reap the rewards of mass cooperation.
With this in mind, I worked off a few basic premises:
1. No point hunting with those who have a very low reputation, as they will likely slack and you will be left out of pocket!
2. No point hunting with those who have a very high reputation, as they are so likely to hunt that it is worth taking advantage of them.
3. Need to introduce some randomness into my choices to throw those trying to track players off the scent.
4. Need to generally reward players who hunt and punish those who slack.
5. Need to maintain a relatively good (but not too good!) reputation.
6. More likely to hunt when ‘m’ is low, as there is more chance of achieving it and gaining the public good reward.
To address 1 and 2, I decided immediately that I would never hunt with those whose reputation was less than 0.2 or greater than 0.9.
Keep ’em guessing
To address 3, I decided to base my decisions on a probability distribution. A Gaussian would give me a low chance of hunting with players who have extreme reputations, and the highest chance of hunting with players whose reputations are around 0.5. The image below shows a Gaussian bell curve shifted to lie between 0 and 1 and centred on 0.5 (along with the equation that generated it).
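Since the original equation was shown in an image, here’s a hedged reconstruction of that curve; the width `SIGMA` is my own choice, and I’ve folded in the hard reputation cut-offs described earlier:

```python
import math

# Bell curve centred on 0.5, peaking at 1.0. SIGMA is an assumption:
# the actual width used in my entry was given in the (missing) image.
SIGMA = 0.2

def hunt_probability(rep):
    """Chance of hunting with a player of reputation `rep` (0..1)."""
    if rep < 0.2 or rep > 0.9:
        return 0.0  # never hunt with extreme reputations
    return math.exp(-((rep - 0.5) ** 2) / (2 * SIGMA ** 2))
```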
It is possible that some players would be using machine learning techniques to try and identify particular players and their strategies — introducing an element of randomness was an attempt to fly under the radar.
Reward the hunters
To address 4, I weighted the Gaussian using each player’s reputation, so that the probabilities were skewed in favour of those with a higher reputation. The image below shows the new weighted Gaussian and the equation that generates it.
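The exact weighting was also shown as an image, so the version below is an assumption on my part: multiplying the bell curve by the reputation itself is one simple way to skew it towards higher-reputation players.

```python
import math

SIGMA = 0.2  # width of the bell curve; my own choice, as above

def weighted_hunt_probability(rep):
    """Gaussian centred on 0.5, skewed in favour of higher reputations.

    Multiplying by `rep` is a hypothetical weighting; the original
    equation was in an image.
    """
    gaussian = math.exp(-((rep - 0.5) ** 2) / (2 * SIGMA ** 2))
    return gaussian * rep
```

With this weighting a player at reputation 0.6 gets a higher hunt probability than one at 0.4, even though the unweighted curve treats them symmetrically.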
Keeping up appearances
To address 5, I kept tabs on my own reputation and if it fell below 0.5, I aggressively ramped up my probabilities by a factor inversely proportional to my falling reputation. This ensured my reputation would never fall too low (and risk me getting starved out by others punishing slackers). So the probability function varies between the blue line and the green line in the image below depending on my reputation — only reaching the green line (and therefore maximal hunting chances) if my reputation falls right down to 0.
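The ramp itself was shown as an image, so here’s a sketch under my own assumptions: a linear blend towards certain hunting as my reputation falls from 0.5 to 0, which matches the description of only reaching maximal hunting chances at reputation 0.

```python
# Hypothetical reputation-maintenance adjustment: the exact scaling used
# in my entry was in the (missing) image; linear interpolation towards
# a probability of 1.0 is my own simplification.

def adjusted_probability(base_prob, my_reputation):
    """Ramp up `base_prob` as my reputation falls below 0.5."""
    if my_reputation >= 0.5:
        return base_prob  # reputation is fine: no adjustment needed
    # How far below 0.5 we are, scaled to 0..1 (1.0 at reputation 0).
    urgency = (0.5 - my_reputation) / 0.5
    # Blend the base probability towards 1.0 as urgency grows.
    return base_prob + (1.0 - base_prob) * urgency
```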
To address 6, I kept track each round of the proportion of hunts that had taken place throughout the game, relative to the total number of interactions. If m was less than the average number of hunts per interaction, there was a good chance of reaching it, so I decided it was worth going for and increased my hunt probability by a small factor.
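A minimal sketch of that bookkeeping might look like this. The class and method names are mine, and the boost factor of 1.1 is a placeholder, not the value from my actual entry:

```python
# Hypothetical tracker for the game-wide hunt rate across rounds.

class HuntTracker:
    def __init__(self):
        self.hunts = 0
        self.interactions = 0

    def record_round(self, hunts, interactions):
        """Accumulate the totals observed in one round."""
        self.hunts += hunts
        self.interactions += interactions

    def expected_hunts(self, interactions_this_round):
        """Hunts we'd expect this round if the historical rate holds."""
        if self.interactions == 0:
            return 0.0
        return self.hunts / self.interactions * interactions_this_round

    def boost_factor(self, m, interactions_this_round, factor=1.1):
        """Small multiplier on hunt probabilities when m looks reachable."""
        if m <= self.expected_hunts(interactions_this_round):
            return factor
        return 1.0
```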
440 teams entered and 45 survived. Ranked according to final food totals, I came 9th! I’m very pleased with this, especially as I hadn’t had time to make the algorithm as advanced as I’d wanted (I would have liked to add in some machine learning!).
We were provided with some stats about the game, and it was interesting to see that most people who survived finished with reputations between 0.5 and 0.6:
This seemed to be the sweet spot of maximising slacking but still maintaining a reputation good enough that people aren’t put off hunting with you. Luckily that’s pretty much what my algorithm did, by maintaining a reputation of just over 0.5.
Sadly I missed out on the top 5, who each won £1000… but I did win a t-shirt saying ‘I survived’!
Find out more
The full tribal rules can be found here: https://brilliant.org/competitions/hunger-games/rules/
Also check out some of the winners: https://brilliant.org/competitions/hunger-games/results/