(These thoughts were inspired by Paul Graham’s post The Lesson to Unlearn)
Here’s an example of how I used to game multiple choice questions:
When did de Klerk become president of South Africa?
- A. 8/89
- B. 9/88
- C. 9/89
- D. 8/98
Without even reading the question, you can guess the answer from the configuration of the options alone (try it before reading on!)
You can immediately eliminate D because its year is the odd one out: all the others are in the 80s. Of the three that are left, you can see that B and C share a month and A and C share a year. A question designer likely starts with the correct answer and derives the distractors by changing it in different directions, so the correct answer will be the one that has the most elements in common with the others. So C must be the correct answer!
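The elimination process above can even be mechanized. Here's a minimal sketch (my own formalization, not something from the original post): treat each option as a bundle of features (month, year, decade), score each option by how common its feature values are among the remaining options, and repeatedly drop the lowest-scoring outliers until one option survives.

```python
def guess(options):
    """options: dict mapping label -> (month, year). Returns the surviving label."""
    remaining = dict(options)
    while len(remaining) > 1:
        # Features for each remaining option: month, full year, decade.
        feats = {k: (("m", m), ("y", y), ("d", y // 10))
                 for k, (m, y) in remaining.items()}
        # How often each feature value occurs across the remaining options.
        freq = {}
        for fs in feats.values():
            for f in fs:
                freq[f] = freq.get(f, 0) + 1
        # Score = how "typical" an option is; outliers score low.
        scores = {k: sum(freq[f] for f in fs) for k, fs in feats.items()}
        lo, hi = min(scores.values()), max(scores.values())
        if lo == hi:
            break  # all equally typical; the heuristic has nothing to say
        remaining = {k: v for k, v in remaining.items() if scores[k] > lo}
    return min(remaining)  # deterministic pick if several survive

print(guess({"A": (8, 89), "B": (9, 88), "C": (9, 89), "D": (8, 98)}))  # → C
```

On the de Klerk question it reproduces the reasoning above: D falls out first (lone 90s year), then C outscores A and B because it shares a feature with each of them.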
Suddenly, multiple choice questions become less a test of the actual content and more an IQ test of pattern recognition.
Examples like this are why I'm worried about AI alignment and misspecified objectives: it's so easy to fall prey to Goodhart's Law and optimize for a proxy target rather than the thing you actually care about (even as humans, who should know better!)