Friday, January 20, 2017

The Community's Predictions (So Far) for Pro Tour AER

A few days ago I challenged the community to prove how good they were at identifying broken Magic cards by predicting the metagame of PT Aether Revolt, with a free duel deck up as a prize. After nearly 300 entries, let’s take a look at the results so far.

If you would like to play the prediction game before reading this summary of responses so far, go here. Entries will be open for about another day until StarCity begins broadcasting their Standard Open.


Top 8 Predictions


Let’s start by looking at the tiebreaker questions. I added these to the survey to break any ties in the scoring of the card-by-card predictions. Unlike the main ballot, which asked about the best constructed decks (7+ wins), the tiebreaker section asked for predictions about the Top 8 decks only. These questions were also optional, so some respondents left them blank.




The community predicts a mean of 4.48 unique archetypes in the Top 8, with a mode of 5. This would make for a reasonably diverse Top 8, but I think the community actually isn’t optimistic enough here. In the past, we’ve seen that even PTs that inaugurated relatively stale Standard seasons (such as PT SOI) had diverse Top 8s, due to the early state of the metagame, the Limited rounds confounding the constructed results, and the sheer skill of certain top players. (Seriously, how many people built that middling Seasons Past deck just because Jon Finkel is a beast?) My personal over/under on the number of Top 8 archetypes would have been 5.5.


Here things get pretty interesting. White and Blue together account for nearly two-thirds of the responses. This makes sense, as those colors cover a number of archetypes (various combo decks, Tempo/Flash, Control) that can be expected to put players into the Top 8.

I still remember how, last year, everyone was talking about how Magic was all about creatures now, how Green with all its high-value creatures was just too good, and how Blue was always going to suck. Or how, the year before that, people were wondering whether RDW was always going to be the right deck to bring to Pro Tours because it seemed to feast on inchoate PT metagames. The lesson here is that WotC likes to constantly mix things up in Standard, and that the pendulum swings by design.


The mean prediction for first-time Top 8 competitors is 2.67, which seems reasonable to me based on my gut impression of past Pro Tours. More and more, we are seeing “stacked PT T8s” reminding us that Magic is, at the end of the day, a game of skill. Yes, you always need a bit of luck to Top 8, but over and over again we’ve seen that consistently good players are more likely to put themselves in a position to benefit from that luck.

I made a small mistake on this question and failed to include an option for zero first-time Top 8 competitors, which likely skews the results a little. If this question is needed to break a tie, I will give the win to the respondent whose answer is closest to the true number.

OK, now it’s time to look at some cards. On the survey I separated the colorless, monocolored, and gold cards into their own groups, so that cards that could go into any deck wouldn't be compared against cards that could only go into decks of specific colors.


Colorless Cards



Walking Ballista is the favorite here, but Heart of Kiran and Aethersphere Harvester aren’t far behind.
Metallic Mimic and Inspiring Statuary come in as dark horse picks among the colorless cards.
These picks are all fairly conventional. In fact, as of this writing, the ranking of the top 4 colorless rares in the survey perfectly corresponds with their ranking by market price! (As a Mythic, Heart of Kiran’s greater scarcity of course throws off this pattern.)


Gold Cards


Winding Constrictor’s combo potential made it the community’s pick by far. Oath of Ajani is a distant second.


Monocolored Cards


And finally, we have the biggest chunk of the survey, the monocolored cards. Each submission picked 5 cards, distributed among the colors however the respondent desired (i.e., it’s a valid strategy to stack up on a single color’s cards if you expect that color to be truly dominant at PT AER). There are too many cards here to list, so the table below shows only the cards with more than 20 selections.
The Copycat hype is real. Felidar Guardian was a clear top pick here and, frankly, the only top pick that is potentially truly “broken.”
The next three selections were strong answers that have a very good chance of becoming staples in the format, but aren’t going to be format-destroying cards anytime soon. Sure, we know that Shock is a good card that will probably see quite a bit of play. But we also know it’s probably not a development mistake.
Rounding out the top 5 is another card with some brokenness potential. As both a Cryptolith Rites and a Travel Preparations on a stick, the community thinks Rishkar is a card with some upside.
There’s a natural tension in prediction games like this: to some extent, you can use the conventional wisdom as a guide to your picks. But if you hew too closely to it, your submission won’t stand out from the crowd and your chance of winning is lower. There is a long tail of dark horse picks among the submissions so far: many cards with 5-20 picks that I haven’t included in the table above.


What it All Means


As a reminder, one of the objectives of this whole game is to resolve whether the community is actually able to identify development mistakes better than the people who design the game, or whether complaints about R&D’s incompetence are just hindsight bias. Here’s what I had to say in my post announcing the contest:

If I’m wrong, and there is an obvious development mistake, the community’s picks should concentrate into a few (<3) cards, and those cards should turn out to be OP. If the community’s picks are spread out among a lot of cards, and some of them do turn out to be OP - then I’m right, and there were no obvious development mistakes. If the community’s picks are concentrated into a few cards, and those cards do not turn out to be OP, then I’m still right, and there were still no obvious development mistakes, because the ones we thought were “obviously too good” turned out not to be.

I’m pretty sure I’m right, but hey—maybe you guys will prove me wrong.

Looking at the results so far, I would say the only cards the community has pegged as potential development mistakes are Felidar Guardian and Winding Constrictor. If Copycat turns out to be a truly degenerate combo, or Winding Constrictor turns out to be oppressive, I think it would be fair to conclude that R&D missed some pretty “obvious” mistakes in this set.

It’s not too late to prove me wrong and win a Japanese Elspeth vs Kiora Duel Deck! The FREE prediction game is open until StarCity begins streaming its first Standard content, which will likely happen sometime Saturday morning!

Monday, January 16, 2017

There are (Almost) No Obvious Development Mistakes and Complaints are (Almost) Always Hindsight Bias: Issuing the PT AER Metagame Prediction Challenge

(Click here if you just want to go straight to the contest, where you can win a Japanese Elspeth vs Kiora Duel Deck for predicting the PT AER metagame. Keep reading if you want to know why I think you, the collective you, aren't any good at identifying broken MtG cards.)

Complaints about WotC R&D’s card balancing and metagame prediction abilities basically all follow a pattern something like this:

Step 1) New cards are spoiled. Something like 10-20 strong standout cards with high competitive potential are identified by the community as cards to watch.

Step 2) The competitive metagame forms over many weeks of high-level competitive play and deck refinement by a playerbase of millions. Eventually a handful of true standout cards emerge from the group identified in Step 1. In some cases these cards might even be broken.

Step 3) “R&D is terrible at their jobs, how did they miss this obviously broken card, I saw that this card was broken the second it was spoiled. Clearly the FFL is no better than a bunch of drunk monkeys.”

What allows people to falsely believe Step 3 is that they really did identify the development mistakes when they were spoiled. It’s just that they identified them as one of a bunch of possibly good cards. Some of those cards didn’t quite make it, some were actually good, and some were broken. Then we forget our misses, zero in on the hits with hindsight bias, and wonder why R&D isn’t as good as we are at identifying the broken cards.

I’m not exaggerating about the level of contempt that is sometimes expressed for RD, by the way:
I hope we can all see the hindsight bias at play here. Particularly telling is how this poster’s ability to identify development “mistakes” seems to take a nosedive as we approach recent sets. Grim Flayer’s deck just got knocked out of the meta. Mardu Vehicles is certainly a top contender, but Depala is hardly a card people point to as an OP development mistake. It’s not impossible that Yahenni’s Expertise will turn out to be a development mistake, but at this point it’s just one of many cards that could possibly be OP. Even Sylvan Advocate no longer seems like such a big mistake now that it has completely fallen out of the metagame.

It’s instructive to go back and read the set reviews of sets that contained obvious development mistakes. Yes, we thought Dromoka’s Command and Collected Company might be good. We also thought Sidisi, Undead Vizier and Narset and Thunderbreak Regent and Secure the Wastes might be good. Some of those cards were good, some weren’t, and some were perhaps too good.

It’s Impossible to Evaluate Individual Cards Without Knowing the Metagame Context (And the Metagame Is Impossible for the FFL to Accurately Predict)

As the fate of “obviously broken” cards fading in and out of the meta shows us, the difference between OP, good, fringe, and not-quite-good-enough cards isn’t the cards themselves. Their actual strength is always an emergent property of the metagame in which they exist. Take one of development’s biggest mistakes in recent memory:
There’s no denying that Collected Company was a pretty big miss on RD’s part, and the eventual Dragons/Origins-BFZ-Shadows standard it dominated was truly quite stale. But even this card’s strength in standard was highly context-dependent! On release CoCo was seen mainly as a great addition to modern, spawning a new but not broken archetype. Meanwhile in standard it was middling, forming part of a good-not-great Green/White aggro deck. It wasn’t until two sets later that enough solid 3-drops were released to create an environment for CoCo to become the oppressively format-warping card that we grew to hate.

CoCo started out fine and became strong, so in hindsight we consider it broken. It’s also instructive to consider a card that had something of the reverse dynamic:
Half a year ago Duskwatch Recruiter was labelled a “development mistake” just as often as Collected Company was. Green getting a recurring card draw effect that’s also an extremely efficient beater that’s also a ramp spell? How ridiculous is that? Of course this is broken, what are the morons in R&D doing? But wait, the meta changed, and these days even when there is a green deck in the format, Recruiter doesn’t make the cut.

There’s been a lot of whining about FNM promos lately, so let’s look at one of the FNM whiffs of last year:
Flaying Tendrils has always been a bulk card. Which intern in R&D do they have picking these promos anyway?

But wait, what happened the last time they printed a similar effect?
You may not remember, but while in standard this was a $2+ uncommon. Sounds like something that would have been a solid FNM card!

Of course the difference is that Flaying Tendrils is in a standard environment where a mass -2/-2 is pretty useless, and Drown in Sorrow was in a standard environment where a mass -2/-2 was amazing.

So: individual cards are impossible to evaluate without foreknowledge of how the entire metagame will take shape, and the metagame is the emergent result of the crowdsourced efforts of a horde of highly motivated and intelligent players (and even then it takes us a few months to really shake it out). Given this, I feel confident asserting that there are almost no obvious development mistakes, and if you think you identified some, you’re likely operating under hindsight bias.


You Totally Predicted That CoCo Was OP, Though, and Can Prove It


Any such assertions made after the fact (and any arguments that rely on them) are indistinguishable from hindsight bias and should be discounted. The only way to rigorously test whether you are as good at identifying OP cards as you say you are is to pre-commit before the tournament results come in. Since we’ve just finished the Aether Revolt prerelease, that means… now. In the vein of the PT EMN Fantasy Draft, I am happy to unveil...


The Pro Tour Aether Revolt Metagame Prediction Challenge - Win a Japanese Elspeth vs Kiora!

The contest is here.

All entries will be individually scored. You receive 1 point every time one of your cards appears in a top-performing standard deck (7 wins/21 points or better). For each sideboard-only appearance, you will receive 0.25 points. To capture the impact of cards that may be format-defining despite not being 4-offs in their decks (such as Emrakul), you will receive the full point value even when your selected card is a 1-, 2-, or 3-off in its decks. The top individual entry will win a new Japanese Elspeth vs Kiora.
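The scoring rule is simple arithmetic, but it has a few edge cases worth spelling out: sideboard-only appearances, decks below the 7-win (21 match points, at 3 points per win) cutoff, and copy count not mattering. Here is a minimal Python sketch of how one entry could be scored under those rules. The `Deck` record, the `score_entry` function, and the sample decklists are all hypothetical and for illustration only; they are not real PT AER results.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical record for one Standard decklist from the Pro Tour.
@dataclass
class Deck:
    match_points: int                                 # 3 points per match win
    maindeck: set[str] = field(default_factory=set)
    sideboard: set[str] = field(default_factory=set)

QUALIFYING_POINTS = 21  # 7 wins / 21 points or better


def score_entry(picks: set[str], decks: list[Deck]) -> float:
    """Score one entry: 1 point per qualifying deck that maindecks a pick,
    0.25 points per qualifying deck that runs it only in the sideboard.
    Copy count is ignored, so a 1-off scores the same as a 4-off."""
    total = 0.0
    for deck in decks:
        if deck.match_points < QUALIFYING_POINTS:
            continue  # only top-performing decks count
        for card in picks:
            if card in deck.maindeck:
                total += 1.0
            elif card in deck.sideboard:
                total += 0.25  # sideboard-only appearance
    return total


# Made-up decklists, not real results.
decks = [
    Deck(21, {"Walking Ballista", "Heart of Kiran"}, {"Felidar Guardian"}),
    Deck(24, {"Felidar Guardian"}),
    Deck(18, {"Walking Ballista"}),  # 6 wins: below the cutoff, ignored
]
picks = {"Walking Ballista", "Felidar Guardian"}
print(score_entry(picks, decks))  # 2.25
```

Storing each decklist as a set of card names is what makes copy count irrelevant: a card either appears in a deck or it doesn't. The `elif` ensures a card that shows up in both the maindeck and the sideboard earns the full point rather than 1.25.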

The challenge here is to prove me wrong. If I’m wrong, and there is an obvious development mistake, the community’s picks should concentrate into a few (<3) cards, and those cards should turn out to be OP. If the community’s picks are spread out among a lot of cards, and some of them do turn out to be OP - then I’m right, and there were no obvious development mistakes. If the community’s picks are concentrated into a few cards, and those cards do not turn out to be OP, then I’m still right, and there were still no obvious development mistakes, because the ones we thought were “obviously too good” turned out not to be.
I’m pretty sure I’m right, but hey—maybe you guys will prove me wrong.


Appendix - In Which I Concede a Situation Where I Look Pretty Wrong, but Not Really because Reasons


Funny thing about the terms of the metagame prediction challenge - had I run this challenge for Pro Tour Kaladesh, I would probably have lost. Why? Well, you may recall that this card was recently banned:
And honestly, had I run the prediction challenge before PT KLD, I expect a lot of people would have picked Copter. Now, I could raise the small quibble that Smuggler’s Copter is colorless. In a prediction game scored simply on the number of decks in which your picked cards appear, the strategic choice is to load up on powerful colorless cards that can go into many archetypes rather than powerful archetype-specific cards. Even if you thought, say, Chandra, Torch of Defiance would end up being stronger than Smuggler’s Copter, your incentive would still be to pick Copter. My point is not that Chandra is stronger than Copter (we’ve learned that she’s not), just that someone who *believed* at the time that Chandra was stronger would still have picked Copter, thus misrepresenting the wisdom of the playerbase versus development. It’s for this reason that I’ve segregated the colorless cards into their own category in the PT AER prediction challenge.

But that’s just a quibble, and to be completely honest, a lot of people pegged Copter as the defining card of the set shortly after release, head and shoulders above the rest of the set. I don’t believe many people predicted a ban-worthy level of brokenness from Copter, but it was definitely a case where the FFL missed something that was reasonably obvious.

That said, a single exception does not disprove a general rule, and there is an “almost” in my assertion for a reason: “complaints are (almost) always hindsight bias.” So even granting that complaints about the FFL missing Smuggler’s Copter are more reasonable than most MtG balance whining, I still believe that, overall, obvious development mistakes are extremely rare.