Facebook and CMU’s ‘superhuman’ poker AI beats human pros

theverge.com

‘It can bluff better than any human.’

Photo: Lionel Bonaventure/AFP/Getty Images

AI has definitively beaten humans at another of our favorite games. A poker bot, designed by researchers from Facebook’s AI lab and Carnegie Mellon University, has bested some of the world’s top players over a series of six-player no-limit Texas Hold’em games.

Over 12 days and 10,000 hands, an AI system named Pluribus faced off against 12 pros in two different settings. In one, the AI played alongside five human players; in the other, five versions of the AI played with one human player (the computer programs were unable to collaborate in this scenario). Pluribus won an average of $5 per hand, or around $1,000 an hour: what the researchers call a “decisive margin of victory.”

“It’s safe to say we’re at a superhuman level and that’s not going to change,” Noam Brown, a research scientist at Facebook AI Research and co-creator of Pluribus, told The Verge.

“Pluribus is a very hard opponent to play against. It’s really hard to pin him down on any kind of hand,” Chris Ferguson, a six-time World Series of Poker champion and one of the 12 pros drafted against the AI, said in a press statement.

In a paper published in Science, the scientists behind Pluribus say the victory is a significant milestone in AI research. Although machine learning has already reached superhuman levels in board games like chess and Go, and in computer games like StarCraft II and Dota 2, six-player no-limit Texas Hold’em represents, by some measures, a higher benchmark of difficulty.

Not only is some of the information needed to win hidden from players (each player’s cards are concealed, making it what’s known as an “imperfect-information game”), but the game also involves multiple players and complex victory outcomes. The game of Go famously has more possible board positions than there are atoms in the observable universe, making it a huge challenge for AI to work out what move to make next. But all the information is visible to both players, and the game has only two possible outcomes: win or lose. In some senses, this makes it easier to train an AI on.

A timeline of Pluribus’ training regime. “Limping” is one strategy used by some human players that the AI eventually discarded.
Credit: Facebook

Back in 2017, an AI system beat human pros at two-player no-limit Texas Hold’em, but raising the number of opponents to five increases the complexity significantly. To create a program capable of rising to this challenge, Brown and his colleague Tuomas Sandholm, a professor at CMU, deployed a few crucial strategies.

First, they taught Pluribus to play poker by getting it to play against copies of itself, a process known as self-play. This is a common technique for AI training, in which the system learns the game through trial and error by playing hundreds of thousands of hands against itself. The training process was also remarkably efficient: Pluribus was created in just eight days using a 64-core server with less than 512GB of RAM. Training the program on cloud servers would cost just $150, a bargain compared with the hundred-thousand-dollar price tags of other state-of-the-art systems.
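
To make the idea of self-play concrete, here is a minimal sketch in Python. It is not Pluribus’s training code: Pluribus’s own training is reported to use a form of Monte Carlo counterfactual regret minimization on full poker, whereas this toy applies plain regret matching to rock-paper-scissors. A single hypothetical learner plays every round against an identical copy of itself, raises its “regret” for each move that would have done better than what it actually played, and shifts probability toward those moves. The starting habit (always throwing rock) and the iteration count are arbitrary illustrative choices.

```python
# A toy illustration of self-play: regret matching on rock-paper-scissors.
# This is NOT Pluribus's code; the game, starting habit and iteration count
# are hypothetical choices made for illustration.

ACTIONS = ["rock", "paper", "scissors"]


def payoff(a: int, b: int) -> int:
    """+1 if action a beats action b, -1 if it loses, 0 for a tie."""
    if a == b:
        return 0
    # paper (1) beats rock (0), scissors (2) beats paper (1), rock (0) beats scissors (2)
    return 1 if (a - b) % 3 == 1 else -1


def strategy_from_regrets(regrets: list[float]) -> list[float]:
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1 / len(regrets)] * len(regrets)  # no positive regret: play uniformly
    return [r / total for r in positive]


def self_play(iterations: int = 50_000) -> list[float]:
    """Let one learner play against a copy of itself; return its average strategy."""
    regrets = [1.0, 0.0, 0.0]  # start with a bad habit: always throw rock
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # The opponent is an identical copy, so it plays the same mixed strategy.
        action_values = [
            sum(strategy[b] * payoff(a, b) for b in range(3)) for a in range(3)
        ]
        current_value = sum(p * v for p, v in zip(strategy, action_values))
        for a in range(3):
            # Regret: how much better action a would have done than the current mix.
            regrets[a] += action_values[a] - current_value
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


if __name__ == "__main__":
    for name, prob in zip(ACTIONS, self_play()):
        print(f"{name}: {prob:.3f}")
```

Running it prints an averaged strategy that settles close to one third for each move, the unexploitable way to play rock-paper-scissors. The strategy used on any single round keeps cycling; it is the average over all rounds of self-play that converges, which is why regret-matching methods report the average strategy.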

Then, to deal with the extra complexity of six players, Brown and Sandholm came up with an efficient way for the AI to look ahead in the game and decide what move to make, a mechanism known as the search function. Rather than trying to predict how its opponents would play all the way to the end of the game (a calculation that would become incredibly complex in just a few steps), Pluribus was engineered to look only two or three moves ahead. This truncated approach was the “real breakthrough,” says Brown.

You might think that Pluribus is sacrificing long-term strategy for short-term gain here, but in poker, it turns out short-term incisiveness is really all you need.
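
The general idea of truncated look-ahead can be sketched with a classic depth-limited search. The Python below is a hypothetical illustration on a toy take-away game (players alternately remove one to three chips; whoever takes the last chip wins). It is neither poker nor Pluribus’s search, which must also reason about hidden cards and, according to the researchers, lets each player choose among several continuation strategies at the search horizon. Here the program looks a fixed number of moves ahead and, at the horizon, stops and falls back on an estimate; the placeholder `estimate` function simply returns a neutral guess of 0.

```python
# Depth-limited look-ahead on a toy take-away game (not poker, not Pluribus).
# Players alternately remove 1-3 chips; whoever takes the last chip wins.

MOVES = (1, 2, 3)


def estimate(chips: int) -> float:
    """Hypothetical stand-in for a learned evaluation: a neutral guess of 0."""
    return 0.0


def negamax(chips: int, depth: int) -> float:
    """Value for the player about to move: +1 forced win, -1 forced loss,
    or the estimate if the outcome lies beyond the search horizon."""
    if chips == 0:
        return -1.0  # the opponent just took the last chip and won
    if depth <= 0:
        return estimate(chips)  # horizon reached: stop searching, trust the estimate
    # The opponent's best result is our worst, hence the sign flip.
    return max(-negamax(chips - m, depth - 1) for m in MOVES if m <= chips)


def best_move(chips: int, depth: int) -> tuple[int, float]:
    """Choose the move with the highest value when looking `depth` moves ahead."""
    scores = {m: -negamax(chips - m, depth - 1) for m in MOVES if m <= chips}
    move = max(scores, key=scores.get)  # ties go to the smallest move
    return move, scores[move]


if __name__ == "__main__":
    for depth in (2, 3):
        move, value = best_move(5, depth)
        print(f"depth {depth}: take {move} chip(s), estimated value {value}")
```

From five chips, a two-move horizon cannot yet see the win and reports only the neutral estimate, while a three-move horizon finds that taking one chip leaves the opponent stuck at four and forces a win. A better estimate at the horizon lets a shallower search play well, which is the role a strategy learned through self-play can fill.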

For example, Pluribus was remarkably good at bluffing its opponents, with the pros who played against it praising its “relentless consistency” and the way it squeezed profits out of relatively thin hands. It was predictably unpredictable: a fantastic quality in a poker player. And it did all this using only the cards and the betting; Pluribus has no machine vision or facial recognition to spot tells, for example.

Brown says this is only natural. We often think of bluffing as a uniquely human trait, one that relies on our ability to lie and deceive. But it’s an art that can still be reduced to mathematically optimal strategies, he says. “The AI doesn’t see bluffing as deceptive. It just sees the decision that will make it the most money in that particular situation,” he says. “What we show is that an AI can bluff, and it can bluff better than any human.”
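
Brown’s point that bluffing can be treated as a money-maximizing calculation has a classic textbook illustration. The sketch below is not how Pluribus decides when to bluff (its bluffs come out of the strategy it learned, not a closed-form rule); it is the standard one-bet toy model from poker game theory, with hypothetical function names. A bettor holds either the best hand or nothing and bets `bet` into a pot of size `pot`; the opponent holds a hand that beats only a bluff. If bluffs make up bet / (pot + 2·bet) of the betting range, calling and folding are worth the same to the opponent, and if the opponent calls pot / (pot + bet) of the time, a bluff breaks even. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Textbook one-street toy model (hypothetical names, not Pluribus's method):
# the bettor holds either the best hand or nothing; the caller beats only bluffs.


def indifferent_bluff_share(pot: Fraction, bet: Fraction) -> Fraction:
    """Share of betting hands that are bluffs, chosen so a bluff-catcher
    gains nothing by calling rather than folding.

    Calling risks `bet` to win `pot + bet`, so the caller is indifferent when
    share * (pot + bet) == (1 - share) * bet.
    """
    return bet / (pot + 2 * bet)


def indifferent_call_frequency(pot: Fraction, bet: Fraction) -> Fraction:
    """How often the opponent must call so a pure bluff breaks even.

    A bluff wins `pot` when the opponent folds and loses `bet` when called.
    """
    return pot / (pot + bet)


def bluff_ev(pot: Fraction, bet: Fraction, call_frequency: Fraction) -> Fraction:
    """Expected profit of a bluff against an opponent who calls this often."""
    return (1 - call_frequency) * pot - call_frequency * bet


def call_ev(pot: Fraction, bet: Fraction, bluff_share: Fraction) -> Fraction:
    """Expected profit of calling with a hand that beats only bluffs."""
    return bluff_share * (pot + bet) - (1 - bluff_share) * bet


if __name__ == "__main__":
    pot, bet = Fraction(100), Fraction(100)  # a pot-sized bet
    share = indifferent_bluff_share(pot, bet)
    freq = indifferent_call_frequency(pot, bet)
    print("bluff share of betting range:", share)
    print("required calling frequency:", freq)
    print("EV of calling at that bluff share:", call_ev(pot, bet, share))
    print("EV of bluffing vs. that calling frequency:", bluff_ev(pot, bet, freq))
```

For a pot-sized bet, the sketch gives a bluff share of 1/3 and a required calling frequency of 1/2, and both expected values come out to exactly zero. At these frequencies each player is indifferent between their options, so neither can gain by changing course: the sense in which a bluff can be “mathematically optimal” rather than deceptive.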

What does it mean, then, that an AI has definitively bested humans at the world’s most popular form of poker? Well, as we’ve seen with past AI victories, humans can certainly learn from the computers. Some strategies that players are generally suspicious of (like “donk betting,” leading out with a bet into the player who made the last bet or raise on the previous round) were embraced by the AI, suggesting they might be more useful than previously thought. “Whenever playing the bot, I feel like I pick up something new to incorporate into my game,” said poker pro Jimmy Chou.

There’s also the hope that the techniques used to create Pluribus will be transferable to other situations. Many scenarios in the real world resemble Texas Hold’em poker in the broadest sense: they involve multiple players, hidden information, and numerous possible outcomes.

Brown and Sandholm hope that the methods they have demonstrated could therefore be applied in domains like cybersecurity, fraud prevention, and financial negotiations. “Even something like helping navigate traffic with self-driving cars,” says Brown.

So can we now consider poker a “beaten” game?

Brown doesn’t answer the question directly, but he does say it’s worth noting that Pluribus is a static program. After its initial eight-day training period, the AI was never updated or upgraded to better match its opponents’ strategies. And over the 12 days it spent with the pros, they were never able to find a consistent weakness in its game. There was nothing to exploit. From the moment it started betting, Pluribus was on top.