You have to understand that this book is deeply rooted in the research program Tetlock pioneered and ran, the Good Judgment Project, and in the real people he found through that program to be great at forecasting geopolitical events on a three-to-six-month horizon.
My own personal goal before reading the book was to learn rigorous prediction—not necessarily answering geopolitical questions handed down from somewhere else, but more like how many editors will show up on global Wikipedia today, or, possibly more profitably, how much the stock market will lose today. I was prepared for the book to be an overly academic treatise on the minutiae of the research program, or to be operationally useless in any number of other ways.
But in each chapter, Tetlock had something meaningful and useful to say to me about the business of prediction. At each stage he surprised me with the foxlike connections he drew between what he saw his forecasters doing and other research programs and discoveries.
(I was delighted to see my boy Duncan Watts cited on the emphatically post hoc nature of describing “significant” events. Aaron Brown, my poker-playing risk-managing guru, weighs in on the value of small but consistent wins. The once-cool-but-now-fascist-apologist Nassim Taleb makes appearances as the book struggles with whether its brand of forecasting is useful in the face of an extremistan world (answer: it is, because you can make money in an extremistan world). Of course Danny Kahneman is intertwined with the narrative, as Tetlock’s sounding board and colleague for many decades. Not mentioned in the book but in Tetlock’s five-part master-class on forecasting on Edge.org, my main man Anders Ericsson is cited on the trainability of forecasters.)
Chapter one opens by trying to convince us that predictability exists and that forecasting can help us (make or save us money, at the simplest). He invites us to contrast the unpredictability of cloud shapes with the predictability of a clock, and exposes this as the first of many false dichotomies throughout the book.
“We live in a world of clocks and clouds and a vast jumble of other metaphors. Unpredictability and predictability coexist uneasily in the intricately interlocking systems that make up our bodies, our societies, and the cosmos. How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.”
He then sets up the core question of the research program that IARPA paid for: how well can we do? A question like this should leave you speechless and befuddled—the US Government asked Tetlock and other university and industry programs to find out how well we can predict geopolitical outcomes over a 3–6 month time horizon, because nobody knew how well we did. Think of all the pundits, all the terrible books Tom Friedman pooped out, all the bloggers and tooters: the realization that all that heat gave us no light should leave you breathless.
In chapter two, Tetlock dives into the ugly story of medicine.
Tetlock revisits this over and over again: medicine only very, very recently became a devotee of the randomized controlled trial. He has a wonderful set of vignettes of the vanguard of physicians who dragged their colleagues kicking and screaming into the evidence-based age after World War II. It's been a few weeks since I read this section, but I think I will forever remember this story:
‘When hospitals created cardiac care units to treat patients recovering from heart attacks, Cochrane proposed a randomized trial to determine whether the new units delivered better results than the old treatment, which was to send the patient home for monitoring and bed rest. Physicians balked. It was obvious the cardiac care units were superior, they said, and denying patients the best care would be unethical. … [but] Cochrane got his trial: some patients, randomly selected, were sent to the cardiac care units while others were sent home for monitoring and bed rest. Partway through the trial, Cochrane met with a group of the cardiologists who had tried to stop his experiment. He told them that he had preliminary results. The difference in outcomes between the two treatments was not statistically significant, he emphasized, but it appeared that patients might do slightly better in the cardiac care units. “They were vociferous in their abuse: ‘Archie,’ they said, ‘we always thought you were unethical. You must stop the trial at once.’ ” But then Cochrane revealed he had played a little trick. He had reversed the results: home care had done slightly better than the cardiac units. “There was dead silence and I felt rather sick because they were, after all, my medical colleagues.”’
Yes, a lot of medicine is terribly “intuition-based” today—“it’s obvious this treatment is better”. But medicine has made great strides, over the last few decades, in acknowledging the risks and flaws of “intuition” and committing itself to the exacting requirements of randomized controlled trials.
Programmers are slowly learning this, thankfully with less loss of life. Andrei Alexandrescu, in his “Writing Quick Code in C++, Quickly” talk in 2013, discussed this at length:
‘You must measure everything. We all have intuition. And the intuition of programmers is always wrong. Outdated. Intuition ignores a lot of aspects of a complex reality. Today’s machine architectures are so complicated, there’re so many variables in flight at any point in time that it’s essentially impossible to consider them deterministic machines any more. They are not deterministic any more. So we make very often big mistakes when assuming things about what’s going to make fast code. [E.g.,] fewer instructions do not equal faster code. Data [access] is not always faster than computation. The only good intuition is “I should measure this stuff and see what happens.” To quote a classic, who is still alive, Walter Bright: “Measuring gives you a leg up on experts who are so good they don’t need to measure.” Walter and I have been working on optimizing bits and pieces of a project we work on and … whenever we think we know what we’re doing, we measure, and it’s just the other way around.’
Here are two of the world’s leading experts in programming language design and implementation, openly saying “whenever we think we know what we’re doing, we measure, and it’s just the other way around.” That is probably worth tattooing on one’s forehead.
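In that spirit, here is a minimal sketch of the measure-first habit, in Python rather than the C++ of the talk, with functions I made up for illustration. It times a recompute-on-demand version against a table-lookup version, because, per the quote, data access is not always faster than computation:

```python
# Hypothetical micro-benchmark: does a lookup table beat recomputing?
# Intuition says yes; the only trustworthy answer comes from timing it.
import timeit

def squares_computed(n):
    # Recompute each square on demand.
    return [i * i for i in range(n)]

TABLE = [i * i for i in range(10_000)]

def squares_looked_up(n):
    # Read precomputed squares out of a table.
    return [TABLE[i] for i in range(n)]

for fn in (squares_computed, squares_looked_up):
    t = timeit.timeit(lambda: fn(10_000), number=1_000)
    print(f"{fn.__name__}: {t:.3f}s")
```

Whatever the result on your machine, the point is that it took ten lines to stop guessing.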
This is relevant to the Good Judgment Project because it is the first time randomized controlled trials have been applied to geopolitical prediction, but also because gathering and weighing evidence is a key component of forecasting. Putting a collar on intuition helps keep it from ruining your predictions.
The next chapter (chapter three) discusses the intricacies of keeping score. A project like this, and the task of improving one’s own forecasting, lives and dies by the pesky, pernicious questions of exactly how you measure performance, along with other experimental details. Tetlock shows how nearly everything we currently accept as “forecasting” (intelligence agencies, the revolting Thomas Friedman) is trash: weasel-worded, unscorable garbage. He details the kinds of questions amenable to his experiment, how to elicit probability estimates, how to enforce time horizons, how to factor in update frequencies, how to fuse predictions from groups, various research tools, etc.
This chapter also details how the Brier score works: forecasters answer a yes/no question with a probability, and the score rewards emphatically correct answers while punishing confident mistakes.
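To make that concrete, here is a sketch of the Brier score in the two-outcome form the book describes, where 0 is perfect, 0.5 is what permanent 50/50 hedging earns, and 2.0 is maximal wrongness (the example numbers are mine):

```python
# Brier score, summed over both outcomes of a yes/no question.
# forecasts[i] is the stated P(event happens); outcomes[i] is 1 if it
# happened, 0 if not. Lower is better.
def brier(forecasts, outcomes):
    total = 0.0
    for p, o in zip(forecasts, outcomes):
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)

print(brier([0.9], [1]))  # 0.02 -- confident and right
print(brier([0.5], [1]))  # 0.5  -- fence-sitting
print(brier([0.9], [0]))  # 1.62 -- confident and wrong, crushed
```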
Chapter four gives a detailed overview of the project’s findings: how well Tetlock’s superforecasters did, and analyses of how and why they did so well. Tetlock really surprised me here by offering a humble and honestly rigorous analysis of regression to the mean. He explains how, in games of chance, regression to the mean crushes the winners after repeated rounds, whereas in exercises of skill the winners tend to keep winning round after round.
“Each year, roughly 30% of the individual superforecasters fall from the ranks of the top 2% next year. But that also implies a good deal of consistency over time: 70% of superforecasters remain superforecasters.”
Tetlock could have gone all business-book “Good to Great” (trash) on me. No. This is a well-reasoned and thoughtful argument that honestly explores the role of luck in forecasting. 30% annual turnover suggests some luck, but a lot of skill. That is a very powerful finding.
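A toy simulation (mine, not the book’s) makes the luck-versus-skill contrast visible: when scores are pure luck, one round’s top 2% almost entirely regress out of the elite the next round; when skill dominates, a large fraction persist, which is what the 70% figure is telling us:

```python
# Contrast regression to the mean under pure luck with persistence
# under skill: how many of round 1's top 2% are still on top in round 2?
import random

def top_set(scores, frac=0.02):
    cutoff = sorted(scores, reverse=True)[int(len(scores) * frac)]
    return {i for i, s in enumerate(scores) if s > cutoff}

N = 10_000
skill = [random.gauss(0, 1) for _ in range(N)]

def round_scores(skill_weight):
    # Each round's score is skill (weighted) plus fresh noise.
    return [skill_weight * skill[i] + random.gauss(0, 1) for i in range(N)]

for w, label in [(0.0, "pure luck"), (1.5, "mostly skill")]:
    r1, r2 = round_scores(w), round_scores(w)
    stay = len(top_set(r1) & top_set(r2)) / len(top_set(r1))
    print(f"{label}: {stay:.0%} of round-1 top 2% stay on top")
```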
Chapter five breaks down the intelligence of superforecasters, and chapter six their math savvy. Findings: superforecasters are intelligent and generally math-savvy, but intelligence and math skills are neither necessary nor sufficient for superforecasting performance.
Chapter five has a really interesting discussion of Fermi analyses—you know, “how many piano tuners are in Chicago”. I did this piano tuner exercise for the first time while reading this book (despite having read about it here and there in the past), and that was a very insightful experience. Rather than making one big prediction that might carry a lot of error, a Fermi analysis breaks the problem into smaller estimates whose individual errors are easier to bound and tend to partially cancel when combined. Fermi analysis is cool, and I finally appreciate it.
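Here is the classic worked through in code, with my own assumed inputs; every number is only order-of-magnitude right, and that is the point, since over- and under-estimates tend to cancel when multiplied:

```python
# Chicago piano-tuner Fermi estimate. All inputs are rough guesses.
population            = 2_700_000   # people in Chicago
people_per_household  = 2.5
households_with_piano = 1 / 20      # fraction owning a piano
tunings_per_year      = 1           # per piano
tunings_per_day       = 4           # one tuner's daily workload
workdays_per_year     = 250

pianos = population / people_per_household * households_with_piano
demand = pianos * tunings_per_year             # tunings needed per year
supply = tunings_per_day * workdays_per_year   # tunings one tuner delivers
print(round(demand / supply))                  # ~54 tuners
```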
Chapter six also deals, almost spiritually, with the misguided “quest for meaning” and the herculean discipline needed to maintain a probabilistic outlook on life:
‘Even in the face of tragedy, the probabilistic thinker will say, “Yes, there was an almost infinite number of paths that events could have taken, and it was incredibly unlikely that events would take the path that ended in my child’s death. But they had to take a path and that’s the one they took. That’s all there is to it.” In Kahneman’s terms, probabilistic thinkers take the outside view toward even profoundly identity-defining events, seeing them as quasi-random draws from distributions of once-possible worlds.’
Forget living Biblically for a year. Try living like this for a day.
Chapter seven examines whether superforecasters are plugged into the global news streams. Answer: yes, to some degree, but being plugged in doesn’t really explain their performance relative to regular, non-super forecasters.
Chapter eight examines the twin vexing problems of updating beliefs in light of new evidence, and getting better at making predictions in light of your past predictions. Because Black Lives Matter, consider police officers:
“police officers spend a lot of time figuring out who is telling the truth and who is lying, but research has found they aren’t nearly as good at it as they think they are and they tend not to get better with experience. That’s because experience isn’t enough. It must be accompanied by clear feedback. … Psychologists who test police officers’ ability to spot lies in a controlled setting find a big gap between their confidence and their skill. And that gap grows as officers become more experienced and they assume, not unreasonably, that their experience has made them better lie detectors. As a result, officers grow confident faster than they grow accurate, meaning they grow increasingly overconfident.”
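The belief-updating half of the chapter is, mechanically, Bayes’ rule: start with a prior, weigh how much likelier the new evidence is under your hypothesis than under its negation, and move accordingly. A minimal sketch, with made-up numbers:

```python
# Bayesian belief updating; the prior and likelihoods are illustrative.
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) via Bayes' rule."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Start at 30% that a regime falls within six months; a credible
# defection is three times likelier if the regime really is collapsing.
belief = 0.30
belief = update(belief, p_evidence_if_true=0.6, p_evidence_if_false=0.2)
print(f"{belief:.0%}")  # 56%: a real but not wild jump
```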
This chapter also talks about the discipline to keep hindsight bias in the kennel, and the difficulty in acknowledging the role of luck:
“People often assume that when a decision is followed by a good outcome, the decision was good, which isn’t always true, and can be dangerous if it blinds us to the flaws in our thinking.”
In a book full of actionably valuable insights (to the aspiring forecaster), this discussion might be the most helpful.
Chapter nine deals with teams, team dynamics, algorithms for fusing individuals’ predictions into one, and lots of interesting related things. Chapter ten discusses how leaders might respond to a team of forecasters, and the changes leaders have to make to best utilize them.
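On the fusing side, one trick the book describes is extremizing: average the team’s probabilities, then push the average toward 0 or 1, on the logic that each forecaster holds only a slice of the available evidence. A sketch, with an exponent I picked for illustration:

```python
# Aggregate a team's probabilities, then "extremize" the average.
# The exponent a controls how hard we push away from 0.5; a = 2.5 is
# an illustrative choice, not a value from the book.
def extremized_mean(probs, a=2.5):
    p = sum(probs) / len(probs)           # plain average
    return p**a / (p**a + (1 - p)**a)     # push toward 0 or 1

team = [0.65, 0.7, 0.6, 0.75]
print(f"mean: {sum(team) / len(team):.2f}")        # 0.68
print(f"extremized: {extremized_mean(team):.2f}")  # ~0.86
```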
Chapter eleven talks about the problems with Tetlock’s research platform. (As he himself states earlier in the book, a scientist will always specify the conditions under which they would change their minds.)
“I see Kahneman’s and Taleb’s critiques as the strongest challenges to the notion of superforecasting.”
Kahneman’s critique is: can forecasters permanently tame their cognitive biases and keep churning out winning forecasts year after year (or at least long enough to be useful)? Taleb’s critique is: can forecasters say anything about the black swan events that dominate history?
Both of these critiques, in my personal opinion, are surmountable, making Tetlock’s research agenda and this book well worth reading.
Chapter twelve closes with Tetlock’s hopes for a future world where we keep score on forecasts. It could be awesome. But we’d get used to it fast and start worrying about the next problem.