As a young military psychologist, Daniel Kahneman was assigned to observe soldiers during a team obstacle-course exercise and rate their potential for officer training. He and his colleagues usually had no problem agreeing on scores: soldiers who took charge of the group under pressure got high marks, while anyone too submissive or unhelpful was written off as a bad candidate for a leadership role.
The only problem with this system for evaluating talent was that it didn’t work.
During feedback sessions, raters learned that their scores had virtually no correlation to the soldiers’ later performance at officer training. But even confronted with statistical proof that his opinion didn’t matter, Kahneman still felt sure of the conclusions he drew each day on the obstacle course: this guy will never make it; that one will be a star.
“Any other prediction,” he wrote in his 2011 book Thinking, Fast And Slow, “seemed inconsistent with the evidence before our eyes.”
Over the course of a Nobel Prize-winning career studying the quirks of human judgment, Kahneman uncovered other examples of what he dubbed the “illusion of validity”. He has cited one paper that found evidence of reliable expertise in certain jobs, such as livestock judges and insurance analysts, but not in others, including psychologists, stockbrokers, court judges and personnel selectors.
In Michael Lewis’ 2003 book Moneyball, Kahneman read about a profession that belonged squarely in the unreliable group: baseball scouts. Veteran scouts often missed badly on their player projections by focusing on the wrong traits, but without much feedback to go on, they kept trusting their gut. “People, including experienced professionals,” Kahneman wrote, “sometimes have subjectively compelling intuitions even when they lack true skill.”
In the post-Moneyball era, baseball has had a public reckoning with its scouts’ historical track record.
A review by the website The Ringer of 13,000 amateur player reports from one Major League club found that, on the whole, scouts’ old ratings of potential were “only a little bit better for prognostication purposes than throwing darts at a draft board (list of players)”. In recent years, baseball teams have overhauled their recruitment to make scouting more rigorous and better integrated with growing data analytics departments.
Football hasn’t held its scouts up to the same kind of public scrutiny. Every once in a while an old player profile will get fished out of the archives for fun (a young Ronaldinho is correctly described as “very technical” and “not a real target man”; a youthful Xavi “runs on his heels, his movements are slow and he has trouble picking up speed”), but there is no open source of thousands of scouting reports for researchers to analyze. It’s accepted that even the most expensive signings amount to little more than a coin toss, yet the expertise behind those signings is rarely studied.
As a result, we don’t have good answers to some basic questions that affect billions of dollars in spending every year. Does football scouting even work? How do we know if it does or doesn’t? And how could it be — measurably, systematically — made better?
A few years ago, while doing some work on football scouting in the Netherlands, a Dutch graduate student named Tom Bergkamp read an academic paper in which Kahneman and a colleague tried to explain what made human intuition useful in some settings but not others. The authors agreed that experts need valid cues — indicators that can meaningfully inform their judgment — and good feedback to help them learn how to use those cues.
“They talk about baseball not being a sport which meets these requirements,” Bergkamp said. “So I’m thinking, ‘If baseball is already not a sport that meets these requirements, then soccer is definitely not a sport that meets these requirements, because soccer is much more dynamic and noisy than baseball’.”
One challenge is figuring out which attributes scouts should pay attention to. “Not every good passer, for example, becomes a pro player, right?” said Bergkamp. “Some good passers do, but some don’t — because it’s so difficult, the interaction between all these variables. So passing is not a valid enough cue.”
Another obstacle is that, at least for scouts trying to project how young prospects will turn out, it may be years before they learn what they got right or wrong. “The feedback loop is very long,” Bergkamp said, “and because it’s very long, it’s hard for decision-makers — scouts, coaches et cetera — to learn the cues that have a high validity.” Maybe dribbling skill tells you more about a player’s future than passing, but it could take an entire scouting career to find out.
Without good cues and feedback, even experienced professionals’ predictions may not be very accurate.
Research from the workplace suggests it can sometimes help to replace intuition with a more structured assessment. Hiring managers who score applicants on specific skills instead of making a gut decision, for example, tend to see better results.
Some of the Dutch scouts Bergkamp interviewed for his work rated players in a structured way, but experienced coaches often say they simply know a future pro when they see one. Liverpool’s longtime chief scout Geoff Twentyman, who estimated his success rate at 80 per cent, would watch a player for an entire game and leave with little more in his notes than: “A good prospect. Worth watching.”
According to Jon Cotterill, the author of Anatomy Of A Football Scout: An In-Depth Look At Player Recruitment, the method and format of a player report are usually left up to the scout. “Clubs rarely do it, which is really unfortunate, but one club was good enough to send me a one-page template,” he said. “They had an A, B, C, D system, and then they had a guide that said A means this, B means that…”
Based on the studies of hiring managers, Bergkamp believed that giving scouts a structured scorecard like the one described above would make their player assessments more reliable than those of scouts who used a holistic, intuitive method. Working with a team of researchers from the University of Groningen in his homeland, Bergkamp recruited about a hundred scouts and coaches affiliated with the Dutch Football Association and professional clubs to participate in a study that, he hoped, would point to a better way to scout.
The professional raters were asked to imagine they were trying to find a young full-back for a mid-table club in the Eredivisie (Dutch football’s top division). They would watch half an hour of Wyscout clips of one random player from a pool of 25, rate his performances and make a prediction about how he would develop.
Some of the raters were simply asked to grade the player’s overall performance in the clips they watched on a seven-point scale. Others were given a list of eight tasks relevant to the full-back position, such as whether the player was “available to stop the counter, apply pressure, and retain compactness” during defensive transitions. Raters in this structured group graded the player on each task, then gave their overall score just like the unstructured group.
When his team analyzed the scores, Bergkamp was taken aback by what they showed. “‘Surprising’ is the word, I would say,” he said, when asked to describe the results.
Raters who gave only an overall score could barely agree with one another on how well a player had performed, as the researchers might have expected. But raters in the group that used the carefully designed scorecard disagreed even more. Even if you ignored that group’s overall ratings and averaged their eight specific task scores together instead, there was little consensus among the scouts on what they had seen.
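One way to make “could barely agree” concrete is to measure inter-rater agreement directly. The sketch below is a minimal Python illustration using invented ratings (not the study’s data, and a cruder metric than the reliability statistics such a study would actually report): it simulates noisy seven-point scores from several raters and reports their mean pairwise correlation, which drops toward zero as the noise swamps the players’ underlying quality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 25 players rated by 6 raters on a 1-7 scale.
# All numbers are invented for illustration only.
true_quality = rng.uniform(2, 6, size=25)
noise = rng.normal(0, 1.5, size=(25, 6))  # large noise -> low agreement
ratings = np.clip(np.round(true_quality[:, None] + noise), 1, 7)

# Mean pairwise correlation between raters' score columns:
# a rough stand-in for inter-rater reliability.
corrs = []
n_raters = ratings.shape[1]
for i in range(n_raters):
    for j in range(i + 1, n_raters):
        corrs.append(np.corrcoef(ratings[:, i], ratings[:, j])[0, 1])
print(f"mean pairwise agreement: {np.mean(corrs):.2f}")
```

Shrinking the noise term pushes the agreement figure toward 1; a reliable scouting process is, in this framing, one where independent raters produce correlated columns.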
Cotterill, an experienced scout himself, wasn’t surprised at all to hear that raters in the study disagreed. “I think it’s quite common, even for the top-level players,” he said. “The reason why it’s so hard is that there is no magic formula.”
For Bergkamp, though, the disagreement was troubling.
“In order to make valid predictions, you need reliable predictions. You need people to be on the same page,” he explained. “I can see the argument — ‘Isn’t it good that every scout brings something else to the table?’ — but if they all see something completely different and come to different assessments, how are you going to make a decision on that player?”
Raters fared a little better in the second part of the study, which had to do with projecting how players would develop.
For this part, they were given background information about a player’s age, number of games played, and value according to the website Transfermarkt — a crowdsourced number that Bergkamp said studies have shown to be “a relatively good indication of how players perform on the pitch” — at the time of the Wyscout footage they had watched. Then they were asked to predict the same player’s value three seasons later.
Scouts and coaches did a decent job of predicting future values for the players whose clips they had graded. But researchers discovered that raters could predict market values almost as well for players they hadn’t even watched, using nothing but the general background information. The most important predictor of a player’s future value, it turned out, wasn’t how scouts judged his clips — it was his past value.
As far as statistical tests were concerned, the researchers conceded, “we found no evidence that observing soccer players in games hurt or helped validity”. They couldn’t prove that it mattered whether scouts watched players at all.
For Kahneman, who has spent a lifetime documenting how biases and noisy judgment can lead to failures like his overconfident obstacle-course evaluations, findings like these wouldn’t come as a surprise. He often recommends replacing fallible human intuition with algorithms for more consistent results.
In baseball, where sophisticated new technology can do parts of a scouting department’s job more cheaply and reliably, some teams have done just that.
Evan Drellich’s new book Winning Fixes Everything tells how the Houston Astros drastically cut their scouting operation after finding that adding scouts’ opinions to the data they had on pro players made their evaluations “about two per cent better”.
“At some point in recent history,” one Houston executive said, talking about an analytical tool called TrackMan, “a fucking box up on the awning of the press box is doing a better job covering the majors and minors than scouts can.”
Football is nowhere near that point yet. Even the most data-forward sporting departments, such as Brentford’s, still use traditional scouts to evaluate players, including questions about their character. Under Michael Edwards, Liverpool famously recruited a whole stable of PhDs to do cutting-edge analytics work, but they still sent out scouts to grade players on an A-through-D rubric.
A common process these days is to use data scouting to sift through thousands of players and produce a list of prospects a club might be interested in, then video scouting to trim the list down, before finally sending live scouts to watch the most interesting players perform in the stadium. Each of these types of scouting has strengths and weaknesses, and they work together, not in opposition, to offset those weaknesses.
Traditional scouts feel what the players feel. “If you haven’t been to the places, if you haven’t been to (leading Brazilian club) Fluminense’s training ground for example — the grass there is really, really long. And it can be really hot this time of year,” Cotterill explained. “So if you’re not aware of the conditions, you don’t realize how it affects the game.”
Unlike most data and video, live scouts in the stadium can see the whole game. On video platforms, Cotterill said, “you can’t see what is happening off the ball, and that is as important as what is happening on the ball.” Scouts who show up in person can also talk to people around the club to get a feel for the intangibles that can separate a talented player who’s ready for a step up from one who washes out.
“Consistency,” Cotterill admitted, “is the big problem.” There may not be anyone keeping track of scouts’ track records or giving them the kind of structured feedback that Kahneman believes is necessary for reliable expert predictions. All they’ve got is their gut and their reputation.
Bergkamp doesn’t think his research proves that scouting doesn’t work. In fact, he’s optimistic that with more research and a better design, the structured scorecard approach — the worst performer in his experiment — can still help improve scouts’ judgment.
“If I was in charge of a club, I would make it as structured as possible. The whole describing a player in vague terms and giving a gut feeling or intuition, I would scrap all that,” he said.
The important thing is making sure we know what scouts are and aren’t doing well.
“I would be doing these kinds of exercises all the time,” Bergkamp said. “What are my scouts assessing? How well are they actually aligned?”
He rolled his eyes in frustration at his imaginary sporting director job.
“If I have a scouting apparatus and all of my scouts are reporting vastly different things on the same player — again, how can I make decisions then?”
(Top photo: Xavi became one of the greatest Barcelona players despite iffy scouting reports as a youngster. David Ramos/Getty Images)