Data is interesting because it can provide answers, clear the path to the future, and resolve the past. But it can also be dangerous, terrifyingly so. It can be wielded for evil, used to perpetuate injustice, and used to confuse people further. I should know; I spent more than half a decade in grad school trying to use data to prove complex theories. More people are sounding the alarm, but I don’t think enough people are listening. That’s why, years ago, when I came across Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, written by a mathematician, I was intrigued. It was also the reminder I needed to talk about it on the blog. Yes, people, sometimes blog posts on this blog take years in the making.
What makes data so dangerous is that people tend to believe it. How many times have you heard “studies show…” without questioning it, without asking how the study was done? What was the sample size? What statistical methods were used to analyze the results? We automatically assume that data, and by extension algorithms, are neutral. It is exactly this trust that the users and creators of weapons of math destruction, as Cathy O’Neil calls them, depend upon. To be clear, not all data is awful, and we must have some level of trust in our scientists and social scientists; that trust is integral to our social fabric. But O’Neil draws our attention specifically to the algorithms of this age and how they divide us and drive inequality among us. And who better to talk about this? She is a mathematician, former professor, and former Wall Street quant.
Increasingly, the decisions that affect our lives (where we go to school, whether we get a car loan, how much we pay for health insurance) are being made not by humans but by mathematical models. It is such models she calls weapons of math destruction. The people building WMDs routinely lack data on the behaviors they are most interested in, so they substitute stand-in data, or proxies. For instance, they may draw statistical correlations between a person’s zip code or language patterns and her potential to pay back a loan or handle a job, and then draw inferences with dire consequences. Now, there is a reason we have statistical significance testing: it helps us gauge how likely it is that a result is due to mere chance. But even this has been so abused that it has eroded our ability to think. An arbitrary threshold, a p-value of 0.05 (which researchers will tell you is the holy grail for publishing), should not be the basis on which life-altering decisions are made. We now spend more of our time and resources on statistical software and less on actually thinking.
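To see why blind faith in that 0.05 cutoff is dangerous, consider a quick simulation. This is a minimal sketch of my own, not from the book: it runs 100 comparisons between two groups drawn from the exact same population, so there is no real effect to find, and yet roughly one in twenty comparisons will still clear p < 0.05 by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate 100 "studies," each comparing two groups drawn from the SAME
# distribution. Any "significant" result here is pure chance.
n_studies, n_per_group = 100, 50
false_positives = 0
for _ in range(n_studies):
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)  # identical population: no real effect
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_studies} null comparisons passed p < 0.05")
```

Anyone scanning those 100 “studies” for only the significant ones would walk away with a handful of findings that mean nothing, which is exactly what happens when an arbitrary cutoff substitutes for thinking.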
Models are so often simplified (even as the statistical analysis and software grow ever more complex) that it’s incredibly difficult to capture the real world’s nuance and complexity, and rarely do the humans behind them account for the information that gets left out. The truth is, when creating models, we have to make choices about what’s important to include, what we have data for, and what data can even be collected. Because of this, we ultimately simplify things. And every single one of us, data analysts, data scientists, social scientists, scientists, statisticians, is guilty of this.
To demonstrate the havoc WMDs can wreak, O’Neil takes us through real-life examples in education, job searching, universities, the criminal justice system, the 2008 financial crisis, and elections. For instance, in Washington, D.C., algorithms were used to wrongfully fire brilliant and engaged teachers. The district used a so-called “value-added model” that evaluated teachers based on students’ test scores and completely ignored how teachers engage students, work on specific skills, handle classroom management, or help students with personal and family problems. It also ignored students’ own personal and familial struggles, and the fact that these students exist within systems and structures that can be harmful.
In the criminal justice system, she cites models used to predict recidivism, such as the LSI-R, which includes a questionnaire for prisoners to fill out, with questions about their lives that inmates from privileged backgrounds would answer very differently from someone raised on tough inner-city streets. For instance, “the first time you were ever involved with the police” would come much later for a white boy from a wealthy suburban Connecticut neighborhood than for a Black man whose blackness alone is already treated as a threat. The model collates answers like these and uses them to predict recidivism, which feeds into consequences as weighty as the length of a prison term. Yet in 2013, the New York Civil Liberties Union found that although Black and Latino males between the ages of 14 and 24 made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police. More than 90 percent of those stopped were innocent. And of those who weren’t, perhaps caught drinking underage or carrying a joint, know that while they got in trouble for it, rich white kids doing the same didn’t. So if early “involvement” with the police signals recidivism, we already know who will look riskier: poor people, and Black and Latino people. That is the fallout from just one such question; others carry even greater burdens, and yet these answers are used to build a system that scores recidivism. Once scored, convicts are categorized as high, medium, or low risk. This is not just, and more importantly, it is not fair.
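To make the proxy problem concrete, here is a deliberately simplified, entirely hypothetical risk score of my own invention. The features, point values, and thresholds are made up; this is not the actual LSI-R. It shows how a question like age at first police contact, which tracks policing intensity at least as much as behavior, can push two people with identical records into different risk categories.

```python
# A toy risk score (hypothetical, NOT the real LSI-R) illustrating the proxy
# problem: "age at first police contact" reflects how heavily a neighborhood
# is policed at least as much as it reflects a person's behavior.
def toy_risk_score(age_first_police_contact: int, prior_convictions: int) -> str:
    points = 0
    points += 3 if age_first_police_contact < 18 else 0
    points += 2 * prior_convictions
    if points >= 4:
        return "high"
    return "medium" if points >= 2 else "low"

# Two people with identical behavior: one prior conviction each.
# One grew up in a heavily policed neighborhood and met the police at 15;
# the other, in a lightly policed suburb, not until 25.
print(toy_risk_score(age_first_police_contact=15, prior_convictions=1))  # high
print(toy_risk_score(age_first_police_contact=25, prior_convictions=1))  # medium
```

Same behavior, different label, and the label then drives real consequences. That is the mechanism O’Neil is warning about.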
And what about our elections? Talking about the role of Facebook and its mad algorithms in our elections (aka undermining our democracy) would make for an entire post by itself. Facebook has access to data on billions of people and can use (and has used) that information to influence people’s actions, not least how they vote. That is too much power. But Facebook is not the only culprit. Google, Apple, Microsoft, and Amazon all hold tremendous power and information on much of humanity, and they can steer us however they choose.
The above examples touch on just a few of the social issues plaguing our society. The role of math in unleashing the egregious financial crisis of 2008, for instance, is even more staggering.
What makes something a WMD? O’Neil lists three elements: opacity, scale, and damage. That’s what the examples above all have in common: they lack transparency, they operate at massive scale, and the damage they cause is terrifying.
Despite their reputation for impartiality and objectivity, these models are hardly either. They reflect goals and ideologies. It’s human nature: our values and desires influence the data we choose to collect, the questions we ask, and the conclusions we draw. Why won’t we admit that? Why won’t we then approach these models with a lot more humility? Instead, formulas are deliberately wielded to impress rather than clarify, a continuing problem in academia.