Is “Big Data” a Big Danger to Humanity?
Advanced technology promises to modernize everything from communication to war, but math and machines are not immune to human bias.
This article was made possible because of the generous support of DAME members.
There’s a lot of hype about “big data,” a catchall term for using computers to analyze huge amounts of information in order to reveal patterns and predict outcomes. It powers artificial intelligence, facial recognition software, and the consumer profiling that lets companies better target ads. The promise is a halcyon sci-fi future: a place where machines use algorithms to crunch through massive troves of information and make objective decisions, free from messy human biases.
In reality, it doesn’t work out that way at all. Even the most complex algorithms and futuristic technology are programmed by humans, and we are prone to nothing if not bias. Big data has the potential to reduce bias in the legal system and to provide better data for scientific research, but only with strict transparency and regulation. Without those safeguards, big data can exacerbate institutionalized racism by covering it with a veneer of faux rationality, and it can help build a surveillance state. And that’s just the beginning.
The Criminal Justice System
To understand the appeal of using big data to replace human decision-making, one of the best places to start is the criminal justice system, which has racism built into its very fabric. That racism runs from policies such as “stop-and-frisk” (the practice of allowing police to stop a pedestrian and search them if they reasonably suspect the person has committed, or may commit, a crime) to the racial bias that drives arrests and sentencing for drug-related offenses.
If we took humans out of the equation, could we fix the racial bias that leads police to stop Black and Latinx people up to 80 percent more often than White people? Could we fix the bias that makes Black people more than three times as likely to be arrested for marijuana possession, even though White and Black people use it at roughly the same rates? A 2018 Government Accountability Office report discusses risk assessment tools. These A.I.-powered programs weigh factors such as age, employment history, and prior criminal record to help predict whether a person will re-offend. One simulation showed that if such a tool were used, jail populations could be reduced by 42 percent without an uptick in crime rates.
That same report, though, highlights the risk of this approach: algorithms aren’t magically free from racism. They’re sets of rules developed by people, even though they’re designed to be run by machines. And people bring all their biases, explicit and otherwise, to the design of those rules. The GAO points out that law enforcement algorithms can actually increase racial bias because they draw on data that already reflects biased decisions.
This isn’t a theoretical worry. It’s already happening. A computer-driven formula that tries to predict whether someone will re-offend wrongly flagged Black defendants as future criminals at nearly twice the rate of White defendants. Because the risk of re-offense figures into sentencing and probation decisions, those errors are fueling longer and harsher sentences for Black defendants. Another problem is that a risk assessment tool can simply work poorly overall, racist underpinnings aside. ProPublica examined 7,000 risk assessment scores in one Florida county. Of all the people predicted to go on to commit a violent crime, only 20 percent actually did.
This isn’t to say that big data couldn’t make a positive difference in the criminal justice system—it could. At a minimum, that would require changing traditional assumptions about what makes a person a risk for re-offense. Focusing on factors like whether someone has a job or grew up in an unstable environment is functionally criminalizing people for the endemic effects of poverty, and those factors shouldn’t form the basis of an assessment of re-offense risk. Additionally, we need to stop using for-profit proprietary tools. Those tools perform calculations to determine risk scores, but private companies won’t say what those calculations are or how they work. Defendants should have a clear idea of how their risk factors are assessed.
The Surveillance State
The issues with the use of big data and A.I. by the criminal justice system aren’t limited to systemic racism. A.I. is already helping police and governments massively expand the surveillance state, and you can thank the private sector for that. Take Amazon, which developed a facial recognition tool called Rekognition and has offered it to law enforcement, raising serious privacy concerns. The ACLU wrote a letter asking Amazon to stop selling the tool to law enforcement. In it, the group compiled some of the most chilling language from Amazon’s own descriptions of the product:
“Amazon offers a ‘person tracking’ feature that it says ‘makes investigation and monitoring of individuals easy and accurate’ for ‘surveillance applications.’ Amazon says Rekognition can be used to identify ‘all faces in group photos, crowded events, and public places such as airports’—at a time when Americans are joining public protests at unprecedented levels. Amazon also encourages the use of Rekognition to monitor ‘people of interest,’ raising the possibility that those labeled suspicious by governments—such as undocumented immigrants or Black activists—will be targeted for Rekognition surveillance.”
When pressed about how bad this sounds, Amazon washed its hands of the matter. It said only that it would cut off access to the tool for anyone who violates its terms of service. But when a company sells organizations on its tool by encouraging them to surveil people, it’s hard to see how surveillance would violate any of its terms.
Amazon also says it can store the images and video that users process through Rekognition. That means your data isn’t just living with the police department; it’s also living with a giant for-profit corporation. Facebook has also already built a massive facial recognition database, and other companies are getting in on the act too. Axon, the corporation that makes Tasers, is marketing its body cameras to police departments by letting them use the cameras free for a year. What does Axon get out of the deal? Tons of video footage that helps it build its A.I. database, which in turn can help police target people more effectively. Eventually, police could use their body cameras in conjunction with real-time facial recognition software to, for example, identify everyone at a political protest. But they’ll also misidentify people. Existing facial characterization algorithms perform the far simpler task of classifying people by age, gender, and race, and even these have high error rates on the faces of darker-skinned people.
It’s tough to draw a line between semi-benign uses of facial recognition and police using it to track people more efficiently. A semi-benign use would be Facebook figuring out which friends appear in your timeline photos. Unless we draw incredibly strict boundaries around who can use facial recognition and why (limiting it to AMBER Alerts, for example), we’ll continue to see it exploited by the surveillance state.
It’s no secret that the military would love to expand its use of drone strikes. Drone strikes in Somalia doubled during Trump’s first year in office, while drone strikes in Yemen tripled. Improving drone strikes requires big data, because drone cameras capture far more footage than human eyes can review. When computers can crunch through massive databases of faces and places, they can help the military target individuals and specific locations more precisely.
Google signed on to help the Department of Defense with Project Maven, which aims to better identify objects in drone footage. Google’s technology uses machine learning to comb through the footage and identify objects such as vehicles. It then flags those images for a human analyst to review. Project Maven also helps the military track individual people.
Google has tried to explain that its involvement is “non-offensive” in nature, meaning it isn’t directly helping the Pentagon improve ways to kill people. However, the Department of Defense has already said that it uses its video analysis in counterterrorism operations, which include drone strikes. You can’t credibly help the military with technology tied to drone strikes and then claim that the work isn’t part of the business of war.
As with facial recognition in the surveillance state, it’s tough to draw a line here. Google’s technology can quickly and efficiently separate signal from noise, which has practical, nonmilitary uses such as mapping. But if companies like Google are willing to sell that technology to the military, they become complicit in the military’s goal of finding better ways to kill people.
Predictably, ICE is eager to use this sort of technology. The agency already uses a nationwide license plate recognition database that gives its agents real-time tracking capabilities. The data set it has licensed contains more than 2 billion license plate photos gathered from other data sources, such as car repossession records and police databases. It’s a perfect example of how big data accretes into more big data, with dangerous results.
Your kids could soon be tracked by the same facial recognition technology that police are eager to deploy. A school district in New York has already spent millions to buy a security system connected to a multi-camera surveillance network. In theory, it would be used to identify school visitors and make sure they’re not a threat, but there’s just no reason it couldn’t be used to flag and track students.
The horizon isn’t entirely bleak, though. A bill making its way through the New York City Council would create a task force to make recommendations about the use of automated decision-making tools. It’s the barest of beginnings, but at least it’s a start. It could be the first step toward opening what big data ethics researchers call the “black box.” The term refers to the fact that proprietary big data and A.I. software keep their inputs, outputs, and mechanisms secret. We don’t know what data goes in, and we don’t know why it comes out the way it does.
When big data software is transparent and open, we can figure out whether the inputs are racially biased or whether the way the program analyzes the data is flawed. But making that happen requires us to understand that expecting machines and math to fix human error excuses us from confronting our own biases. It also inadvertently enshrines those biases in our decision-making. Only when we work hard to eradicate our explicit and implicit biases will we see useful data outcomes. We also need to acknowledge that big data will require firm, nonpartisan regulation to ensure it is used wisely. Technology can’t solve all our problems, especially when we are the problem.