Proponents of AI and other optimists are often ready to acknowledge the numerous problems, threats, dangers, and downright murders enabled by these systems to date. But they also dismiss critique and assuage skepticism with the promise that these casualties are themselves outliers — exceptions, flukes — or, if not, they are imminently fixable with the right methodological tweaks.
Common practices of technology development can produce this kind of naivete. Alberto Toscano calls this a “Culture of Abstraction.” He argues that logical abstraction, core to computer science and other scientific analysis, influences how we perceive real-world phenomena. This abstraction away from the particular and toward idealized representations produces and sustains apolitical conceits in science and technology. We are led to believe that if we can just “de-bias” the data and build in logical controls for “non-discrimination,” the techno-utopia will arrive, and the returns will come pouring in. The argument here is that these adverse consequences are unintended. The assumption is that the intention of algorithmic inference systems is always good — beneficial, benevolent, innovative, progressive.
Stafford Beer gave us an effective analytical tool for evaluating a system without getting sidetracked by arguments about its intent rather than its real impact. This tool is called POSIWID, which stands for “The Purpose of a System Is What It Does.” This analytical frame provides “a better starting point for understanding a system than a focus on designers’ or users’ intention or expectations.”
That kind of analysis is done all the time. But, even if we can collect all the relevant data (big if), the methods required are difficult to interpret and easy to abuse (we can't do an RCT of being born female vs male, or black vs white, &c). A good example is the proliferation of analyses claiming that the gender pay gap does not exist (after you've 'controlled' for all the things that cause the gender pay gap).
It's not easy to do 'right' even when done in good faith.
The article isn't claiming that it is easy, of course. It's asking why power is so keen on one type of question and not its inverse. And that is a very good question, albeit one with a very easy answer. Power is not in the business of abolishing itself.
Isn’t that a continuation of “why the outlier was culled”?
More emphasis on how the data set is selected (while hard) is very useful.
Not sure I follow, but I think the answer is "no".
If you control for all the causes of a difference, the difference will disappear. Which is fine if you're looking for causal factors which are not already known to be causal factors, but no good at all if you're trying to establish whether or not a difference exists.
It's really quite difficult to ask a coherent question with real-world data from the messy, complicated reality of human beings.
A simple example:
Women are more likely to die from complications after a coronary artery bypass.
But if you include body surface area (a measure of body size) in your model, the difference between men and women disappears.
And if you go the whole hog and measure vein size, the importance of body size disappears too.
And, while we can never do an RCT to prove it, it makes perfect sense that smaller veins would increase the risk for a surgery which involves operating on blood vessels.
None of that means women do not, in fact, have a higher risk of dying after coronary artery bypass surgery. Collect all the data which has ever existed and women will still be more likely to die from the surgery. We have explained the phenomenon and found what is very likely to be the direct cause of higher mortality. Being a woman just makes you more likely to have that risk factor.
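To make that concrete, here is a rough simulation sketch. Every number in it is invented for illustration (the 70%/30% split of small veins and the 6%/2% mortality rates are assumptions, not clinical data), and the function names are hypothetical. Sex affects the outcome only through vein size, yet the unadjusted gap between women and men is real; it only vanishes once you compare within each vein-size group.

```python
import random


def simulate(n=200_000, seed=0):
    """Generate (female, small_vein, died) rows from made-up probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        female = rng.random() < 0.5
        # Assumption: women are more likely to have small veins.
        small_vein = rng.random() < (0.7 if female else 0.3)
        # Assumption: risk depends on vein size only, never on sex directly.
        died = rng.random() < (0.06 if small_vein else 0.02)
        rows.append((female, small_vein, died))
    return rows


def death_rate(rows, female, small_vein=None):
    """Death rate for one sex, optionally restricted to one vein-size group."""
    selected = [
        died
        for f, s, died in rows
        if f == female and (small_vein is None or s == small_vein)
    ]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    rows = simulate()
    # Question 1: does a difference exist? (No adjustment.)
    # By construction the expected rates are 0.048 for women, 0.032 for men.
    print("women:", death_rate(rows, True), "men:", death_rate(rows, False))
    # Question 2: what explains it? (Compare within vein-size groups.)
    for small in (True, False):
        print(
            "small veins" if small else "large veins",
            "women:", death_rate(rows, True, small),
            "men:", death_rate(rows, False, small),
        )
```

The two print-outs answer two different questions. The first says whether women die more often; the second says why. Reporting the second as if it answered the first is exactly the mistake described above.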
It is rare that the answer is as neat and simple as this. It is very easy to ask a different question from the one you thought you were asking (or pretend to be answering one question when you answered another).
You can't just throw masses of data into a pot and expect sensible answers to come out. This is the key difference between statisticians and data scientists. And, not to throw shade on data scientists, they often end up explaining to the world that oestrogen makes people more likely to die from complications of coronary artery bypass surgery.
Maybe it’s a crude interpretation, but over-controlling for all the causes of a change and removing outliers from the data used to train these AI models seem like similar issues when you're trying to actually understand the data.
The data cannot be understood. These models are too large for that.
Apple says it doesn't understand why its credit card gives lower credit limits to women than men even if they have the same (or better) credit scores, because they don't use sex as a datapoint. But it's freaking obvious why, if you have a basic grasp of the social sciences and humanities. Women were not given the legal right to their own bank accounts until the 1970s. After that, banks could be forced to grant them bank accounts but not to extend the same amount of credit. Women earn and spend in ways that are different, on average, to men. So the algorithm does not need to be told that the applicant is a woman, it just identifies them as the sort of person who earns and spends like the class of people with historically lower credit limits.
Apple's 'sexist' credit card investigated by US regulator
Garbage in, garbage out. Society has been garbage for marginalised groups since forever and there's no way to take that out of the data. Especially not big data. You can try but you just end up playing whack-a-mole with new sources of bias, many of which cannot be measured well, if at all.
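A toy sketch of how a model that never sees sex can still reproduce the gap. This is not how any real card issuer's system works; the spending "pattern" feature, the 80%/20% split, and the dollar figures are all made up for illustration, and the "model" is just an average per pattern.

```python
import random
from statistics import mean


def make_history(n=100_000, seed=1):
    """Generate (female, pattern, historical_limit) rows from invented numbers."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        female = rng.random() < 0.5
        # Assumption: spending pattern "A" is more common among women.
        pattern = "A" if rng.random() < (0.8 if female else 0.2) else "B"
        # Assumption: historically, women were given lower limits.
        limit = rng.gauss(8000 if female else 10000, 1000)
        people.append((female, pattern, limit))
    return people


def fit_sex_blind(people):
    """'Train' on pattern only: average historical limit per pattern."""
    by_pattern = {}
    for _female, pattern, limit in people:
        by_pattern.setdefault(pattern, []).append(limit)
    return {pattern: mean(limits) for pattern, limits in by_pattern.items()}


def mean_predicted_limit(people, model, female):
    """Average limit the sex-blind model would assign to one sex."""
    return mean(model[pattern] for f, pattern, _limit in people if f == female)


if __name__ == "__main__":
    people = make_history()
    model = fit_sex_blind(people)  # sex is never passed to the model
    print("women:", mean_predicted_limit(people, model, True))
    print("men:  ", mean_predicted_limit(people, model, False))
```

Dropping the sex column changes nothing important here, because the pattern feature carries most of the same information. Remove that feature and the next correlated one takes over, which is the whack-a-mole problem.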
You are pointing out specific biases that we already know about. The article you posted seems to posit using the data to find the unknown biases we have as well
It's asking why don't we use it for that purpose, not suggesting that there is anything easy about doing so. I don't know how you think science works, but it's not like that.