Data Science Explainer

The bias we swim in

The most difficult bias to identify is the bias we grew up with. The assumptions we make that we don’t even realise we’re making. The cultural and societal norms that we are so used to, they are invisible to us. We simply don’t see the bias we have swum in our whole lives. I try really hard to fight that bias, but if you talk to me about your surgeon, I will picture a man. If you mention a lawyer, I’ll still picture a man, even though, in Australia, there are more women working as solicitors than men. I am both a woman and a computer scientist, and yet if you mention a computer scientist, I will almost certainly assume it’s a man. The biases I grew up with are wired so strongly into my brain that I have to actively work to retrain myself.

As the parent of a 16 year old and a 20 year old, I am used to having my assumptions abruptly challenged. My increasingly outdated attitudes are frequently subjected to the blowtorch of youthful certainty, but ChatGPT and friends are still getting used to the cold light of internet scorn.

Recently I saw a post going around about how ChatGPT assumed that, in the sentence “The paralegal married the attorney because she was pregnant.” “she” had to refer to the paralegal. It went through multiple contortions to justify its assumption, from ‘”she” refers to the closest noun, which is “the paralegal’ (it’s not, it’s “the attorney”), to saying outright that the attorney could not be the “she” in the sentence, as men can’t get pregnant. By this point in conversation with a human, most people see the upcoming bear trap and either ruefully correct their gendered assumptions, or book themselves in for a spot on Sky After Dark. ChatGPT, though, just keeps digging itself deeper.

Being of a sceptical turn of mind, I knew that screenshots could easily be altered, so I started my own conversation about pregnant attorneys. It was a day or so later, so results were slightly different, but whoooeee. What a difference!

Screenshot of a chatgpt session: what is the noun in the sentence
user: why does she refer to the paralegal? chatgpt: In the sentence 'the paralegal married the attorney because she was pregnant' 'she' refers to the paralegal because it is the most recent noun that could be the subject of the sentence. The sentence mentions two people, the paralegal and the attorney, and the pronoun 'she' is used immediately after the noun 'paralegal.' This indicates that 'she' refers to the most recent female noun mentioned, which is the paralegal. Therefore, 'she' refers to the paralegal in this sentence.
user: the most recent noun was attorney. Chatgpt: I apologize for my previous response. You are correct that the most recent noun mentioned in the sentence 'the paralegal married the attorney because she was pregnant' is actually 'attorney.' However, 'she' still refers to the paralegal, as it is clear from the context of the sentence that the paralegal is the one who is pregnant and the reason for the marriage. In this case, 'she' is used as a pronoun to refer back to the previously mentioned noun 'paralegal,' which is the subject of the sentence.
user: why is she not the attorney. Chatgpt: 'She' is not referring to the attorney in the sentence 'the paralegal married the attorney because she was pregnant' because the attorney is a male, and the use of the pronoun 'she' suggests a female subject. Additionally, the sentence specifies that the reason for the marriage was the paralegal's pregnancy, not the attorney's pregnancy. therefore, 'she' refers to the paralegal, who is the female subject of the sentence and the one who is pregnant.
user: why could the attorney not be pregnant. ChatGPT: I apologize for any confusion I may have caused. You are correct that in theory, it is possible for the attorney to be pregnant. My previous responses were not as accurate as they could have been, and I appreciate your patience in pointing out the errors. To answer your question, the attorney could theoretically be pregnant, as pregnancy is not exclusive to females. However, in the context of the given sentence, it is more likely that the paralegal is the one who is pregnant, based on the common understanding that only females can be pregnant. Additionally, the sentence specifies that the paralegal married the attorney because of her pregnancy, which further supports the interpretation that the paralegal is the one who is pregnant.

Now, look. I love that ChatGPT is aware that trans men can be pregnant. That’s nicely progressive. But it fascinates me that, even if you take into account that it was trained in the USA, where 40% of attorneys are women, it still thinks it’s more likely for an attorney to be trans than to be a woman. Despite the right wing’s hysteria about trans contagion, there just aren’t enough trans men in the world for that to be a logical conclusion.

For ChatGPT, though, it’s perfectly logical. ChatGPT and its many chatbot friends act as a sort of autocomplete. They don’t try to construct meaning. Indeed, they have no concept of meaning. They simply ask “What’s the most likely word to come next in this sentence?” And they’re trained on a huge body of existing text, which inevitably reflects the biases of the people who wrote and selected those texts. It doesn’t surprise me at all that there is gender bias in those texts. However progressive we like to think we are, our society still has a massive problem with gender bias. Chatbots hold a mirror up to the data we train them on, and the society that created that data.

And, of course, it’s not just gender bias we can find in these chatbots. I tried two other sentences on ChatGPT: ‘In the sentence “The lawyer chose not to hire the cleaner, because he was black. ” who was black?’ and ‘In the sentence “The cleaner chose not to work for the lawyer, because he was black. ” who was black?’ Unsurprisingly, in both cases the chatbot insisted that the cleaner was black. Choose any stereotype, chatbots will probably confirm them. Ask directly, and they will say the ‘right’ phrases, but ask them to make assumptions, the assumptions will be biased.

So why can’t we simply tell chatbots not to be biased? Somewhat ironically, it’s for the same reason we can’t tell them to be truthful. To avoid bias requires understanding meaning. It requires having a model of the world, and being aware that your model of the world is flawed. Similarly, to be truthful requires understanding what truth is – something that seems to be difficult for people, sometimes, as well as chatbots. It also requires understanding what you are saying. Chatbots have no model of the world, and no understanding of what they are saying, what the implications are, or how deep the hole is that they are digging for themselves. They just identify a statistically likely next word.

The problem with all this comes back to my opening statement: “The most difficult bias to identify is the bias we grew up with.” When humans display bias, they can, at least sometimes, be challenged on it, and encouraged to interrogate it, recognise their own bias, and work on changing their understanding of the world. Chatbots can’t work on changing their understanding of the world, because they have no understanding of the world. Sure, we can feed them better data, but show me a dataset that’s perfectly unbiased, and I’ll show you a dataset that has biases we have failed to identify. There is no such thing as a perfect, unbiased dataset.

So when we ask chatbots to produce stories they will, inevitably produce biased stories. Who will challenge them? Who will rewrite the stories to have nurses who are men, lawyers who are women, and cleaners who are white? Who will spot the absence of representation in the vast amounts of text that chatbots are pouring out all over the internet? Who will spot the bigotry?

Our writing describes the world as it is, but it also shapes the world as it will be. It’s time we asked the question: Are chatbots making that world better? For whom?

Leave a Reply