The bias problem is common to all artificial intelligence applications: these systems appear to work very well for some segments of the population, but very poorly for others. This is a significant problem because it erodes trust in these systems. Imagine your customer having an
interaction with an AI system, and it “doesn’t work” – who will they blame? Will they blame the
computer for being wrong, or the company for deploying a product that doesn’t work? Will they be
willing to give the system another chance, or will they write it off and try their luck somewhere else?
There are only two key places in an AI system where this bias can originate – the architecture of the
AI system, and the training data used by that architecture to form a model of the world. AI
architectures are increasingly complex, with millions of nodes and connections, and it’s quite
difficult for any human to understand exactly how these architectures work or which parts of the
architecture to tweak. So it’s much easier to treat the architecture as a black box and simply assume that it’s probably doing as well as it can.
That leaves us with the training data, and this is where most people have put their efforts. More
data gives you a more comprehensive view of the world, because you are providing more examples for the AI system to learn from. Imagine showing a toddler hundreds of pictures of Sphynx cats and telling them “this is a cat”: would they then correctly identify a Norwegian Forest cat as a cat? The groups
that have been most successful in AI are also those with the largest amount of data – the big tech
companies or governments with millions or billions of samples available. In the boom of AI-powered computer vision, the big object detection competitions weren’t being won by university research groups with better or cleverer techniques; they were being won by Google, Facebook, Microsoft, and Alibaba, using very similar methods but with access to more data.
However, that data is valuable, so those organisations typically aren’t going to release it publicly for free. It can be costly to acquire more samples, and there is a limit to how much time and effort companies are willing to spend. The recent controversy around Clearview, which breached the Terms and Conditions of multiple social media sites to collect a billion-sample facial recognition dataset, shows how difficult it is to build a truly large dataset through legal means alone.
There are some proposals that synthetic data (i.e. data that has been artificially constructed from existing datasets to make the dataset bigger) can help solve this problem. For example, if
we have a facial recognition system, deepfake technology could be used to help generate a wider
variety of faces, with a bit of randomness to help create new data points that previously did not
exist. I think the challenge with these proposals is that they are unlikely to eliminate bias; worse, they may further hide the bias by making it appear that it isn’t there. This is because the deepfake
technology is trained using a dataset as well, and if that underlying dataset is biased, then it
becomes encoded in the synthetic data too. Some have argued that you can control how the
synthetic data is generated to address specific biases. But if the designer already knows where those
biases are, then they should really invest the time and resources into collecting the right data to
address those deficits, rather than trying to create fake data to compensate.
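To make the point concrete, here is a minimal, hypothetical sketch of why synthetic data inherits the skew of the data it was built from. It is plain NumPy, not a real deepfake pipeline: resampling with jitter stands in for a generative model, and the group labels and proportions are made up for illustration.

```python
# A minimal sketch (not a real deepfake pipeline): we "generate" synthetic
# samples by resampling an existing dataset and adding small random jitter,
# a crude stand-in for a generative model trained on that dataset.
# The point: if 90% of the source samples come from group A, roughly 90% of
# the synthetic samples will too, so the imbalance is copied, not fixed.
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical source dataset: 900 samples from group "A", 100 from group "B".
features = np.vstack([rng.normal(0.0, 1.0, size=(900, 8)),
                      rng.normal(3.0, 1.0, size=(100, 8))])
groups = np.array(["A"] * 900 + ["B"] * 100)

def generate_synthetic(features, groups, n_new, rng):
    """Resample existing points and jitter them to create 'new' samples."""
    idx = rng.integers(0, len(features), size=n_new)
    jitter = rng.normal(0.0, 0.1, size=(n_new, features.shape[1]))
    return features[idx] + jitter, groups[idx]

synth_x, synth_groups = generate_synthetic(features, groups, n_new=5000, rng=rng)

for g in ["A", "B"]:
    share = np.mean(synth_groups == g)
    print(f"group {g}: {share:.1%} of synthetic samples")
# Expected output: roughly 90% A / 10% B, the same skew as the source data.
```

The generator only knows what the source dataset shows it, so the dataset gets bigger without getting any more representative.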
The other missing element is that we currently do not have particularly systematic ways of
identifying or testing for bias. There are arguments that AI transparency will resolve this, but
transparency can make it easier for malicious actors to fool or, worse, reverse engineer the underlying AI models. Instead, auditing and certification by trusted parties are important for ensuring that AI systems meet a high standard of performance. This relies on there being a way to define those standards. Some vendors provide auditing services as part of their product offering to determine whether bias is present, but they are also incentivised to claim that there is no bias.
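As an illustration of what such a test might look like, here is a minimal, hypothetical sketch that compares error rates across demographic groups and flags the system when the gap exceeds a chosen threshold. The group labels, the 5% threshold, and the simulated predictions are all assumptions; a real standard would need to define the groups, metrics, and thresholds far more carefully.

```python
# A minimal sketch of one possible bias test (not an established standard):
# compare error rates across demographic groups and flag large gaps.
import numpy as np

def group_error_rates(y_true, y_pred, groups):
    """Return the model's error rate for each group."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[g] = float(np.mean(y_true[mask] != y_pred[mask]))
    return rates

def audit(y_true, y_pred, groups, max_gap=0.05):
    """Fail the audit if any two groups differ by more than max_gap in error rate."""
    rates = group_error_rates(y_true, y_pred, groups)
    gap = max(rates.values()) - min(rates.values())
    return {"error_rates": rates, "gap": gap, "passes": gap <= max_gap}

# Hypothetical audit data: true labels, model predictions, and group membership.
rng = np.random.default_rng(seed=1)
groups = np.array(["A"] * 800 + ["B"] * 200)
y_true = rng.integers(0, 2, size=1000)
# Simulate a model that is wrong ~5% of the time for group A and ~20% for group B.
flip = np.where(groups == "A", rng.random(1000) < 0.05, rng.random(1000) < 0.20)
y_pred = np.where(flip, 1 - y_true, y_true)

print(audit(y_true, y_pred, groups))   # expect a gap of roughly 0.15 -> fails
```

Even a simple check like this only works if the auditor holds representative test data for every group, which is exactly the data the deployer may be missing.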
We need independent researchers and government agencies to invest resources into developing
these standards and tests, alongside appropriate regulation and investigative powers to ensure that
these standards are being met. There are precedents for government investigations of technologies,
particularly in safety-critical applications such as aircraft and vehicles. For example, road transport
and highway authorities generally have wide-ranging powers to investigate how car crashes happen. In the wake of an Uber autonomous vehicle killing a pedestrian, the US National Transportation
Safety Board engaged technical experts to go through the hardware systems and software code to
understand what the sensors in the car saw and how decisions about stopping the vehicle were
made.
The trouble for the people and companies using AI systems is that very few would intend for their
systems to be biased. Perhaps this is a limitation of the technology that we have available to us at
the moment, and bias just exists without anyone putting it there. But there are two key actions that
can mitigate and minimise that bias: collecting as much real-world training data as possible, and testing and auditing the system widely, including exposing it to edge cases. These may both seem like costly actions, but it’s worth comparing their cost to the cost of losing
customers because the system doesn’t work for them. Understanding the cost of errors in the
system is critical to understanding how much effort needs to be put into avoiding them.
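One way to make that comparison concrete is a simple expected-cost calculation. The sketch below uses entirely made-up numbers; the structure of the calculation, not the figures, is the point.

```python
# A back-of-the-envelope sketch of the trade-off, with entirely hypothetical
# numbers: compare the expected revenue lost to errors against the cost of
# collecting more data and running a wider audit.
customers = 50_000                 # customers interacting with the system per year
error_rate = 0.08                  # share of customers for whom the system "doesn't work"
churn_given_error = 0.25           # share of those customers who leave
value_per_customer = 400.0         # annual value of a retained customer ($)

expected_loss = customers * error_rate * churn_given_error * value_per_customer

data_and_audit_budget = 250_000.0  # cost of extra data collection plus independent auditing

print(f"Expected annual loss from errors: ${expected_loss:,.0f}")
print(f"Mitigation budget:                ${data_and_audit_budget:,.0f}")
print("Mitigation pays for itself" if expected_loss > data_and_audit_budget
      else "Errors are cheaper than mitigation (for these numbers)")
```

With these illustrative figures the expected loss ($400,000) comfortably exceeds the mitigation budget; the real value of the exercise is forcing an organisation to estimate each of those numbers for its own system.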