The bias problem is common to all artificial intelligence applications: these systems appear to work very well for some segments of the population, but very poorly for others. This is a significant problem because it erodes trust in these systems. Imagine your customer having an
interaction with an AI system, and it “doesn’t work” – who will they blame? Will they blame the
computer for being wrong, or the company for deploying a product that doesn’t work? Will they be
willing to give the system another chance, or will they write it off and try their luck somewhere else?
There are only two key places in an AI system where this bias can originate – the architecture of the
AI system, and the training data used by that architecture to form a model of the world. AI
architectures are increasingly complex, with millions of nodes and connections, and it’s quite
difficult for any human to understand exactly how these architectures work or which parts of the
architecture to tweak. So it’s much easier to treat the architecture as a black box and simply assume that it’s probably doing as well as it can.
That leaves us with the training data, and this is where most people have put their efforts. More
data gives you a more comprehensive view of the world, because you are providing more examples for the AI system to learn from. Imagine showing a toddler hundreds of pictures of Sphynx cats and telling them “this is a cat”: would they then correctly identify a Norwegian Forest cat as a cat? The groups
that have been most successful in AI are also those with the largest amount of data – the big tech
companies or governments with millions or billions of samples available. In the boom of AI-powered computer vision, the big object detection competitions weren’t being won by university research groups with better or cleverer techniques; they were being won by Google, Facebook, Microsoft, and Alibaba, using very similar methods but with access to more data.
However, that data is valuable, so those organisations typically aren’t going to release it publicly for free. It can be costly to acquire more samples, and there is a limit to how much time and effort companies are willing to spend. The recent controversy around Clearview, which breached the Terms and Conditions of multiple social media sites to collect a billion-sample facial recognition dataset, shows how difficult it is to build a truly large dataset through legal means alone.
There are some proposals that synthetic data (i.e. data that has been artificially constructed from existing datasets to make the dataset bigger) can help solve this problem. For example, if
we have a facial recognition system, deepfake technology could be used to help generate a wider
variety of faces, with a bit of randomness to help create new data points that previously did not
exist. I think the challenge with these proposals is that they are unlikely to eliminate bias; worse, they may further hide the bias by making it appear that it isn’t there. This is because the deepfake
technology is trained using a dataset as well, and if that underlying dataset is biased, then it
becomes encoded in the synthetic data too. Some have argued that you can control how the
synthetic data is generated to address specific biases. But if the designer already knows where those
biases are, then they should really invest the time and resources into collecting the right data to
address those deficits, rather than trying to create fake data to compensate.
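To make the point concrete, here is a minimal, hypothetical sketch of why synthetic data inherits the skew of the data it was built from. It is plain NumPy, not a real deepfake pipeline: resampling with jitter stands in for a generative model, and the group labels and proportions are made up for illustration.

```python
# A minimal sketch (not a real deepfake pipeline): we "generate" synthetic
# samples by resampling an existing dataset and adding small random jitter,
# a crude stand-in for a generative model trained on that dataset.
# The point: if 90% of the source samples come from group A, roughly 90% of
# the synthetic samples will too, so the imbalance is copied, not fixed.
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical source dataset: 900 samples from group "A", 100 from group "B".
features = np.vstack([rng.normal(0.0, 1.0, size=(900, 8)),
                      rng.normal(3.0, 1.0, size=(100, 8))])
groups = np.array(["A"] * 900 + ["B"] * 100)

def generate_synthetic(features, groups, n_new, rng):
    """Resample existing points and jitter them to create 'new' samples."""
    idx = rng.integers(0, len(features), size=n_new)
    jitter = rng.normal(0.0, 0.1, size=(n_new, features.shape[1]))
    return features[idx] + jitter, groups[idx]

synth_x, synth_groups = generate_synthetic(features, groups, n_new=5000, rng=rng)

for g in ["A", "B"]:
    share = np.mean(synth_groups == g)
    print(f"group {g}: {share:.1%} of synthetic samples")
# Expected output: roughly 90% A / 10% B, the same skew as the source data.
```

The generator only knows what the source dataset shows it, so the dataset gets bigger without getting any more representative.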
The other missing element is that we currently do not have particularly systematic ways of
identifying or testing for bias. There are arguments that AI transparency will resolve this, but
transparency can make it easier for malicious actors to fool or, worse, reverse engineer the underlying AI models. Instead, auditing and certification by trusted parties are important for ensuring that AI systems meet a high standard of performance. This relies on there being a way to define those standards. Some vendors provide auditing services as part of their product offering to determine whether bias is present, but they are also incentivised to claim that there is no bias.
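As an illustration of what such a test might look like, here is a minimal, hypothetical sketch that compares error rates across demographic groups and flags the system when the gap exceeds a chosen threshold. The group labels, the 5% threshold, and the simulated predictions are all assumptions; a real standard would need to define the groups, metrics, and thresholds far more carefully.

```python
# A minimal sketch of one possible bias test (not an established standard):
# compare error rates across demographic groups and flag large gaps.
import numpy as np

def group_error_rates(y_true, y_pred, groups):
    """Return the model's error rate for each group."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[g] = float(np.mean(y_true[mask] != y_pred[mask]))
    return rates

def audit(y_true, y_pred, groups, max_gap=0.05):
    """Fail the audit if any two groups differ by more than max_gap in error rate."""
    rates = group_error_rates(y_true, y_pred, groups)
    gap = max(rates.values()) - min(rates.values())
    return {"error_rates": rates, "gap": gap, "passes": gap <= max_gap}

# Hypothetical audit data: true labels, model predictions, and group membership.
rng = np.random.default_rng(seed=1)
groups = np.array(["A"] * 800 + ["B"] * 200)
y_true = rng.integers(0, 2, size=1000)
# Simulate a model that is wrong ~5% of the time for group A and ~20% for group B.
flip = np.where(groups == "A", rng.random(1000) < 0.05, rng.random(1000) < 0.20)
y_pred = np.where(flip, 1 - y_true, y_true)

print(audit(y_true, y_pred, groups))   # expect a gap of roughly 0.15 -> fails
```

Even a simple check like this only works if the auditor holds representative test data for every group, which is exactly the data the deployer may be missing.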
We need independent researchers and government agencies to invest resources into developing
these standards and tests, alongside appropriate regulation and investigative powers to ensure that
these standards are being met. There are precedents for government investigations of technologies,
particularly in safety-critical applications such as aircraft and vehicles. For example, road transport
and highway authorities generally have wide-ranging powers to investigate how car crashes happen. In the wake of an Uber autonomous vehicle killing a pedestrian, the US National Transportation
Safety Board engaged technical experts to go through the hardware systems and software code to
understand what the sensors in the car saw and how decisions about stopping the vehicle were
made.
The trouble for the people and companies using AI systems is that very few would intend for their
systems to be biased. Perhaps this is a limitation of the technology that we have available to us at
the moment, and bias just exists without anyone putting it there. But there are two key actions that
can mitigate and minimise that bias: collecting as much real-world training data as possible, and testing and auditing the system widely, including exposing it to edge cases. These may both seem like costly actions, but it’s worth comparing their cost to the cost of losing
customers because the system doesn’t work for them. Understanding the cost of errors in the
system is critical to understanding how much effort needs to be put into avoiding them.
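One way to make that comparison concrete is a simple expected-cost calculation. The sketch below uses entirely made-up numbers; the structure of the calculation, not the figures, is the point.

```python
# A back-of-the-envelope sketch of the trade-off, with entirely hypothetical
# numbers: compare the expected revenue lost to errors against the cost of
# collecting more data and running a wider audit.
customers = 50_000                 # customers interacting with the system per year
error_rate = 0.08                  # share of customers for whom the system "doesn't work"
churn_given_error = 0.25           # share of those customers who leave
value_per_customer = 400.0         # annual value of a retained customer ($)

expected_loss = customers * error_rate * churn_given_error * value_per_customer

data_and_audit_budget = 250_000.0  # cost of extra data collection plus independent auditing

print(f"Expected annual loss from errors: ${expected_loss:,.0f}")
print(f"Mitigation budget:                ${data_and_audit_budget:,.0f}")
print("Mitigation pays for itself" if expected_loss > data_and_audit_budget
      else "Errors are cheaper than mitigation (for these numbers)")
```

With these illustrative figures the expected loss ($400,000) comfortably exceeds the mitigation budget; the real value of the exercise is forcing an organisation to estimate each of those numbers for its own system.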