How do we secure our AI?

We have an interesting problem on our hands. Our reliance on artificial intelligence has increased exponentially over the last few years, and yet our strategies and technologies to secure those systems have not. Couple that with the rise in sophistication of cyber-attacks, the very public attacks on AI-run digital and physical assets by state-based actors, and the use of AI in conflicts, and we may just have the perfect storm for an all-out war between countries.

Perhaps you think this is an exaggeration, but is it? Imagine a sophisticated attack that cripples the AI systems managing several pieces of a country’s critical infrastructure, bringing the nation to its knees. What would the response be? Patch, restore from backups, and bring back the dead? Or imagine if the Israeli AI-based autonomous drones that attacked Hamas targets were hacked by Iranian-backed groups and took out Israeli targets instead. Gartner predicts that in the next three years (by 2024) “a cyberattack will so damage critical infrastructure that a member of the G20 will reciprocate with a declared physical attack”. I know Gartner does not have a crystal ball, nor a seat at the White House or the IDF, but at the very least they are brave enough to publish such a prediction.

What sorts of attacks are possible against our AI models? How do we secure these AI-based assets?

How can AI systems be hacked?

Microsoft provides a great list of possible attacks against AI. Generally speaking, however, the attacks can be divided into four categories:

·      Attacks designed to mess with the output of your model. Perturbation and poisoning are two examples of this type. Strictly speaking, these two attacks are different and happen at different stages of an AI’s lifecycle: perturbation happens after your model is deployed, while poisoning attacks the model during the training phase. Both, however, aim to do the same thing: trick the model into producing an incorrect result, be it a label, a classification, an action, etc. This is how a picture of a Stop sign with a couple of stickers on it can be read as a 40 km/h sign by a vehicle, or a dog can be classified as guacamole.


·      Attacks designed to steal the data from your training dataset. Model inversion and membership inference attacks fit into this category. The simplest way to explain these is to think of the two attacks as data exfiltration or data theft. As with the previous category, the two attacks are subtly different: in model inversion, the attacker tries to reconstruct the private training data, whereas in a membership inference attack, the attacker aims to determine whether a given record was part of the model’s training dataset. Both attack types create a privacy nightmare for us.

·      Attacks designed to steal your model. Imagine you have invested heavily in collecting, storing, and analysing data. You have then gone on to spend even more money on training your model, and then comes Mr/Mrs Hacker and they take your model (read: your Intellectual Property). Model stealing is just that: an attacker queries the target model with sample data and uses the model’s responses to train a replica for themselves. Not only have they copied your Intellectual Property at next to no cost, but they can also use this forged model to design perturbation and membership inference attacks.

·      Attacks designed to repurpose your model. This is rather clever. In a Neural Net Reprogramming attack, the attacker uses specially crafted queries to reprogram your model to do something it was not designed to do. Imagine your cat/dog image classifier being used for facial recognition.
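To make the perturbation idea concrete, here is a minimal sketch in plain Python. The three-feature linear "model", its weights, and the epsilon value are all hypothetical stand-ins for a real network; the gradient-sign step mirrors the well-known FGSM technique, where a tiny, targeted nudge to the input flips the prediction.

```python
# Hypothetical 3-feature linear classifier standing in for an image model.
W = [1.0, -2.0, 0.5]
B = 0.1

def score(x):
    """Decision score of the linear model: w . x + b."""
    return sum(wi * xi for wi, xi in zip(W, x)) + B

def predict(x):
    """Class 1 if the decision score is positive, else class 0."""
    return 1 if score(x) > 0 else 0

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

# A clean input the model classifies as class 1 (score = 1.6).
x = [2.0, 0.5, 1.0]

# FGSM-style perturbation: nudge each feature against the gradient of the
# score with respect to the input (for a linear model, that gradient is W).
epsilon = 0.6
x_adv = [xi - epsilon * sign(wi) for xi, wi in zip(x, W)]

print(predict(x), predict(x_adv))  # the small perturbation flips the label
```

The point of the sketch is that the attacker needs only gradient information (or estimates of it) to find a small change that crosses the decision boundary; this is the "stickers on a Stop sign" effect in miniature.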

The above are AI-specific attacks. There are, however, traditional cyber challenges we need to worry about too (a fifth category?). Backdoor machine learning, exploiting software dependencies, and attacking the ML supply chain are a few examples where the old and the new worlds share a similar story.
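As a toy illustration of the membership inference attack mentioned above, consider a hypothetical over-fitted model that is noticeably more confident near its memorised training points. The attacker needs only black-box confidence scores and a threshold; the data points, confidence function, and threshold below are all invented for the sketch.

```python
import math

# Hypothetical over-fitted model: it has effectively memorised its training
# points and is far more confident near them than anywhere else.
train_set = [(0.1, 0.2), (0.5, 0.9), (0.8, 0.3)]

def confidence(model_points, query):
    """Model confidence decays with distance to the nearest memorised point."""
    d = min(math.dist(p, query) for p in model_points)
    return math.exp(-10 * d)

def infer_membership(query, threshold=0.9):
    """Attacker's guess: high confidence => the record was in the training set."""
    return confidence(train_set, query) > threshold

print(infer_membership((0.5, 0.9)))    # a genuine training record
print(infer_membership((0.05, 0.95)))  # an unseen record
```

The gap in confidence between seen and unseen records is exactly the leakage the attacker exploits, which is also why regularisation and differential privacy (discussed below) blunt this attack.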

How do we secure our AI?

The bad news is that, despite a couple of hundred papers written recently on the topic of securing AI, we do not yet have a complete solution. Our existing security frameworks and products do not adequately address attacks on AI systems. You can’t exactly deploy a Zero Trust-based strategy for your AI, nor can you go to a vendor and ask for an anti-perturbation product.

The CIA triad principles do not map nicely onto AI systems either. Microsoft has attempted this mapping, and it is perhaps the best one possible. I believe, however, there is a need for a whole new principle and the introduction of a Cyber tetrad: Confidentiality, Integrity, Availability, and Explainability. I can already hear some of you saying let’s have a pentad and break Explainability up into Resilience and Discretion. Explainability, as a principle, I believe covers both Discretion and Resilience, but we are getting way ahead of ourselves. Let’s go back to the original topic at hand: how do we secure our AI?


One would hope that, at an absolute minimum, you have grounded your AI in sound, responsible principles. The 23 Asilomar AI Principles are an excellent place to start. These high-level guiding principles cover a broad set of topics, many of which may be beyond the scope of most of our AI systems. If that is the case, then I would strongly suggest taking a close look at IBM’s, Microsoft’s, or Google’s ethical AI principles, all of which cover privacy and security to an extent. There are no NIST or ISO publications for AI cybersecurity (yet, but NIST will be producing one), so I suggest you next look at Gartner’s Top 5 Priorities for Managing AI Risk and their MOST framework.

With the groundwork taken care of, we need to look at how our training dataset, our ground-truth data, can be protected from attacks. This is perhaps the only AI-specific area where we have reasonably good defence technologies in place. We can restrict access to our dataset using existing methods, coupled with strong authentication and Separation of Duty (SoD). It is also possible to use differential privacy or privacy-preserving algorithms for our models.
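As a sketch of what differential privacy looks like in practice, here is the classic Laplace mechanism applied to a simple count query: the released answer is the true count plus noise calibrated to the query's sensitivity and a privacy budget epsilon. The dataset, epsilon, and sensitivity values below are illustrative only.

```python
import math
import random

def dp_count(records, predicate, epsilon=0.5, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon
    (the Laplace mechanism for epsilon-differential privacy)."""
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace(0, sensitivity/epsilon) via the inverse CDF of a uniform.
    u = random.random() - 0.5
    scale = sensitivity / epsilon
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(42)  # fixed seed so the sketch is repeatable
ages = [23, 35, 41, 29, 62, 55, 38]
noisy = dp_count(ages, lambda a: a > 40)
print(round(noisy, 2))  # true count is 3; the released value is noisy
```

Because each individual record can change the count by at most one (sensitivity = 1), the added noise provably limits what any single query reveals about any single person, which is what blunts membership inference.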

Taking care of our model and making sure it is not poisoned or subjected to perturbation-style attacks is a much harder task, though. We do not have a good defence mechanism against this style of attack. Yes, we can use adversarial examples to train our model to recognise perturbation attacks, but this is neither simple nor cheap. We could also use Generative Adversarial Networks to do this, but again, the cost would be prohibitive. There is another possible method to defend against this, and perhaps combining the two would give us a degree of protection, but this is hard to do.
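A minimal sketch of the adversarial-training idea, assuming a linear model whose loss gradient the defender can compute: each training sample is paired with a perturbed copy that keeps its true label, and the model is then retrained on the union. The weights, gradient oracle, and samples below are all hypothetical; real adversarial training perturbs inputs against the live network at every step, which is where the cost mentioned above comes from.

```python
def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def fgsm(x, grad, eps):
    """FGSM-style perturbation of a feature vector given the loss gradient."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

def augment(dataset, grad_fn, eps=0.1):
    """Adversarial training, step one: pair every sample with a perturbed
    copy that keeps its true label, then retrain on the union."""
    adversarial = [(fgsm(x, grad_fn(x, y), eps), y) for x, y in dataset]
    return dataset + adversarial

# Hypothetical loss-gradient oracle for a linear model with weights [1.0, -2.0]:
# the loss gradient points against the weights for class 1 and with them for 0.
W = [1.0, -2.0]
grad = lambda x, y: [(-wi if y == 1 else wi) for wi in W]

train = [([2.0, 0.5], 1), ([0.2, 1.5], 0)]
augmented = augment(train, grad)
print(len(augmented))  # original samples plus one adversarial copy each
```

Training on the augmented set teaches the model to hold its decision in the worst-case direction around each sample, at the price of roughly doubling the training data per round.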

From here on, we are on our own. There are few publications and even fewer technologies on how to defend our AIs. Microsoft has taken the lead and published some great ideas. They have also released Counterfit, an AI Security Risk Assessment tool.

What are our options now?

My suggestion is to use some or all of the mitigation techniques suggested by Microsoft, use Counterfit, and then contract an AI Red Team to attack your model. Of course, the other idea is to wait until NIST publishes something, but the “wait and see” strategy has never worked for anybody 🙂