In the rapidly evolving world of AI, achieving software security can feel like a moving target. With the rise of AI applications and their importance for business models, new security challenges are emerging, making it crucial for developers to understand not just how to consider security during development, but also how to assess the actual threats in the specific use case. The established methodology of threat modeling can help to secure AI applications in a preventive, precisely fitting way.
What is “secure-enough AI software“?
Every developer aspires to build good software. However, defining ‘good’ is complex, although it surely also covers software to be ‘secure’. But what does secure software truly mean? In the wake of the recent rapid development in the AI field security issues arise in this field as well: the Toyota leaks of AI training data affecting millions of users, exposure of entire AI environments, or prompt injection attacks on LLMs like ChatGPT, to name a few. It is clear that ’secure‘ means more than just having a robust defense system.
However, it’s important to understand that security is not an absolute state. It is impossible to build a system that is 100 % secure. Security measures come with a price in time or money and can often interfere with other project goals such as usability or the development of certain features. Therefore, the question we should be asking is not “Is my software secure?“ but rather “Is my software secure enough?“: Secure enough for the business case, the environment it runs in, the adversaries we expect?
To answer this question, we need to know what we are defending against and who we are defending against. This knowledge does not come intuitively; it requires a systematic approach to identifying potential threats and developing strategies to counter them. At this point, threat modeling comes into play. This methodology has been around for system and software engineering for decades, and it is also applicable to AI applications.
The attack surface of AI
AI finds many use cases in our everyday lives and work, and it is transforming the way we live and work. It is being used in healthcare to diagnose diseases and develop new treatments. It is powering autonomous driving systems that promise to revolutionize our transportation networks. It is helping businesses make data-driven decisions and providing consumers with personalized recommendations. And, for the last few years, especially: GenAI and LLMs have changed the way data and content can be accessed, processed, and manipulated. The potential use cases for AI are vast and varied.
But with these opportunities come risks. A hacked autonomous car could cause a traffic accident. A manipulated healthcare AI could misdiagnose a patient. A compromised chatbot system could leak sensitive data. Security is not just a nice-to-have in these scenarios – it’s an absolute necessity.
Threat Modeling for AI
Threat modeling provides a structured method for identifying potential threats, categorizing them, and devising countermeasures. One of the most popular methods is the “Four Question Framework“ of Adam Shostack, which asks four questions.
These can also be used to perform threat modeling for AI. With this structured approach, development teams can examine the security of its system in a preventive and tailored way.
Consider a Retrieval-Augmented Generation (RAG) application built with Haystack as an example. Here’s how we might apply the Four Question Framework for threat modeling an AI system:
Step 1: What are we building?
We are building a RAG system that, containing company data as a knowledge source, is used for a question-answering system for a public audience. This step includes further considerations about the users and their roles, the assets that need protection, and the environment of the system.
When working on this step of threat modeling for AI, it usually helps to use vizualisation: Either reuse existing architecture diagrams or quickly sketch them in the step. Not only is a visual representation often more compact, but it also helps to communicate about the threats for the different parts of the application later.
Step 2: What can go wrong?
Secondly, we put ourselves in the perspective of an attacker: How could the assets in our AI application be harmed? What threats exist for the individual components of our system? And how could this affect the entire system?
Several threats could potentially arise in the RAG scenario:
- Data Poisoning: Malicious actors could manipulate the corpus of text from which the RAG application retrieves information, leading to incorrect or harmful responses.
- Adversarial Attacks: Attackers could craft questions designed to trick the model into generating inappropriate or harmful responses.
- Data Leaks: If the corpus of text includes sensitive or private information, the RAG application might inadvertently expose this information in its responses. Also, the training data itself needs to be protected and should not be accessible.
Other AI systems may face different threats. When developing and operating own models, threats like Model theft or Model inversion get more relevant. Depending on the origin of the used data, different poisoning or backdoor attacks have to be considered.
This step might be the most difficult, and with the constant expansion of AI technologies, the threat landscape evolves as well. Luckily, there are comprehensive collections of threat vectors to be considered. One great compilation is the OWASP AI exchange: The site collects threats against different types of AI systems and also states matching mitigations (see next step). This makes it a great help when diving deeper into threat modeling for AI applications.
In general, threats can be separated into development-time threats and runtime threats. AI components like an LLM mostly do not run stand-alone but are embedded into other applications and systems. Securing these systems and especially the interfaces between them and the AI components is crucial and requires knowledge of both worlds.
Step 3: What are we going to do about it?
The third step when performing threat modeling for AI is to treat the threats that were found in the previous step. In classical AI projects, the mitigations cover activities from data collection and processing to model development to the actual use of the model in the runtime environment. Some are AI-specific and need a deep understanding of the training process and the resulting mode, some are well-known from classical software engineering and can be adapted to AI applications.
To mitigate the threats to our RAG example, we could take several steps:
- Data Sanitization: Regularly review and clean the corpus of text to ensure it does not contain harmful or sensitive information. If this is not possible for the entire amount of data, the use of representative test cases of question-answer-pairs might help. In other, more deterministic AI applications, a “golden data set“ as a reference is helpful.
- Robust Design: Design the model to handle adversarial inputs gracefully, perhaps by incorporating techniques to recognize and reject or distort such inputs. When training or fine-tuning your models, they should also be trained on adversarial examples.
- Securing the Supply Chain: All used components that are not self-implemented need to be considered as part of the supply chain. This includes securing the sources of data used to train the AI models (data supply chain), ensuring the integrity of the AI algorithms and models themselves, and protecting the systems and platforms used to deploy and operate the AI models.
- Secure operations: The access to the RAG needs to be authenticated and the use of its features needs to be controlled by roles and permissions. This also covers controlling access to the different test and production environments.
For each threat of step 2, at least one matching mitigation needs to be present. Multiple mitigations are imaginable and, according to the principle of defense in depth, can increase security further. These mitigations may be technical, but you can also choose processual measures. In the end, not every risk can be covered completely, so accepting a risk is also a possible option – if you understand the risk well enough to argue why you can take it.
Step 4: Did we do a good enough job?
This last step serves as a possibility to retrospect the process and, more importantly, its outcome. Did we improve the security? Or, just as important: Did we get a better understanding of the threat situation and our security measures? Performing threat modeling for AI is always also about communication, discussion, and knowledge sharing.
To evaluate our specific mitigation strategies, we could also conduct regular audits and penetration testing. We could also monitor user feedback and the model’s responses to continuously assess and improve its performance and security.
Conclusion
These four steps are all that are needed for a first threat modeling session. Nevertheless, threat modeling is not a one-time activity. The system might be extended over time, but especially the threat landscape does evolve. And as new threats emerge, threat modeling should be done repeatedly and continuously.
Security is not a state, but a process. And while we cannot achieve 100 % security, threat modeling for AI provides a valuable tool for building AI applications that are ‘secure enough’. It also gives the development teams a certain peace of mind, as it shifts the security approach from reactive measures and single best-effort activities to a preventive, structured, and comprehensive methodology.