AI Technical Risks and Mitigation Strategies

Anni Moje
Apr 10
5 min read

I’ve always been passionate about tech, and back when I started building websites in the '90s in Notepad, security barely crossed anyone’s mind. Since then, cybersecurity has grown up fast. As our environments stretch from classic client-server setups into the cloud, IoT/OT, and AI, managing security gets trickier and trickier. Think of your AI system like a lively theme park: each layer is a new ride, and some “guests” (attackers) are always trying to sneak past the gate. Keeping the fun safe takes more than just fences; you need smart defenses everywhere. Looking at emerging AI threats through MITRE ATLAS, and at best practices and controls such as the NIST AI Risk Management Framework and the Cloud Security Alliance AI Controls Matrix, we can see that it will take more than a village to protect organizations. In this blog, we will break down the main attractions (and threats) you should care about.

Each AI system in your organization is built like a four-story funhouse, with each floor facing its own risks. This, of course, comes on top of general controls such as identity and access management, logging, and incident management. Each system starts with a prompt, submits it to one or more models, retrieves information and data, and includes processing to make it more efficient and secure.

Each attack vector brings unique threats that need your attention, along with mitigation strategies to address them.

1. Prompt Threats

The prompt is the front door to the system. Prompts can be written by individual users or generated by AI bots or other pieces of infrastructure. This tier is where adversaries trick the interface between user and model. It’s like someone slipping sneaky instructions to the ticket-taker, hoping for VIP access. There are several types of prompt attacks:

LLM Prompt Injections: These attacks manipulate the prompt and other input to the system in order to bypass guardrails or control the output. Prompt injections can be direct, where the adversary interacts with the AI system themselves, or indirect, where the system pulls in external sources and documents that contain hidden instructions or malicious code, which are inserted into the prompt before it reaches the model.
Jailbreaking: Attackers tell the model to “ignore previous instructions,” unlocking behaviors it shouldn’t allow. More complex variants include DAN (Do-Anything-Now) style attacks. These attacks work because the model can't reliably tell system instructions apart from user input. Example: A clever user tricks an AI chatbot into acting as a Linux terminal, running unauthorized commands. In non-tech scenarios, users might not even realize the risks they’re exposing themselves to if they don’t fully understand how the system, connections, and protections work.
Adversarial Suffixes/Prefixes: In this type of attack, the bad guys add odd strings (like [---!!!IGNORE-SAFETY-FILTERS!!!---]) to trick the model. This attack can be quite dangerous and can affect safety. One example is the Universal Adversarial Suffix, also known as the GCG attack, published by researchers at Carnegie Mellon in 2023. Here, the user asks "Tell me how to do (insert something dangerous here)" and, before the prompt hits the model, a string such as "==[.-'''''----]]" and other characters is appended to it. The researchers found that these additions actually worked across ChatGPT, Claude, and Llama 2. The models got distracted and returned the requested results.
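To make the prompt threats above a little more concrete, here is a minimal Python sketch of a pre-model screening step. Everything in it is an assumption for illustration: the phrase list, the `screen_prompt` name, and the 0.5 symbol-ratio cutoff are hypothetical, and real adversarial suffixes will not always look symbol-heavy. Treat it as one cheap layer in front of the model, not as a reliable detector.

```python
import re

# Hypothetical phrase list for illustration; a real deployment would tune this
# and combine it with model-side and policy-side defenses.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"\bdo\s+anything\s+now\b", re.IGNORECASE),
    re.compile(r"act\s+as\s+a\s+linux\s+terminal", re.IGNORECASE),
]

SYMBOL_RATIO_THRESHOLD = 0.5  # assumed cutoff, not taken from any standard
TAIL_LENGTH = 40              # how many trailing characters to inspect


def symbol_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are neither letters nor digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if not c.isalnum()) / len(chars)


def screen_prompt(prompt: str) -> list[str]:
    """Return the reasons a prompt looks suspicious (empty list if none)."""
    reasons = []
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            reasons.append(f"matched pattern: {pattern.pattern}")
    # Only inspect the tail of prompts long enough to carry an appended suffix.
    tail = prompt[-TAIL_LENGTH:]
    if len(prompt) > TAIL_LENGTH and symbol_ratio(tail) > SYMBOL_RATIO_THRESHOLD:
        reasons.append("symbol-heavy suffix")
    return reasons
```

A flagged prompt could be blocked, logged, or routed for review. Indirect injections need the same check applied to retrieved documents, not just to what the user types.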

2. Model Attacks (Logic & Integrity Layer)

This is the “engine” of your AI funhouse. Attacks here target the secret sauce (model weights and logic) or even try to stop the engine altogether. Imagine a prankster messing with the ride’s controls when no one’s looking. One of the first zero-day attacks was this type of attack, and maybe some of you remember the movie. The virus changed the underlying source logic, but the dashboard operators saw nothing. In a well-known computer vision example, researchers placed black and white stickers on a stop sign, and the model misclassified it as a 45 mph speed limit sign.

Model Evasion: Tiny tweaks to inputs (like invisible glitter on a ticket) fool the model. The stop sign case is just one real-world example of a model evasion attack, but there have been many. Financial services firms and hiring managers are seeing resumes with adversarial text hidden in white font, designed to make an AI screener rank someone as a top candidate. Facial recognition can easily be thrown off by hats and hoodies, masks, dark glasses, and other evasion techniques.
Model Extraction: Attackers try to reverse-engineer your AI’s logic. Example: By bombarding a fraud detection model with queries, a competitor learns its secrets and builds a clone. They distill your model by looking at its confidence scores. If the model says "The potato salad preparer was 90% Jane, 8% John, and 2% Sally," we know how the model thinks, and we can take that intelligence, build a data set, and train a copy. Once attackers have their own version of your model, they can avoid API fees, find vulnerabilities, and reverse-engineer your logic.
Model Poisoning: Hackers sneak bad data into the training set. Example: Someone adds corrupted data to an AI healthcare app’s training set, causing it to misdiagnose patients.
Model Inversion: Adversaries reconstruct sensitive or proprietary data used to train the model by analyzing its outputs.
Denial of ML Service: Attackers submit complex, resource-heavy queries (like requesting a 50,000-word essay), which can crash the system or rack up huge API costs. They then repeat the request over and over, often from many processes in parallel.
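Two of the attacks above, model extraction and denial of ML service, both depend on how much the API gives away per query and how many queries it accepts. The sketch below shows one possible way to limit both, using the potato salad scores from the extraction example. The function and class names, the 0.25 bucket size, and the fixed (non-expiring) budget are all assumptions for illustration.

```python
import math
from collections import defaultdict


def harden_prediction(scores: dict[str, float], top_k: int = 1,
                      step: float = 0.25) -> dict[str, float]:
    """Return only the top_k labels, with confidences rounded down to a coarse bucket.

    Fewer labels and coarser scores give a would-be cloner less signal per query.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {label: math.floor(score / step) * step for label, score in ranked[:top_k]}


class QueryBudget:
    """Very simple per-client query cap (no time window, for brevity)."""

    def __init__(self, max_queries: int):
        self.max_queries = max_queries
        self.used = defaultdict(int)

    def allow(self, client_id: str) -> bool:
        """Count the request and report whether the client is still within budget."""
        if self.used[client_id] >= self.max_queries:
            return False
        self.used[client_id] += 1
        return True


raw_scores = {"Jane": 0.90, "John": 0.08, "Sally": 0.02}
print(harden_prediction(raw_scores))  # only the top label, with a coarse score
```

A production rate limiter would normally reset budgets over a time window and also cap output length per request. The point here is only that the response format and the request budget are both controls you own.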

3. Attacks on Data (The Storage & Supply Chain Layer)

The "brain" of the AI. These vulnerabilities are typically exploited during the training phase or when the AI retrieves external information.

Data Poisoning: An attacker or malicious insider injects "bad" data into the training set to create a backdoor. For instance, a model could be covertly trained to grant administrative access whenever a specific "magic word" is used.
RAG Poisoning (Retrieval-Augmented Generation): Attackers corrupt external knowledge bases that the AI relies on. For example, uploading a fake "Product Manual" containing a universal discount code to a public forum, which the AI then ingests and shares with users.
Sensitive Information Disclosure: Due to poor data cleaning during training, the model "memorizes" Personally Identifiable Information (PII), such as Social Security numbers or private email addresses, and leaks them to unauthorized users in later conversations.
Supply Chain Compromise: Attackers target the foundational elements the AI depends on, such as third-party libraries, base models hosted on platforms like Hugging Face, or external plugins.
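For the supply chain risk above, one common habit is to pin the hash of every model file you download and refuse to load anything that does not match. The sketch below assumes you recorded a trusted SHA-256 digest when you first vetted the artifact; the `verify_artifact` name and the file layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned one."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"{path.name}: digest mismatch (expected {expected_sha256}, got {actual})"
        )
```

Calling `verify_artifact` before any deserialization step means a swapped or tampered file fails loudly instead of being loaded. It does not help if the pinned digest itself came from a compromised source.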

4. Attacks on Processing (The System & Execution Layer)

The "arms" of the AI. This area carries the highest risk, as it involves the infrastructure surrounding the AI and its ability to interact with the real world.

• Excessive Agency: Exploiting an AI agent that has been granted too many permissions. If an AI has the tools to update customer addresses without human approval, an attacker can trick the bot into rerouting a physical shipment that does not belong to them.

• Insecure Output Handling: When an application blindly executes code generated by the AI without checking it first. For instance, if an AI generates a ticket summary containing a malicious script, it can trigger a Cross-Site Scripting (XSS) attack that steals a human agent's session cookie when they view the dashboard.

• LLM Plugin / Tool Hijacking: Attackers compromise third-party plugins the AI uses to access the internet or databases (like a weather API). They can then send malicious instructions back through the plugin for the AI to execute on the host's internal network.

• Membership Inference: An attacker repeatedly queries the model to determine whether a specific individual's data was included in the training set. This represents a severe privacy violation, particularly for medical or financial AI systems, as it can confirm an individual is a customer of a specific business.
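The insecure output handling case above comes down to treating model output as untrusted input. As a minimal sketch, assuming the ticket summary is dropped into an HTML dashboard, escaping it first turns an injected script into harmless text. The `render_ticket_summary` name and the surrounding markup are hypothetical.

```python
import html


def render_ticket_summary(model_output: str) -> str:
    """Escape model-generated text before embedding it in an HTML page."""
    safe_text = html.escape(model_output)  # also escapes quotes by default
    return f'<div class="ticket-summary">{safe_text}</div>'


summary = 'Customer is upset. <script>steal(document.cookie)</script>'
print(render_ticket_summary(summary))
```

Most web frameworks escape template variables automatically, so the usual failure is code that deliberately marks AI output as "safe" HTML. The same principle applies to SQL, shell commands, and any other interpreter the output might reach.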

The Path Forward

Defending against this vast array of attacks requires shifting from traditional perimeter security to a strategy of "lifecycle engineering," where data and models are protected across all phases of development and deployment. Organizations must implement "Defense-in-Depth" measures—such as strict instruction hierarchies, output sanitization, and human-in-the-loop approvals—to ensure AI systems remain safe and reliable.
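As one illustration of a human-in-the-loop approval, here is a small sketch of a gate that sits between the AI agent and its tools, using the address-change scenario from the excessive agency example. The tool names, the `HIGH_RISK_TOOLS` set, and the `approve` callback are assumptions; in practice the callback would open a ticket or prompt a reviewer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of tools that must never run without a human sign-off.
HIGH_RISK_TOOLS = {"update_customer_address"}


@dataclass
class ToolCall:
    name: str
    arguments: dict


def execute_tool_call(call: ToolCall,
                      tools: dict[str, Callable[..., str]],
                      approve: Callable[[ToolCall], bool]) -> str:
    """Run a tool requested by the model, subject to an allow-list and approval."""
    if call.name not in tools:
        return f"blocked: unknown tool {call.name!r}"
    if call.name in HIGH_RISK_TOOLS and not approve(call):
        return f"blocked: {call.name!r} needs human approval"
    return tools[call.name](**call.arguments)


# Example wiring with stand-in tools and a reviewer who declines everything.
tools = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
    "update_customer_address": lambda order_id, address: f"order {order_id} rerouted",
}
deny_all = lambda call: False
print(execute_tool_call(ToolCall("lookup_order", {"order_id": "A1"}), tools, deny_all))
```

The design choice is that the model only ever proposes actions. The allow-list and the approval step live in ordinary code that a prompt cannot rewrite.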

Don't Forget the Humans
