General Science, Programming, Security, Privacy, Technical, Uncategorized

Caleb’s Guide to Getting Started in Attacking AI

You might have heard about them. Maybe you saw a headline or read an article talking about them. Or maybe you overheard a conversation. Regardless of where or how, more people are talking about attacks on AI or Machine Learning. Which I find fantastic!

For me, It’s been almost 10 years. My master’s degree was situated around using machine learning to attack other systems, and now my PhD is focused on attacking the very algorithms that facilitated my previous research.

But what if someone wanted to learn about these attacks? Or even learn some of them? Unfortunately the information around attacking AI is scattered between academia and industry, including papers, random websites, blogs (perhaps I am a contributor to this problem), social media posts, and the like. So I did some research on some of the more pertinent areas to provide the list of resources below.

This is not an exhaustive list by any means (and I plan to update it regularly), but it is a great start for anyone interested.

The list below is broken up into a few categories:

  • Intro and General“: This is for new-comers who are interested in the area but don’t know where to start. It includes some overviews of attacking AI, some more in-depth resources that cover all the bases (but might be more on the ‘verbose’ side), and some articles/topics that don’t fall into the other categories.
  • Practical“: This category includes practical attacks or instructions on AI or Machine Learning (AI/ML). Some of the other categories might have some practical content too, but this is where most of the more applied attacks will go.
  • Assessment“: This category is more about methodology. There is an approach to attacking a system, and the methodology is important. This category will include topics that I find useful in assessing a system. It will include things that may help find “bad code smells”, or red flags, but could also include general advice.
  • Academic“: This category will be less practical but represents the bleeding-edge of research in the area. I will update this area from time to time with sources (not necessarily papers, but sites where you can find your own research) to keep things fresh.

So let’s get started.

Intro and General

A good intro to attacking AI

NIST “Quick” Intro to Attacks and Taxonomy

An Intro to Adversarial Image Attacks

Intro to Prompt Injection Attacks

Practical

PortSwigger Academy: LLM Attacks

  • https://portswigger.net/web-security/llm-attacks
  • Attacking Large Language Models (LLMs)
  • Gives a great introduction to what LLMs are then dives right into showing how they can be attacked
  • Provides labs to practice what you just learned in a safe environment

OWASP Machine Learning Security Top Ten

  • https://owasp.org/www-project-machine-learning-security-top-10/
  • OWASP has a few top tens, and all of them are good. This is their Machine Learning (ML) security top ten.
  • Each item in their top ten includes a description, advice on preventing the given attack, various risk factors, and then example attack scenarios.
  • Although these are not “learn to attack models” in purpose, we can derive useful information about attacks and learn to conduct them ourselves.

Learn Prompt Injection Attacks (With Lab)

Gandalf LLM Pentesting Lab

Assessment

OWASP AI Security and Privacy Guide

  • https://owasp.org/www-project-ai-security-and-privacy-guide/
  • A general rule of thumb when assessing privacy and security adherence is simply to check the standard:
    • If it says “MUST”, check that the system actually does that.
    • If it says “MUST NOT”, check that the system actually does NOT do that.
    • You will be surprised how often a system only partially implements the standard.
  • As an offensive researcher, we are looking for red flags. If the standard says “reduce the access to sensitive information”, and you see that there might be some sensitive information in the training set, that indicates that the standard/guideline was not followed well. Perhaps there are other more glaring issues there too?
  • This is one such guide but there are many others. Read the guidelines. Understand what is expected by them, and look for where a system falls short of those expectations.
  • Caveat: Some items in a standard/guideline may not apply to an attacker. It is up to you to figure out what is applicable and what is not.

Academic

If you are looking for publications, you might want to start here:

Specific Publications of Note:

Standard