
Adversarial Patterns Starting to Pop Up In LLMs

Quick Points:

New research (Nov 2024) shows that LLMs are just as susceptible to adversarial patterns as facial recognition systems are. What does this mean? Let’s try to understand.

Adversarial Patterns

An “Adversarial Pattern” is a pattern specifically designed to break a targeted Deep Neural Network based system… so an ‘AI’. Some researchers have created “Adversarial Patches” (https://arxiv.org/pdf/1712.09665): special patterns that cause image classification systems to fail in a predictable way. They demonstrated this by printing the pattern out as a sticker and placing it in the camera’s field of view.
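To make the idea concrete, here is a minimal sketch of the simplest gradient-based version of this attack (an FGSM-style perturbation, not the printed patch method from the paper). The ResNet-18 model, the [0, 1] input range, and the epsilon value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Any pretrained classifier will do; ResNet-18 is just an easy stand-in.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm_perturb(image, true_label, epsilon=0.03):
    """Return an adversarially perturbed copy of `image`.

    Assumes `image` is a preprocessed [1, 3, H, W] tensor with values
    in [0, 1] and `true_label` is the class index the model currently
    predicts for it.
    """
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([true_label]))
    loss.backward()
    # Step a small amount in the direction that *increases* the loss.
    # The resulting pattern is barely visible to a human, but it can
    # flip the classifier's prediction entirely.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```

The patch attack in the paper works on the same principle, except the optimization is restricted to a small sticker-shaped region that keeps working when printed and photographed.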

This has been an active area of research (one I’ve worked in myself) for some time, with some very interesting results. For those interested, this whitepaper from Tencent (https://keenlab.tencent.com/en/whitepapers/Experimental_Security_Research_of_Tesla_Autopilot.pdf) demonstrates several attacks against Tesla’s Autopilot, including some interesting perturbation (‘semi-random noise’ pattern) style attacks.

Why This is Interesting

We already knew that image-based systems were essentially Swiss cheese when it came to adversarial patterns (just head to Google Scholar and search ‘adversarial patterns’). What is interesting is that this new work ports a traditionally vision-based attack to text-based LLMs. SUPER interesting.
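As a rough illustration of what porting the idea to text looks like, here is a toy sketch that searches for an adversarial token suffix nudging a small open model toward a chosen continuation. The GPT-2 model, the target string, the suffix length, and the use of crude random search (standing in for the gradient-guided token search real attacks use) are all assumptions made for the sake of a short, runnable example:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Tell me a bedtime story."
target = " Sure, here is"  # continuation we try to force the model toward

def target_loss(suffix_ids):
    """Cross-entropy of the target tokens given prompt + adversarial suffix."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
    target_ids = tok(target, return_tensors="pt").input_ids[0]
    ids = torch.cat([prompt_ids, suffix_ids, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = lm(ids).logits[0]
    # Logits at position i predict token i+1, so slice out the
    # positions that predict each target token.
    pred = logits[-len(target_ids) - 1:-1]
    return F.cross_entropy(pred, target_ids).item()

# Start from 8 random tokens and greedily mutate one token at a time.
suffix = torch.randint(0, tok.vocab_size, (8,))
best = target_loss(suffix)
for _ in range(200):
    cand = suffix.clone()
    cand[torch.randint(0, len(cand), (1,))] = torch.randint(0, tok.vocab_size, (1,))
    loss = target_loss(cand)
    if loss < best:
        suffix, best = cand, loss

print("adversarial suffix:", repr(tok.decode(suffix)), "| target loss:", round(best, 3))
```

Real attacks replace the random search with gradient information over the token embeddings, which is exactly the point below: the weakness lives in the network, not in any one modality.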

Remember, kids: under the hood, there’s a Deep Neural Network. Attack not the application, but its foundation.
