Quick Points:
- Paper Title: “Semantic Stealth: Crafting Covert Adversarial Patches for Sentiment Classifiers Using Large Language Models”
- Paper Link (via Google Scholar): https://dl.acm.org/doi/pdf/10.1145/3689932.3694758
New research (Nov 2024) shows that LLMs are just as susceptible to adversarial patterns as facial recognition systems are. What does this mean? Let’s try to understand.
Adversarial Patterns
An “Adversarial Pattern” is a pattern specifically designed to break a targeted Deep Neural Network based system… so, an ‘AI’. Some researchers have created “Adversarial Patches” (https://arxiv.org/pdf/1712.09665): special patterns that cause image classification systems to fail in a predictable way. They demonstrated this by printing the pattern out as a sticker and placing it in the camera’s field of view.
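To make that concrete, here is a minimal, illustrative sketch (in PyTorch) of how such a patch gets trained: freeze the victim classifier, then run gradient descent on the patch pixels so that any image containing the patch gets pushed toward an attacker-chosen class. Everything specific below is an assumption for illustration (ResNet-18 as the victim, random tensors standing in for real photos, a fixed patch location, class 859 as the target), not the exact setup from the paper.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen victim model: the attacker only optimizes the patch pixels.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

patch = torch.rand(3, 50, 50, requires_grad=True)   # the "sticker"
optimizer = torch.optim.Adam([patch], lr=0.01)
target = torch.tensor([859])   # illustrative target class ("toaster" in ImageNet)

def apply_patch(images, patch):
    # Paste the patch into a fixed corner of every image. Real attacks
    # randomize location, scale, and rotation so the printed sticker
    # survives real-world conditions.
    out = images.clone()
    out[:, :, :50, :50] = patch.clamp(0, 1)
    return out

for step in range(100):
    images = torch.rand(8, 3, 224, 224)   # stand-in for real training photos
    logits = model(apply_patch(images, patch))
    # Push every patched image toward the attacker's target class.
    loss = F.cross_entropy(logits, target.expand(images.size(0)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the patch is also trained over random locations, scales, and rotations, which is what lets the printed sticker keep working once it is physically sitting in front of a camera.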
This has been an active area of research (one I’ve worked in myself) for some time, with some very interesting results. For those interested, this whitepaper from Tencent (https://keenlab.tencent.com/en/whitepapers/Experimental_Security_Research_of_Tesla_Autopilot.pdf) shows some attacks on Tesla’s Autopilot, including some interesting perturbation-style (‘semi-random noise’ pattern) attacks.
Why This is Interesting
We knew that image-based systems were essentially Swiss cheese when it came to adversarial patterns (just head to Google Scholar and search ‘adversarial patterns’). What is interesting here is that this ports a traditionally vision-based attack to a text-based LLM. SUPER interesting.
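To picture what the text version of this looks like, here is a rough, hypothetical sketch of the threat model, not the paper’s actual method or patches: take an off-the-shelf sentiment classifier and score candidate “patch” phrases, which in the paper’s setting an LLM would generate, looking for one that flips the predicted label while reading like harmless filler. The model name and the candidate phrases below are made-up stand-ins.

```python
from transformers import pipeline

# Victim model: an off-the-shelf sentiment classifier (illustrative choice).
clf = pipeline("sentiment-analysis",
               model="distilbert-base-uncased-finetuned-sst-2-english")

review = "The acting was wooden and the plot made no sense."

# Stand-ins for LLM-generated candidate patches. These are invented here;
# the point is only to show how an attacker would score them.
candidates = [
    "as any longtime festival-goer will tell you",
    "which, in the grand scheme of cinema history, is noteworthy",
    "and the projection quality at my local theatre was flawless",
]

baseline = clf(review)[0]
print("baseline:", baseline["label"], round(baseline["score"], 2))

for patch in candidates:
    result = clf(review + " " + patch + ".")[0]
    flipped = result["label"] != baseline["label"]
    print(f"{'FLIP' if flipped else '    '} {result['label']:>8} "
          f"{result['score']:.2f}  <- {patch!r}")
```

The hard part, and the reason the title says “covert,” is that the phrase has to flip the model without actually changing the review’s sentiment for a human reader; that is presumably where the LLM’s ability to generate fluent, innocuous-looking text comes in.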
Remember, kids: under the hood, there’s a Deep Neural Network. Attack not the application, but its foundation.