
ChatGPT, DAN, and Turning Up the Heat with Adversarial Machine Learning


ChatGPT was released in November 2022, which means that, as of this writing, it has been out for roughly three months: long enough for the good stuff to start coming out. What do I mean? Once the initial “wow” factor passes, the next phase for many people is exploration. This is my favourite part. In the exploration phase, you start saying things like, “I wonder if it can do this…”, or “I bet it can’t do that”. This is the natural “pushing of boundaries” that comes with any new capability.

It should come as no surprise that many ChatGPT prompts asked it to generate text that some may find “questionable”. The natural next step for an organization is to limit the technology in some way so that it does not generate such text. In ChatGPT’s case, OpenAI added (trained in) a simple ‘ethics’ check, so that the system could reply with some variant of “I’m sorry, it would be unethical to do that”. Well, now we have an adversarial game of our own! On one side are those pushing the new technology’s boundaries (ethically or not), and on the other are the system’s designers, who want to keep things above board (and out of the “bad” news cycles, like Microsoft’s Tay Twitter bot).

The result is the arms race that security practitioners know all too well, and one of particular interest to me, since my PhD research is on adversarial Machine Learning (ML) inputs and on attacking systems with other “AI” systems.
