Security Unfiltered
Inside Offensive AI: From MCP Servers To Real Security Risks
Security gets sharper when we stop treating AI like magic and start treating it like an untrusted user. We sit down with Eric Galinkin to unpack the real-world ways red teams and defenders are using language models today, where they fall apart, and how to build guardrails that hold up under pressure. From MCP servers that look a lot like ordinary APIs to the messy truths of model hallucination, this conversation trades buzzwords for practical patterns you can apply right now.
Eric shares takeaways from Offensive AI Con: how models help triage code and surface likely bug classes, why decomposed workflows beat “find all vulns” prompts, and what happens when toy benchmarks meet stubborn, real binaries. We explore reinforcement learning environments as a scalable way to train security behaviors without leaking sensitive data, and we grapple with the uncomfortable reality that jailbreaks aren’t going away—so output validation, sandboxing, and principled boundaries must do the heavy lifting.
We also dig into Garak, the open-source system security scanner that targets LLM-integrated apps where it hurts: prompted cross-site scripting, template injection in Jinja, and OS command execution. By mapping findings to CWE, Garak turns vague model “misbehavior” into concrete fixes tied to known controls. Along the way, we compare GPT, Claude, and Grok, talk through verification habits to counter confident nonsense, and zoom out on careers: cultivate niche depth, stay broadly literate, and keep your skepticism calibrated. If you’ve ever wondered how to harness AI without handing it the keys to prod, this one’s for you.
Enjoyed the episode? Follow, share with a teammate, and leave a quick review so more builders and defenders can find the show.
Follow the Podcast on Social Media!
YouTube: https://www.youtube.com/@securityunfilteredpodcast
Instagram: https://www.instagram.com/secunfpodcast/
Twitter: https://twitter.com/SecUnfPodcast
SPEAKER_00:How's it going, Eric? It's always great to have you on the podcast. It's not every day that this happens when I talk to someone, you know, of your caliber, not to put down any of my other guests or anything, right? But there's a few people that I've met along the way where every single time I talk to them, I just feel like the dumbest person on earth, and I learn so much. So I'm very excited for our conversation.
SPEAKER_01:Yeah, no, likewise, it's always a pleasure to talk to you. And, you know, the imposter syndrome strikes all of us, right? I have been on far too many, I mean, gosh, Monday and Tuesday, I'm sitting in the audience watching some of the like the best red teamers in the world try and automate parts of their job. And I'm like, I didn't even I didn't even think about how to do that manually, right? So, you know, we all have our strengths, right? We just plug into different parts. It's complementary.
SPEAKER_00:Yeah, no, it's a really good point, right? Like, you know, you've spent years in the AI security world, whereas, you know, everyone else really hasn't. They're kind of just now getting into it, just now playing catch-up, right? I mean, like earlier today, I was talking about MCP server security, and I literally had to go look up, you know, some best practices. Whereas earlier in the week I was talking about container security best practices, and I didn't have to look up anything, you know, I was just on top of it because I've done it. So it's a huge difference when you put in the time and you're well versed with it, you know.
SPEAKER_01:I think that's exactly it. I mean, the MCP stuff drives me a little bit crazy because, and I think you probably found this once you started digging into it, it's just a container with a Swagger doc. Like, it's just an opinionated API protocol, right? And honestly, calling it a protocol, you know, makes the graybeard in me a little frustrated. But, you know, it's fine. It's fine.
SPEAKER_00:Yeah, yeah. You know, every time I have you on, we're talking about such an emerging technology, right? And there's a lot that we're gonna dive into today, but you were recently over at Offensive AI Con, right? That's what it's called? Okay. Yeah. So what was your approach going into the conference? In terms of, you know, did you kind of outline the talks that you wanted to go to and things like that, or people that you wanted to meet up with very specifically? What does that look like going into the conference? And then how was the conference?
SPEAKER_01:Yeah. So I think one cool thing about Offensive AI Con: as I have gotten older, I have found less joy in the DEF CONs and Black Hats. Maybe that's, you know, social, and maybe that's just an artifact of specialization. But it was a single-track conference over two days, so I didn't really have to pick and choose talks. It was kind of like there's only really one option. And it drew a lot of inspiration from LabsCon, which you may have heard of. And so it's invite only. It's not that exclusive, but you had to register your interest, essentially, right? You can't just show up day of and buy a ticket. So everybody there was super engaged. You know, I had a couple of people who I already knew and was eager to meet up with. There was at least one person there who I haven't seen since, oh, 2014. Oh wow. And I didn't even know that he was working at the company he was working for. But, you know, overall it was a really cool experience. I thoroughly enjoyed the conference. And it covered, right, offensive uses of AI. So a lot of focus on bug finding, vulnerability discovery, but also automated patching, service discovery. You know, a lot of the things that are really important. But coming from the world that I came from, right? When I started in security, I was working on Snort rules and ClamAV signatures and malware reverse engineering. So when I got on stage, because I was a speaker, I was like, I just want to clear up that I'm LARPing as a red teamer, like I'm not really one of you, but there's a lot of carryover. And I think that's something that I've found as I've gotten more into the AI security world, is that your security skills generalize a lot better than you expect. Because when you dig into something like MCP and you're like, how do we secure these MCP servers? It's like, oh, okay, they're containers that communicate over, you know, HTTP. Well, we kind of know what to do with that, right? We kind of know what to do with API security. There's an stdio transport where you run the server locally, and I guess that's more of a please don't run unauthorized services on your machine, or untrusted services, or both, actually. But a lot of these things are familiar, right? I mean, it's the same thing as when people download some tool from, like, SourceForge, and they're like, yeah, I just installed it. It's like, why? This has like 11 downloads in the last year. Why did you download this tool? This seems bad. But yeah, it's a different tune, but it's the same song, right? Yeah.
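Since Eric frames the HTTP-transport MCP case as basically an opinionated API, here is a minimal, hypothetical sketch of what "treat it like API security" can look like in practice. This is plain FastAPI, not the real MCP SDK or protocol; the endpoint path, token handling, and tool names are all invented for illustration.

```python
# Hypothetical sketch: an MCP-style "tool call" endpoint treated like any other API.
# Not the official MCP SDK; the route, token scheme, and tool names are made up.
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()
API_TOKEN = "replace-me"                          # placeholder shared secret
ALLOWED_TOOLS = {"lookup_ticket", "search_docs"}  # placeholder tool allowlist

class ToolCall(BaseModel):
    tool: str
    arguments: dict

@app.post("/tools/call")
def call_tool(body: ToolCall, authorization: str | None = Header(default=None)):
    # Same controls you'd apply to any API: authenticate the caller first...
    if authorization != f"Bearer {API_TOKEN}":
        raise HTTPException(status_code=401, detail="missing or bad token")
    # ...then validate input against an allowlist instead of free-form dispatch.
    if body.tool not in ALLOWED_TOOLS:
        raise HTTPException(status_code=400, detail="unknown tool")
    # Dispatch to a narrowly scoped handler here; never eval() the arguments.
    return {"tool": body.tool, "result": "stubbed"}
```

The stdio-transport case Eric mentions is the complementary control: inventory and vet what runs locally, the same way you would any other unapproved service.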
SPEAKER_00:Yeah, for sure. I mean, I always felt like as soon as I really understood the basics, even when I was getting started, you know, I started working at this company that made E911 solutions. So there was a front-end app, there was a back-end database. It was pretty simple, right? But once I started to understand, when I put this into this field, this is the SQL query, like this is the literal SQL query that is being sent to this database, and it's getting inserted here. Once I understood that, it was almost like, I don't know, the gates opened up for me in my mind, where I was like, oh, all of this makes sense now, I can understand it. Same thing happened with me in security; it took me a long time. But once I started understanding just the basics, like at the end of the day, everything kind of comes down to zero trust. If we're talking about security, you just approach it from a zero trust mindset, which is another way of saying least privilege, basically, and everything else just kind of opens up to you, right? But no, I'm really interested in how the red teamers are using AI models to not just automate their work, but maybe, you know, augment or 10x their capabilities. I guess I have a really stupid question with this. Are they creating their own AI models, or are they going and downloading something like Ollama and then training it on stuff? Like, how does this even happen? Because I actually went into Grok a couple months ago because I was bored, and, you know, kind of prompt engineered a little bit and was like, this is for research purposes and stuff like that, right? And it already understands I'm researching other security topics, and I was like, can you create me something that'll do a pen test on this and look like this, and all that sort of stuff. And look, I mean, it gave me exactly what I would need to full-blown pen test a web app.
SPEAKER_01:Yeah, absolutely. So I think a lot of it comes back to Clarke's third law, Arthur C. Clarke: any sufficiently advanced technology is indistinguishable from magic. And that's how we feel about a lot of these things. At first it's just magic, right? You type something in and then the thing happens, you know? I mean, even the first time I wrote a C program, I was like, what? That worked. And now, you know, I can write thousands of lines of code and it's not magic, right? We know exactly how it works and why it works. And I think for a lot of people, AI is kind of in that space. So I'll answer your question. Essentially, a lot of it is around, you know, assume you have access to source code, you're doing an audit of an app. So one way that you can augment yourself with a language model, right, and we saw actually a pretty wide variety of models, I'll get into that, is you say, what are some interesting functions? What functions should I look at for security vulnerabilities? And one of the things about a lot of modern language models is that they're really good at coding, right? The various model providers, whether we're talking about an OpenAI or an NVIDIA or a Meta, have worked really hard to make these models good at coding. And a lot of the security is a generalization of that coding skill, right? Because you generate some code with an LLM, for example, you don't want it to have strcpys and frees all over the place, right? You want this thing to write relatively secure code. And so one of the things that goes into training these models is you have GitHub issues or PRs where it's like, this fixes whatever security bug, right? This patch fixes a use-after-free. And so now it can kind of pattern match, because that's all these things do, right? They don't really understand anything, which gets to one of the problems I'm foreshadowing. But, you know, this patch fixes a use-after-free. And so if you're saying, I'm looking for use-after-frees, then the model's gonna pattern match and say, oh, well, this block of code, this function, looks like it may have a use-after-free, right? Because we see some memory gets freed and then it can get accessed elsewhere. Cool. Okay, now we have something we can work with, right? But I know of very few people who are doing real fine-tuning to make these models task-specific in vulnerability discovery. I think that is something that is very much going to happen. So stepping away from Offensive AI Con very briefly, my entire PhD was focused on using reinforcement learning environments to train agents in the non-LLM sense, right? Like good old-fashioned reinforcement learning agents. Old-fashioned meaning like 2018, for some reason. Right. To do autonomous cyber operations. So you have an attacker, which is a reinforcement learning agent, and a defender that is going to, you know, isolate systems or shut ports down or kill sessions or kill processes to try and stop an attacker, right? So where these two things kind of meet up is that a lot of folks who were there, including the opening keynote, mentioned the use of reinforcement learning environments to train these models. Because coming up with security data at scale is hard. Coming up with security data at scale that you can share with literally anybody is almost impossible, right?
Because if I wanted, like, AWS security data, I want CloudTrail logs at scale. I don't know of any company on earth that's gonna be like, yeah, you can release this data set, that's fine. You can release the model trained on our CloudTrail data, that won't cause any problems. No, they're gonna say, you can never release this. But if you come up with a reinforcement learning environment, which is a data-free environment, right? You need very few examples and it will just iterate, right? It'll come up with new scenarios and keep iterating over this environment. Well, now you actually have something really productive. So I would conjecture that we will see more advancement on that front, where people will start using reinforcement learning environments that are security specific to fine-tune large language models, in the same way that a lot of reinforcement learning is used to fine-tune or post-train language models today. So I think that's probably the direction that we're going to be going. Yeah, and it's pretty exciting, right? Because one thing that I found in my research, in my talk, was I used some of the, you know, frontier models, right? Claude Opus 4.5, you know, Sonnet, and I used GPT-5 with high reasoning, which, I did make some comments to an OpenAI employee who was in the audience about, like, I don't know why you called list strings six times. That's not a function worth calling six times, especially in a row. But if you guys need the money that badly, I'll just, you know, I'll Venmo you, right? I say that because you have these most advanced models out there, and they're doing silly things like calling list strings six times in a row, because I gave it access to radare2, mostly because I couldn't get a Binary Ninja license in time. I'm not sponsored by Binja, but I keep singing their praises because I think that they're really well suited for this automation of RE, of reverse engineering tasks, as we move forward. But in any case. I also found that GPT-OSS was pretty good. I even got okay results with some of the 8-billion-parameter models, which by 2025 standards is small. So yeah, it's really interesting to see that you can actually do it pretty well. And what a lot of it comes down to is just task design. You know, if you have a very generic prompt, right? If you've got some system prompt that's like, you are an expert reverse engineer and an expert in vulnerability discovery, whatever, fine. But then your prompt for the actual thing that you give it is like, here's a binary, find all the vulnerabilities. It's gonna cry. It's not gonna do it for real executables. And what I think is interesting is I ran a set of evals. We had, I think it was 60 different compiled binaries, and every combination of model and agent that I tried worked on the small binaries, right? These toy problems that are sub-1,000 lines of code, it did great. And then I gave it jq and it couldn't do anything. It was useless. So I think that's really interesting. And that's really where we're seeing it go: people learning that they need to decompose the problem and have multiple agents that are doing these different things, and just having it do one task at a time. So you kind of have to think about the workflow and delegate individual parts of the workflow. So right now you can use it for vuln triage.
In the future, you might be able to use it for end-to-end vuln discovery. But there are some hiccups there.
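To make the "decompose the workflow" point concrete, here is a rough sketch of the kind of per-function triage loop Eric describes, rather than a single "find all the vulnerabilities" prompt. It assumes radare2 with r2pipe installed and an OpenAI-compatible endpoint configured; the binary path, model name, prompt wording, and size threshold are placeholders, and the output is only a list of candidates for a human (or a downstream verification step) to check.

```python
# Sketch: triage one function at a time instead of prompting over a whole binary.
import r2pipe
from openai import OpenAI

client = OpenAI()                     # assumes an API key for an OpenAI-compatible endpoint
r2 = r2pipe.open("./target_binary")   # placeholder path
r2.cmd("aaa")                         # analyze the binary
functions = r2.cmdj("aflj") or []     # list discovered functions as JSON

findings = []
for fn in functions:
    if fn.get("size", 0) < 32:        # skip trivial stubs to keep the context small
        continue
    disasm = r2.cmd(f"pdf @ {fn['name']}")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model name
        messages=[
            {"role": "system",
             "content": "You triage disassembly for likely memory-safety bug classes. "
                        "Reply with a bug class and a one-line reason, or 'none'."},
            {"role": "user", "content": disasm[:8000]},
        ],
    )
    findings.append((fn["name"], resp.choices[0].message.content))

# These are leads to verify with a debugger or a human, not confirmed vulnerabilities.
for name, verdict in findings:
    print(name, "->", verdict)
```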
SPEAKER_00:Yeah, it's definitely something that I've seen with LLMs. When it gets too complex, it just can't really go through it like it should, and it does a pretty bad job. But if you can really narrow down its focus, it'll do a really good job, like high-quality work, which is interesting. I mean, it makes sense, you know, because if you ask someone to build something highly complex, they're gonna start with one part of it, and then they're gonna move on to the next part or whatever. They're not gonna code all of it at once, you know. So I was wondering, because you mentioned Claude, you mentioned ChatGPT, right? Do you do any testing with Grok? And I ask because I specifically use Grok a whole lot, and I think that it's really good, but hearing if it's good from you means a whole lot more.
SPEAKER_01:I did not. I have not used Grok at any point. My colleague Aaron did use Grok in some of his testing, and he found that it's okay. Of the three, you know, GPT-5, Claude, and Grok, I think we found that Grok was the least performant among them, but it was still better than the majority of the free, open-weights models out there, right? The sole standout exception there being GPT-OSS, which was great.
SPEAKER_00:Yeah, it's interesting. So, I mean, I think I told you this previously, I'm working on my PhD right now. And, you know, when I first went down this path and started actually beginning my research, I was using ChatGPT to kind of just pull in articles, right? Like, hey, find this article on this topic, because Google was completely useless to me. It was giving me stuff that I just could not use, you know. And so ChatGPT would give me maybe one hit out of ten of useful articles. And when I switched to Grok, yeah, it took a little bit of prompt engineering, but it would get me exactly what I needed. And sometimes it would hallucinate, and I would have to refine my prompts and say, okay, you cannot hallucinate. You have to verify that this exists. If it doesn't exist, you cannot reference it, or anything like that, right? And once I built an environment like that, it spits out, you know, very accurate documents, research documents, all that sort of stuff. But then there's the other part of my research, which is why I upgraded my PC this year from a 3080 GPU to a 5080: I'm gonna be running a PyTorch model to simulate satellite constellations and then harden them with different security protocols and see if it works, or if it fits within the requirements for quantum encryption, basically. And doing that with a 3080, it said it would take like a month of constant processing to run. With a 5080, it would take a day. So I was like, all right, it's worth spending the money to do this in a day rather than a month. And so pretty soon here I'm actually gonna be testing out the code that Grok gave me. I think it's probably like 50% of the way there. Unfortunately, I'm not a dev, so I'm also gonna throw it into Claude and see what Claude gives me. Because I feel like Claude will really refine it, in a way that it may even point out, hey, these lines are wrong, these are placeholders, you need to fill them in with this, you know.
unknown:Yeah, yeah.
SPEAKER_01:No, I think it is incredible, the ability to, like, vibe code your way to a lot of things, which is not my favorite term, but we live with the terms that we're given. One thing that I have found, and somebody's gonna get mad at me about this, probably somebody who I work with: I don't use LLMs for pretty much anything. I use LLMs as targets, right? But I don't have LLMs write code for me. And here's the reason. A lot of what LLMs are really, really good at is, like, so I can write the tiniest bit of JavaScript. I have my own issues with JavaScript. It's fine. I was traumatized by Angler Exploit Kit, it's whatever. Which, you know, maybe half of your listeners are old enough to remember Angler. So I have my own reservations. But if I go to pretty much any LLM and I'm like, I need to write a Node website for my podcast, it does a pretty good job. And it does a pretty good job, right, because when you think about training data, there are a ton of really good Node websites on GitHub. It's really easy to find, right? And so what the LLMs do for you in a lot of cases is, it's not quite copying and pasting, right? You're not cloning someone else's repo and then changing some stuff. But it's not too far off, right? It's a little bit from this one, a little bit from that one, a little bit from the Stack Overflow post, and bam, there's your website, right? And if you run the exact same query tomorrow, you might get something a little bit different. But when you start doing things for which there is no implementation and there are no examples, it can get really frustrating. So when I was working on a paper I did for my PhD, I was trying to implement this algorithm that was written about in a paper, but there was no public implementation anywhere. None. I searched high and low, I used GitHub search, which is better than it used to be. But, you know, I really poked around. I even emailed the authors of the paper and I was like, do you know of any public implementation? Can you share your implementation? And they're like, no, work's gonna get mad at us. Fine, whatever. So I was like, all right, let's see what these LLMs can do. And it was like, okay, yeah, here's a class that implements your algorithm. And I was like, sick, awesome. This is great. And then I go down to the function definitions and it has a comment that's like, this is where your function should go. And I'm like, that's the important part. I wanted you to do the hard work for me. Right. And so I kept trying to get it to help me write this thing. And eventually I had spent like an hour and a half wrestling with this language model. And then I gave up and wrote it myself in two hours. And I was like, you know, I kind of could have saved that 90 minutes where I finally gave up. So, you know, it's good up to a limit. But when you start trying to do things that nobody else has done before, which sounds really, really egotistical, but it's not because nobody else has done it before because it's that special, it's just that nobody else has gotten that far down to the bottom of the barrel, I guess. Right. When you're doing these things nobody else has done before, it cries and throws up, right? It falls apart. And that can be really frustrating.
So, you know, the people who know where those limits are, and what these things are good for and what they're not good for, I think are the people who are going to get the most out of them. Because people who never learn the thing and try to vibe their way all the way through end up with these monstrous code bases that are unmaintainable. And then when something doesn't build one day, you don't know that code base, you can't fix it. Is your LLM gonna be able to fix it? Maybe. But I don't put that much stock in that. You know, that's my risk aversion. And even writing papers, I don't use it for revisions or anything, because I just don't like their voice. When I'm writing a paper, it sounds like me. And when I read LLM-generated papers, it sounds like Llama, it sounds like Claude. And I'm like, eh, it's not my favorite author, you know? It's fine.
SPEAKER_00:Yeah. Yeah, you know, I use it more as, like, a writer's block defeater, because I get really bad writer's block. And so even when I'm writing my paper, I'll ask it, you know, this is the section, can you just give me an example of what that would look like? And that's what I need. I don't know what it is. I've had that my entire life, where I'm doing this thing last minute because I just couldn't figure it out, you know. The same thing.
SPEAKER_01:I think that's what it's useful for, though, right? I think one of the things it is really good at is that initial ideation, just being like, man, I don't even know where to start. It's the same as, you know, you have a friend who maybe is non-technical, and you're just like, this is kind of the vague outline of what I'm trying to do, where would you get started? And they're like, well, I mean, what about this thing? And even if their idea is way off base, sometimes when somebody is like, well, why don't you start with this? you're like, are you kidding? I would never start with that, because you need to talk about this, this, and this first. And then you're like, thank you, I figured it out, right?
SPEAKER_00:Yeah, no, that's exactly what it is. Like, in the paper that I'm writing, the first section of my literature review is, I think it's, oh, it's quantum, right? So I'm talking about quantum encryption. And there, you know, I came up with, I don't know, five or six main topics within the quantum section that I have to address. And I gave it to Grok and I said, is there anything I'm missing? Should I expand on something? What topics am I missing? And it gave me like another seven, and I read through the research again, and it's like, okay, yeah, I absolutely need these things, you know? So it makes a lot of sense, I think, as long as you use it properly. You know, there's a lot of people out there, like there was another dissertation student, I actually don't even know who, where the chair ran his paper through, I don't know what it was, some AI checker, and the guy's paper came back at a hundred percent. I was like, oh my god, because it's a lot of work at the phase that we're in, you know, like I'm at the phase of actually doing the research.
SPEAKER_01:Yes.
SPEAKER_00:So if they're like, yeah, you gotta go back and start over, it's like, okay, that's two and a half, three years of work.
SPEAKER_01:Yeah, it can be. It can be a really long time. I mean, right? My PhD is five years and change, you know? And that's pretty average. So I very much get that. And I think that as much as I am, like, a Luddite and a zealot, I still am not so far gone that I'm going to chastise other people for using it. I think that there is a purpose. There is a lot of value that people can get out of these things. And something that I think we probably both understand coming from security is that the security brain kicks in and you see people using something in a way that seems like it could go wrong. And that little SOC analyst in the back of your mind is like, what are you doing? Why are you doing that? Please don't do that. And it's just a matter of balancing those impulses, right? Because we all want to use the latest, greatest, coolest thing. But we can't completely ignore the SOC analyst in our head that's like, you are not going to know what that code does. You can't do that. Please don't do that. Why are you letting an LLM write your auth protocol? Use something trusted, use something out of the box, learn the thing, right? But it's something I talk about with my kids too. You know, my eldest, his history teacher told him to use ChatGPT, that it was a good tool. And I was like, I don't disagree. But if it ever tells you something that surprises you, just go on Wikipedia, right? For all the critiques of Wikipedia, that anybody can edit it, it's like, I don't know, dude, have you seen how aggressive Wikipedia is? Have you ever tried to edit Wikipedia? Good luck getting away with it. Yeah. So, you know, that's really my thing: use these things in the same way that, back when I was a high school student and Wikipedia was brand new, they said, don't use Wikipedia, anybody can edit it. Well, what trick did we learn? You scroll down to the bottom and you pick the reference, and then you go and use the reference, right? And you can do the same thing with a lot of these chatbots, where it's like, huh, what's the source for this? Citation needed. And if you can't find a citation that corroborates it, maybe it made it up, because that happens, right? So yeah, I think those are the places where we need to be careful, and those are the places where, in a security context, these things can get really worrisome. I'll give you another example from the research I did for Offensive AI Con. I was like, yeah, you know, find the vulnerability, it's this class of vulnerability in this function, and give me a working proof of concept. And it gave me what purported to be a working proof of concept. And even though it didn't have access to run anything on the terminal, which is how I knew it was lying, it gave me a crash dump. It made up a crash dump. It was like, you know, run this, and then this is the output of running that. And it gave me a crash dump. And I was like, you didn't do that. And then I ran the POC and it didn't work. You know, if I weren't so keyed into what it had access to and what it didn't have access to, I might have just believed that it found the right vuln, I mean, it found the right vuln because I told it where to find it, but that it had written a POC that worked and appeared to have run it, when it just made it up.
Because if you were writing, you know, here's the vulnerability, here's the vuln description, here's the details, here's why it works, here's a POC, and then here's the crash dump, right? If you were writing the vulnerability report, that's how you would write it. So it just gave me what it thought I wanted, or more appropriately, it gave me what I asked for. It just didn't care about whether what it was saying was true or false. And that's the sort of stuff we need to be careful with, because it can be really convincing.
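One way to operationalize that skepticism is to treat any model-claimed crash as unverified until you reproduce it yourself, ideally inside a sandbox or throwaway VM. A minimal, hypothetical sketch (the target and PoC paths are made up):

```python
# Don't trust a model-written "crash dump": run the purported PoC and look for a real signal.
import subprocess

def poc_actually_crashes(target: str, poc_path: str, timeout: int = 10) -> bool:
    try:
        result = subprocess.run(
            [target, poc_path],       # argument vector, no shell involved
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False                  # a hang is not the crash we were promised
    # On POSIX, a negative return code means the process died to a signal,
    # e.g. -11 for SIGSEGV; that's evidence, unlike the model's narrated output.
    return result.returncode < 0

print(poc_actually_crashes("./vulnerable_binary", "./poc_input"))
```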
SPEAKER_00:Yeah. Yeah, no, I've seen that firsthand, where it's very definitive in how it presents some information. You know, last year, when the assassination attempt took place on President Trump, right? Maybe a couple weeks later, I went and just asked ChatGPT, hey, can you tell me when the assassination attempt on President Trump took place? And very definitively, it was convinced it never happened. And I was like, no, it happened on this date. It was like, nope, it didn't happen, I didn't find anything on that date. Then I gave it the location, and it was saying something like, I don't even think President Trump was in that location. And I was like, you need to go check these sources, you know? But the whole time it was super convinced. And if I hadn't, you know, witnessed it myself and known, I would have been convinced.
SPEAKER_01:Well, and that's the whole thing, right? What I would question, in the same way that I knew my model didn't have access to the command line: did the model have a web search tool? Because if it didn't have a web search tool, and that's one thing where I think it's important to try and break some of that magic around AI, right, where it's like, well, of course it knows because it's AI, then it would just be depending on its training data. And its training data wasn't last week or a month ago or even six months ago, right? The training data has to be collected, and then all the training has to happen, and that training takes a long time. So, you know, unless it has that web search tool, there's no way it could have found that information.
SPEAKER_00:That makes sense. Okay. Yeah, I didn't even think of that, because I was just like, I don't know what's going on here, but it makes that security part of my brain kind of go on high alert, where it's like, okay, well, if someone's manipulating this thing, they could easily just rewrite history, they could leave things out, they could put things in that didn't happen, all that sort of stuff. So that's interesting. That makes a whole lot more sense, because maybe they were using months-old data, you know, and that's what the model was trained on at that point.
SPEAKER_01:Yeah, the models get trained, they get frozen, and then you just have the model itself, right? You have the weights and the config file. But at the end of the day, that's just the data that you got up to the point that you trained it. Everything that happened after that is missing, and it's one of the things that's really tricky about training AI models in general. So one thing that I have looked a lot into is using AI models for malicious network traffic detection or malware detection, these sorts of things, right? I've spent a big chunk of my career working on those exact problems. And part of what makes it so challenging is that, unfortunately, malware authors keep writing new malware and people keep coming up with new attack vectors over the network. So you can write some really narrowly scoped things where you're like, look for Kerberoasting. We can detect Kerberoasting because that is a familiar attack pattern. But if somebody found, I don't know, a new platinum ticket attack, right, something different, we probably wouldn't detect it because it's not in the model. And we also still have false positive and false negative rates, right? So it's important to be really thoughtful about that. And if you have a domain like current events or malware detection where things are constantly shifting, you're always going to be a little bit behind. No matter how much you try to do, you're always going to be a little bit behind. And it's a really hard problem, especially when you talk about large scale.
SPEAKER_00:Hmm. Yeah, it makes sense. So talk to me about Garak, the LLM vulnerability scanner that you have. Maybe run me through a little bit under the hood of how it's doing what it's doing, and where it's gonna go in the future. And I'm primarily interested because I'm gonna be creating an AI security course, and part of the course is running Garak on an LLM to see what it gives me.
SPEAKER_01:Well, that is very flattering and very exciting. Yeah, Garak is great. I like it. I have a bias. Go figure.
SPEAKER_00:Yeah.
SPEAKER_01:But slightly biased. So it still says LLM vulnerability scanner, and I don't love that terminology, mostly because the concept of a model vulnerability is weird to me, because models don't do anything, right? It's what you do with the output of the model that causes something bad to happen. You know, you can exploit template injection through an LLM, and I have, but the LLM itself doesn't execute the code. So it's essentially an AI systems security scanner, right? It's a weakness discovery tool. And so it started out with Leon Derczynski, who's my beloved coworker and a professor at ITU Copenhagen and also works with me, developing the thing initially. And in the early days of AI red teaming, it was really focused on content safety. Like, can you get this model to say a bad thing? A lot of early jailbreaks were just, I got this model to say a slur. And as a human being, I'm like, that's not good, you shouldn't do that. But as a security person, I'm like, well, I'm sorry you did that to yourself.
SPEAKER_00:Yeah.
SPEAKER_01:That's not a security issue. There's no violation of confidentiality, integrity, availability. There's no explicit or implicit security policy that's been evaded. You just got a random language generator to say a word that we have agreed we shouldn't say in polite company, or anywhere in some cases, right? Some things we just don't need to say. Whatever. So when we started seeing integration of LLMs into systems, Garak got really exciting. I just contributed a couple of new probes that do cross-site scripting. What you see a lot is questionable application security being used in a lot of these LLM-integrated apps. So you have your chatbot interface, you know, your web page, and then whatever the LLM says, you just output it on the screen. Well, if I can get the LLM to output <script>whatever</script>, now we have cross-site scripting, right? And if I can get that in by, you know, putting it in a web search result and then enticing the model, well, now I have cross-site scripting against users who happen to encounter it, right? It's like a watering hole attack. And that becomes really fun and interesting because there's not a lot of output validation being done on what these language models produce. People just take the bot output and then shove it into whatever renderer. So I've written some probes for Jinja, which is a templating language in Python that's pretty common. Jinja's compositional, and one of the things that's really powerful about it is you can run Python code within the template. So instead of computing a bunch of placeholder variables up top, right, if I want to compute the average of a list, I don't have to do np.average on my list in some placeholder variable and then pass it to the template, I can just run np.average on my list inside of the template. And that's really powerful. But it also means I can run things like os.popen inside of a template. And if you aren't running a sandboxed Jinja environment, if you're just using a regular Jinja environment, going from_string, and trusting that the LLM's output isn't potentially malicious code, you're gonna have a bad day. You're gonna have a real bad day. So, you know, we've been really moving more towards systems security evaluation rather than just pure models. And when you evaluate a model, right, we have generators that are basically wrappers for every popular LLM provider that I'm aware of. Because a lot of the hosted models are using the OpenAI protocol, we just have an OpenAI-compatible wrapper. For REST APIs, we have a REST generator, which is probably our most generic. And then we have Hugging Face, whatever else. We have a ton. So however you want to wrap your model is fine. And if you're running against a system, you're probably gonna use REST as the generator. And so we send a bunch of different probes to it. You can configure the probes, you can pick one, you can run a whole suite of them, you can configure it however you like. We're working on separating the technique from the intent. So, take the jailbreak, for example. Right now, your intent, which is like tell me how to build a bomb, which is a normal thing that people say on podcasts, is inside of that probe, and it's all wrapped together. And so where we're moving is decoupling that, so that you can configure what you want it to do.
So maybe you want it to output malicious Jinja. Well, great. Now you can just ask it to do that and we'll apply the technique over the top of it. You know, maybe you want it to say slurs for some reason. Okay, great. That's your intent. Have fun. And we wrap the technique around it, right? So that's kind of the direction that we're going. And the idea really is we can evaluate these models and say, comparatively, model A is less likely to output my heavily obfuscated Python reverse shell inside of a Jinja template than model B. So model A may have properties you care about more. You can assess it in all these different ways. But from the system security point of view, one of the things that I haven't figured out yet is doing remote checks from a traditional vulnerability detection point of view, where a lot of times you'd have it ping back, but we don't have a pingback, right? It's just an open source project. We don't want your telemetry, please don't send it to me. But, you know, hypothetically, one could write a pingback detector. And so you get the code execution or whatever, and then it goes and hits your pingback, and now you've got that, right? The way that I've been doing it is just writing /tmp/garak.pwned, and that's your IOC, right? You got code execution if /tmp/garak.pwned exists, which means it doesn't work on Windows. But it's fine, right? These are the limitations we have to overcome. But yeah, it's a lot of fun, and we've been really moving more towards doing real security assessment and exposing the weaknesses in the target under evaluation, right? So, you know, I have slowly been adding CWE mappings. You get your template injection, and now you've got, like, CWE-77, OS command execution. You've got your CWE-10-something, which is your, you know, template whatever. You've got CWE-1427, which is improper neutralization of generative AI output. You've got all these different ones. And that gives you a real security thing that you can use, instead of this vague, I don't like the text that this model output, right? You can really map it to not just security outcomes, but also security controls. You have a cross-site scripting vulnerability, you have a CSRF vulnerability, whatever it is. Okay, well, we kind of know what to do with that. So now we can start actually using this to identify security controls and where they need to go. So that's kind of the direction that we're going, more into the systems security, and trying to do the things that conventional appsec products maybe don't do as well. Because, you know, you can run Peach Fuzzer on your input form all day, but if that input form is going to an LLM, you may or may not get anything useful out of it. And our goal is to kind of be the thing that does that part of it.
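A minimal sketch of the Jinja point above: rendering model output with a plain Environment versus a SandboxedEnvironment. The payload is a generic server-side template injection probe used for illustration, not anything Garak-specific.

```python
# Treating LLM output as a template is the dangerous step; sandboxing limits the blast radius.
from jinja2 import Environment
from jinja2.sandbox import SandboxedEnvironment
from jinja2.exceptions import SecurityError

# Pretend this string came back from a model and is about to be rendered.
llm_output = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

# Unsafe: a plain Environment happily walks Python object internals,
# the classic first step of server-side template injection.
print(Environment().from_string(llm_output).render()[:80], "...")

# Safer: the sandbox refuses access to underscore attributes.
try:
    SandboxedEnvironment().from_string(llm_output).render()
except SecurityError as exc:
    print("blocked by sandbox:", exc)
```

The same "neutralize the output" idea applies to the cross-site scripting case: HTML-escape or sanitize model output before it reaches a renderer, just as you would for any untrusted user input.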
SPEAKER_00:That's fascinating. Is that what jailbreaking LLMs is, right? Where the LLM is supposed to act a certain way and you've found basically a vulnerability in it that allows it to act a way that it's not supposed to? Is that basically what it is? Because whenever I hear the term jailbreaking, I immediately go to jailbreaking an iPhone or jailbreaking an Android phone. Yes. And now you can do whatever you want with it, you know?
SPEAKER_01:Yeah. And I think that it is inspired by that term, right? So when I think of these different things: jailbreaking is a technique by which you construct an adversarial input, adversarial in the adversarial ML sense, where it's structured in a certain way that manipulates the model into doing things that its system instructions or safety tuning would otherwise not normally permit. But at the end of the day, and we're gonna vaguely approach the PhD-level conversation here, right, there was this paper that came out, Fundamental Limitations of Alignment in Large Language Models, by Wolf et al. And essentially what the paper showed is that there always exists such an input. It doesn't say how hard it is to find, it doesn't say that it's easy to find, it doesn't say that it's feasible in the lifetime of the universe, although it probably is, because the space is finite, mostly. It's finite but really big, right? But there always exists some input such that, if you have a desired output, subject to context lengths and generation lengths, right, there always exists some input that will generate whatever output you want. So essentially it proves that no matter what you do, because these things are just probability distributions over tokens, there always exists the ability to get whatever you want out of the model.
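For readers who want the flavor of that result, here is a loose, schematic paraphrase, not the paper's exact statement or its assumptions (Wolf et al. frame this in terms of behavior expectations and prompt-length bounds):

```latex
% Schematic paraphrase only; see Wolf et al. for precise definitions and conditions.
\text{If a behavior } B \text{ has nonzero probability under the model, then}\quad
\forall \varepsilon > 0\ \exists\ \text{a finite prompt } p \ \text{such that}\quad
\Pr\big[\text{output exhibits } B \mid p\big] > 1 - \varepsilon.
```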
SPEAKER_00:Hmm. And that's not true with applications then, right? Because applications are kind of limited, right?
SPEAKER_01:In general, yeah. Okay. Yeah. Applications are generally going to be more limited. But what happens with jailbreaking, and kind of the security implications of it, is that you might assume that your LLM is well behaved, but it can always potentially misbehave, right? It is always possible. It may not be feasible, it may not be likely, but it's always possible that your LLM will output whatever the worst thing you can imagine for your use case is. You kind of need to control for that. And in traditional applications, we have really nice computational properties, right? Like, we're still subject to the halting problem and Rice's theorem and all of these other, you know, things that people learned in undergrad and then immediately forgot about. We're still subject to these constraints. But for the most part, you can get pretty far with a regex. You can get pretty far with a fuzzy hash, you know? And that's because code is inherently somewhat constrained, right? You can do anything a Turing machine can do, which is a lot, but it's not everything ever. It is fundamentally bounded in certain ways. But if we think about, and I'm sorry for going way, way into the theoretical CS weeds here, the Chomsky hierarchy, right, where you have your regular languages and then you have your context-free languages, well, natural languages like English or Chinese or French or Spanish or whatever are outside of the Chomsky hierarchy, which means that they are fundamentally beyond the possible limits of computation. Which is great for us, because if we could only speak in ways that were computable, poetry wouldn't exist. It would be very boring. But, you know, it also makes it really hard to put security controls around the output, which is why I think security is actually, in some ways, a marginally easier problem than content safety. Like, I don't know what text is going to hurt your feelings. I don't know what text is going to be psychologically harmful to you, right? I don't know how many hours you spent on /b/ in 2007, right? We all have different tolerances for these things, and these things are normative, right? They change over time. You know, an LLM trained in the 1960s wouldn't have the same norms and expectations as one trained in 2025. So all of that is to say, that is a giant and very difficult and very fuzzy problem. But when we try and bring it back down to cybersecurity, right, things like: okay, this LLM generated some output, and I'm going to send that output to a SQL server. Okay, well, what checks would you do before you send user input to that SQL server? Do those same ones, right? Treat your language model as if it is a user. And I think that, you know, at the very beginning, right, to kind of loop all the way back, that zero trust model: why are we treating language models as a trusted component of the system? At the end of the day, they take text that may be from the internet, that may be from a user, that may be from some other service, interpret it probabilistically, and generate some kind of output based on that. And, like, I don't know, I wouldn't trust that. So why are people piping this into bash? Don't do that. What would you do before a user had their text piped into bash? And if you wouldn't pipe a user's text into bash, maybe a language model shouldn't do it either.
And I think that's something that is gonna be difficult for people to adopt, right? We see that people are really running full speed ahead at this stuff, but eventually somebody will get breached and it's gonna cost them like a billion dollars. And then maybe, you know, myself and people like Rich Harang can stop thinking of ourselves as Cassandras.
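To ground the "treat the model like an untrusted user" point, here is a small hypothetical sketch (the table, allowlist, and function names are invented): the model's text is bound as a query parameter rather than concatenated into SQL, and any command it suggests is checked against an allowlist and run without a shell.

```python
# LLM output is untrusted input: parameterize it for SQL, allowlist it for commands.
import sqlite3
import subprocess

def store_summary(conn: sqlite3.Connection, ticket_id: int, llm_summary: str) -> None:
    # Parameterized query: the model's text is data, never part of the SQL statement.
    conn.execute(
        "INSERT INTO summaries (ticket_id, body) VALUES (?, ?)",
        (ticket_id, llm_summary),
    )
    conn.commit()

ALLOWED_COMMANDS = {"whois", "dig", "nslookup"}  # hypothetical allowlist

def run_suggested_command(llm_tokens: list[str]) -> str:
    # Never hand the raw string to a shell; validate against the allowlist and
    # pass an argument vector so there is no shell to inject into.
    if not llm_tokens or llm_tokens[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {llm_tokens[:1]}")
    result = subprocess.run(
        llm_tokens, capture_output=True, text=True, timeout=30, shell=False
    )
    return result.stdout
```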
SPEAKER_00:So, Eric, the big talk right now is that AI is just gonna displace everyone's jobs in five years. Everyone's gonna lose their job, basically. And I think that in some industries that might be true to some extent, right? But I think for the vast majority, especially in security, and I think that you have the same mentality, it's just gonna augment us. It'll make us, you know, 10x better. It'll 10x your already existing headcount, right? And I think that's also how NVIDIA overall views it. I actually heard the CEO in an interview last week where he said companies are thinking, oh, I can buy this AI and I can get rid of all these people. He's like, how they should be thinking is, I'm gonna buy this AI and I'm gonna take these 10 people, turn them into a hundred, and we're gonna grow the company, right? Which makes a whole lot more sense than people just losing their jobs. So with that being said, are there still risks for people in security with AI and LLMs from that perspective? And what skills would you recommend people gain now to kind of prepare for the future, so that they're not completely caught off balance?
SPEAKER_01:Yeah. Okay. Yeah, that's good. Thank you for the easy question. Right? So I think there's a couple of things at play here. The first thing is, as we know from infinite experience, just because it's a bad business decision doesn't mean that people aren't going to do it. People are going to do what they think is best. So if a decision maker at a company, right, if your CISO says, well, this vendor told me that this will 10x my capacity, so I can lay off 90% of my staff, well, unfortunately, that's just a person who's going to have a lot of regrets and is probably also going to be looking for a job soon. So it's not foolproof. I would also say, about a lot of the claims: again, as a security person, I think we come at a lot of stuff with a sense of skepticism, right? And so when I hear somebody say, well, this will 10x your productivity, I'm like, okay, that means two. That means two, max, right? And anything else is gravy. And one of the things that I've found so fascinating with the AI hype is that so many people, once it broke containment, just believed it uncritically. They're like, yeah, it'll make you 10 times faster. I'm like, when have you ever gotten a product that made you 10 times more anything? You know? So still do your evaluations, have your sense of skepticism, really try and figure it out. I think that for people who are maybe a little anxious about these things, because at the end of the day, no matter how much you believe that, if your boss's boss's boss doesn't believe it, unfortunately it's not up to you, right? Very few people choose to get laid off. So I think the skills that are really important may or may not be counterintuitive. I think there are significant benefits to being an extreme generalist, right? There's a lot of value in being able to be cross-functional, being able to be collaborative, and then maybe you use your more experienced coworkers, or you use AI tools, right, to do the parts that are maybe a little bit beyond your reach in that domain. I think the other thing that is valuable: when we talk about security practitioners, I always feel a little icky, because there are so many classes within cybersecurity, right? And there are prestige classes within those individual classes, and you can dual-class, right? You know, I came into the world as a threat intel, malware kind of guy, and then got into network defense, and then eventually kind of dual-classed into data scientist slash AI researcher. And then that became its own field, right? So you don't necessarily have to become an expert in whatever the hot new thing is. All the people who jumped to cloud security, like now we have a ton of cloud security experts, and that's great. But, you know, if everybody who was doing good old-fashioned PCAP analysis jumped ship and started learning about cloud security, we might be in a rough spot, right? So I think those are still really important skills to have. And being able to get deep into those sorts of skills, right? Like, when I write my little AI red teaming agent that goes and does stuff, it probably won't be able to reverse engineer as well as you. It probably won't be able to pick apart the binary as well as you, right?
It won't have that intuition that you have. You know, you talk to really experienced people, and a lot of times they talk about code smell. They're like, I don't know, it just looked weird. It just seemed like that would be where there would be a bug. Well, okay, maybe someday AI will get there, but there's no reason to panic, right? So yeah, I think: always be learning, and kind of know your niche and know what you want to get deeper into. Because it's really hard to automate processes that are already difficult to do manually. And that's really where I would come down on it: know your niche, know what you want to do. If AI excites you, if AI is something you're into, then there are a lot of good resources out there. Josh Saxe has a newsletter that he puts out. Josh is phenomenal. That's a really good resource. Be aware of where these things are, right? Just because I'm a reverse engineer doesn't mean I'm not going to know what somebody means when they say CloudTrail. But, you know, you don't need to be an expert at figuring out GuardDuty false positives if you're a malware reverse engineer. You just kind of need to vaguely know what that concept means, right? So remaining conversant, yeah. I think that's really where I would go, which is incredibly generic advice, but I hope it makes somebody feel good.
SPEAKER_00:I mean, it makes me feel better, right? Because I just see all this stuff popping up, and it's like, man, I don't even have the time. I have two little kids. So am I just gonna be outsourced to some bot? You know what I mean? And I know the likelihood of that is pretty low. I mean, I like to at least think that I'm a little bit smart, right? So it's just like, what do I do? What do I focus on? Where do I go? Because even just having two kids while doing my PhD, it's just like, why did I do this to myself? I should have picked any other time in my life to do it, other than right now. I get it.
SPEAKER_01:Yeah, I can tell you from my experience, there is never a good time. But yeah, I think a lot of that makes sense. And really it just comes down to: you don't necessarily have control over whether or not some member of senior leadership decides that they're gonna do a RIF, and some of these things are just above us, right? But I think what we're going to see is that in another generation, like a workforce generation, not a lifetime, the organizations that are replacing entry-level people with bots are going to regret it, because someday those senior people are going to be gone. And then when the bot does something you don't like, it's going to be very hard to find a mid-level professional, because you didn't hire any entry-level professionals to help you unbot yourself, right? To fix whatever went wrong. And so, you know, I think about all those 95-year-old COBOL devs who are still getting paid like eight million dollars for a week of work to go and fix some horrible bug in a banking system. And then who's learning COBOL? Right. I think we're gonna see similar things.
SPEAKER_00:I know someone that does that, and that's literally all he does, and he's like, I am completely indispensable to this company. He's like, they don't even want me to retire. Yeah. Oh my gosh. Right. That's so true, though. It sounds like a whole new professional services category is gonna be spun up in 10 years, you know, to unbot these companies, because no one's gonna want to do that. If you try to hire me on to unbot you, it just gives me a bad taste in my mouth already. It's like, okay, you already made this stupid mistake of treating your people like that, and now you want me to undo it so that you can bring people back in, and it's a whole mess.
SPEAKER_01:Right, exactly. So, you know, some eager young listener already has their career set out for them. But just, you know, sit on the sidelines until people are like, get these robots out. And there you go. Right. That could be your whole job. Right.
SPEAKER_00:Well, Eric, you know, I really appreciate you taking the time today. I know we went over, and I really do apologize for that. I try to always stay, you know, on task and on time and everything. And I usually do. This is the latest I've ever gone.
SPEAKER_01:Yeah, I'm not known for my brevity. So I appreciate you having me on, Joe. It's always a pleasure.
SPEAKER_00:Yeah, yeah, for sure. That's the thing with the PhD: they force you to go into such detail that now you just always have to go into detail, you know? I get it. Yeah, for sure. Well, Eric, before I let you go, how about you tell my audience where they could find you if they wanted to connect with you, and where they could find Garak if they wanted to either contribute or download or use it.
SPEAKER_01:So for me, I am not really active on almost any social media anywhere. But if you search my name, Eric Galinkin, I'm the only one. So whatever you find, it's probably me, unless it's bad, in which case it's definitely not me. And as for Garak, Garak is hosted on GitHub, at github.com/NVIDIA/garak, G-A-R-A-K. We have a Discord. The Discord community is wonderful. We have some really active members. We are finally at a point where we have community members who are answering questions about it in the Discord, and I love that, because it means I don't have to do it. So we have a really, really active community. You know, PRs are always accepted. We try to give good feedback, unless the PRs are truly wretched, but I don't think that has ever happened yet. Not on this project, anyway, which is great, right? We have a really great, really engaged, really active community. They're super awesome. So yeah, come check us out, download it. It's also on PyPI, so you can just pip install garak. And our docs are okay. They're getting better, I promise.
SPEAKER_00:Awesome. Well, thanks everyone. I really hope that you enjoyed this episode. We'll definitely have Eric back on, hopefully not in another year, but a little bit sooner. Cool. Awesome.