Hou.Sec.Con 2025 Revisited

At Hou.Sec.Con 2025, I took the stage to unpack a deceptively simple question:

What does “penetration testing” even mean in an AI-driven world?

If you’re a security leader, AppSec engineer, or GRC owner trying to keep up with AI adoption inside your org, this conversation helps you zoom out, reset your mental model, and ask much better questions about how you test what you’re shipping. Here is the full transcript of my conversation along with some additional key takeaways at the end.

Why You Should Read This Conversation

You’ll get value from the transcript below if you:

Own or influence “pen tests” and are tired of checkbox exercises that don’t match your real risk.
Support teams shipping AI features (LLMs, RAG, Copilots, agents) and suspect your current testing approach doesn’t fully cover them.
Need language for leadership on why “go fast” without threat modeling is quietly creating a lot of new attack surface.
Want practical AppSec wisdom from someone who’s seen the entire arc—from 1990s network tests to modern AI-enabled apps.

Video

Full Transcript

Hello. Welcome everyone to Hou.Sec.Con 2025, our second session and our second speaker here in track six. Our speaker is John Dickson. He is presenting for us Pen Testing for AI created apps. Updating your testing approach. John is the CEO of Bytewhisper Security and an internationally recognized cybersecurity leader with 25 plus years of experience. He is the former principal at Denim Group, leading its successful acquisition by Coalfire in 2021.

He is an Air Force veteran, serving as an intelligence and cyber officer with AF IWC and AF CERT, and he is an active researcher and speaker on the convergence of AI and cybersecurity since 2018. So everybody, please welcome John Dickson.

Thank you, thank you, thank you. Okay, this is going to be fun. First of all, I think this is my third or fourth Hou.Sec.Con This is one of the most fabulous regional conferences and now national conference. So thank you, Michael Farnum and the organizing group for putting this on. What I’m going to do, I hope, I hope, is to really make you think differently about an overused term that is so overwrought with meaning or no meaning.

Now, penetration testing, that it almost invokes that conversation every time you use it. I’ll talk a little bit about how that changes with AI, and I do hope that you all ask questions. We’re not big enough. We don’t have enough people here or it’s overwhelming. So particularly the folks up front, if you feel like you want to ask questions. It’s okay.

I mentioned that about penetration testing. I try to use, not use certain words in my vocabulary or expressions. I don’t use the word shift left or the expression shift left. I don’t use that anymore. There’s a bunch of them, and I actually don’t use penetration testing anymore because my next it’s just been overused.

And I’ve been in the business for a long time, and I’ll explain that. But, got a little bit of background there. I am, I, I’m CISSP since 98, so 4649 is my number. So I’ll buy you lunch if you have a lower number than that. I usually never buy people lunch. That was 98. I was an ex Air Force CERT person doing emergency response stuff when there was no EDR or SEM.

It was all like external windows. So, I really know Unix, so did know Unix back in the day, but I’ve been in the business for a long time, and I’ve seen this arc of, the use of the term penetration testing or testing so much. And in the most recent past, it was an app sec guy.

So hard core app sec guy, DevOps person, and really working with the Fortune 500 on matters of software risk for the last 15 years mentioned that I’ve been in AI since 2018. That’s probably a little bit of a stretch, but a true story. 2018 I put in a CFP at RSA on AI, and I got it. And then I was like, oh crap, I have to really learn it now.

True story. And it was like a use cases for AI in the enterprise. Like with no ChatGPT to do it for you. So I had to like, learn it and the next year. I got accepted on one. It’s pretty funny to how to vet vendor claims about AI 2019, when AI was really machine learning. It was really nothing, right?

And again, it was like a self improvement bucket list thing. Learn AI and the way I did it is I got accepted at these national-level conferences and then oh crap, had to learn it. And that’s how I got into it. True story, but I’m happy I did because we started to get a lot of questions about it at Denim Group when we were doing, doing app sec assessments.

Well, what about this? What about ML like? Well, we don’t have a good answer, so we learned it. So that’s me. I did this last night. That’s ChatGPT. Yes. That’s, put face on the cool. Hou.Sec.Con Cowboy. That took like two minutes. So I’m also the kind of nerd that loves the little AI hacks.

But I also have great stories on hallucinations and the badness of AI. For the record, I use it every day, all the time. And I’ll just tell you a great OWASP. How many people know open web application security projects? Everybody. Almost everybody. If not. Okay, so when we first started by Bytewhisper, we got asked to do a training class on OWASP top ten for LLMS.

And for those that have ever done curricula developed like presentations and training classes. Very laborious process, right? Very time intensive. So ChatGPT is out there. I said, hey, build me a eight hour training class with an outline and, put it in this format so I can send it to the client as a proposal because they just wanted a straw man.

So I did it. I was like, I felt really smart. I was like, cool, there it is. Bam bam bam bam. I sent it to our CTO, and he’s CTO for one reason, because he’s far smarter than I am. And he said it back and said, go back and read it closer. And what he had done is it had taken OWASP top ten for LLMs.

And the first six were absolutely right spot on the right ones. But seven, eight, nine and ten were from other OWASP top ten list. They just put him in there. So there was a lesson there, which was great for certain things. But absent of human checking, you’re setting yourself up for really embarrassment. And the classic now is we do this, I do it to hey, I need you to put a blurb out there on LinkedIn about this post that another vendor did.

Like the vendors call me and say, hey, could you repost this, this, this vulnerability that just came out on agenetic AI? Our research is out there. I would like you to repost it, which is flattering. So I go and do that. So what’s my first tendency is ChatGPT and it almost always gets it wrong. So half the time you can’t do that or it’ll put another vendor’s name in.

That happened one time. Where it was like yeah, it’s not that. So okay, so let’s talk about penetration testing. I think I was doing my first penetration testing, working on a team in 1997, with a company called Trident Data Systems that no longer exists. And we did a network security test, or I already called a pen test. It looked like a lot of effort, a lot of manual testing.

And we did a lot of dumpster diving and social engineering at the time, which is pretty cool. Like that was all this is, if you remember the movie Sneakers, like, oh, we got to do that too. Social engineering, dumpster diving, kind of nasty, kind of dirty. Putting your consultants in harm’s way. But that’s what it was that network a lot of network,a lot of network because that’s what existed at the time.

And really what you were doing was trying to define the trust zone and understand everything outside of it. So for those who are old enough, I remember seeing a couple of presentations where they used the old wagon train metaphor. Here’s our wagon train, here’s all the good guys in here, and here’s everything outside of it that’s bad. So this is before zero trust.

This is before APIs and connectedness. So you it was pretty straightforward. And but building blocks in spite of that or despite it the definitions of pentesting still varied wildly. Oh. Do you mean like a really a hard core manual pen test? Do you mean like, just run, a scanner? What do we mean? So even in those early days, now, 28, 27 years ago, there was still the first five minutes of every protocol, of every pen test was what do you mean by pen test?

You want this, you want that? Because on the vendor side, we’re scoping pen test. And it’s like, is that two weeks, three weeks or not at all? And I’ll give you an example of one that we did at Denim Group. That was pretty cool. It’s now it’s published so I can talk about it. We did a penetration test of DARPA, a test environment, a DARPA, a very cool like environment.

What they did was it was a test environment, and there was zero test coverage, like no tools. And that’s the kind of thing that back then was pretty straightforward, but, like, you’re looking for anything but. So the definition of is it a DARPA test or is it am I running tests that existed? If you fast forward to like 2004 ish, that’s a, a little bit of a time frame where you start to see more functional code shown up on the websites.

We had dot. Net, you had JavaScript, Java running on, on websites. You see companies like Atstake if you remember them FoundStone that were like revealing the first injection flaws, then testing or penetration testing or whatever became a little bit app centric. Oh yeah, you can do the network stuff. We know that network stuff being really TCP port configurations and other, you know, patching.

But yeah, I really want you to do, you know, penetration testing of our apps. Okay. So now you start to see things diverge a little bit, and you hear the term assessment used more frequently. The interesting thing I didn’t mention about did not mention about penetration tests which exist is there was the early the first 5 to 10 years, people would just do a pen test and get root and that was it, right?

I routed you, I proved I could. So if I rooted you on day one, if I charge you as a vendor, $40,000 for a pen test, if I rooted you on day one, that’s my pen test, right? I rooted you, so then it became a little bit more methodical with apps. And you had this term assessment where I’m like, it’s inferring a little bit more completeness and looking over the entire attack surface a little bit more.

But you still had network and that were, penetration testing became more commoditized. You see the rise of scanners, it kind of look like that. And, others that are out there. And for the record, if you go back to 1997, true story, some of the network security test or pinch tests that we did were $100,000.

I mean, true story, $100,000 network tests that look like scans, you know, like now are just automated, right? So these things have evolved. The constant is the right side. The definitions themselves still, you know, still wildly differed. And so the first part of every discussion usually is okay. What do you mean by a pen test. And how long is it manual or is it not.

And true story. I mentioned, in the intro that my company got, acquired by Coalfire. A lot of people know who Coalfire is, like the first three months of that post acquisition discussion is. What do you mean by a pen test? Because they differed. It was almost like a religious debate. It differed wildly between our team and their teams.

So that was interesting at the time. So what what do you do in a pen test typically, you know, I, I would always say define what threat you’re talking about right. What is the perceived threat. Are you talking about the PLA or the Russians or script kiddies. And we’ll kind of scope it accordingly. We’re going to conduct some kind of reconnaissance footprinting.

We’re going to go find out what’s out there. Now, a lot of this stuff in 2025, I would say you just give it to us like we’re going to get it anyway. So just give it to us instead of, you know, an outside attacker with no knowledge. I’m an outside attacker with some knowledge because it makes it cheaper, I should say less expensive.

Some scanning manual testing, particularly in the app world. I mentioned the DARPA test. We also did a tremendous amount of testing for one of the major cloud providers. And you know what the test coverage was for automated tools for their environment? Almost zero. So like what we did for every test was the first two days were whiteboarding and then threat modeling to find out where surface area even existed before we ran anything.

So. So the more sophisticated, unique you have environments, the more likely you’re going to have to spend more time thinking than scanning. If that makes sense, then exploitation that’s fallen out of vogue, by the way. You know, now, I assume if I got have a higher critical, I assume that I could get in in the early days, we’d have to prove that we got in by putting an image or something on somebody’s web servers.

And in our case, at Trident Data Systems. It was a Barney image. We had a, you know, a tilde Barney dot jpeg. We put on everybody’s root directory in their web server just to prove that we could do it. Now, the chance of disruption like that’s less in vogue. And then reporting and remediation. And one of the trends now is less reporting.

You know I don’t need to think report anymore. I really need the quick and dirty, done. So we’re looking for coding flaws that, that look like that injection flaws, cross-site scripting. Misconfigurations. The one thing that I will say over again, the last ten years is the most egregious. The top ten scariest vulnerabilities we found were not misconfigurations or even coding flaws.

They weren’t SQL injections. They were the crazy architectural flaws where somebody trust an input. Our trust in API input or, you know, you can traverse client data because of the way you implemented off on the server. So a lot of this stuff is again looking at it Misconfigurations mistakes. Oh, we open up a TCP port. We forgot to close it out or we wrote code wrong way.

But the real scary ones are the architectural ones. The ones that you won’t get with any automation. Hints. Back to the manual testing. And there’s a theme here. And the good news is, if you do it over, you can rinse and repeat and do it continue. So there are some strengths to penetration testing obviously. Right. I mean you can find stuff before the bad guys do.

That’s the general thought here. Right. And once you define and get that protocol down and figure out, okay, this is what I mean. This is what you mean, then you can do it over and over and over. In theory, and I’ve seen certain larger clients, they’ll have a, an established group of companies that they trust and have vetted to do testing.

So they’ll move them. It’s usually the same suspects. I won’t name names. And you’ll see sometimes from a supply chain standpoint, say, oh, we’re going to do work together. I’m going to use my trusted vendor to do a pen test of your environment. So the trust mechanism is the is the vetted vendor or the vendor collection. So and once you do this, you still get different shades and different variations because it’s human beings plus automation.

But you kind of get it in a repeatable process. You kind of doing stuff and it’s effective. I mean, you’re you’re generally addressing risk. I would say you’re finding stuff before the bad guys do. The downside of it is, again, still not universally accepted. So when you if you get anything out of this presentation, the one thing would be when somebody says, oh, we need to do a pen test.

Your response is what type of pen? What type of test? What do you mean by that? Let’s talk. So the depths of testing can vary. It still does. I’ve seen ones where there are really audit driven. You know, again, I’ve been on the vendor side for most of my career. And a great example would be, oh, who’s who is the actual client buying it?

Oh, it’s the VP of audit. The VP of audit. Do you think the VP of audit has different desires, than like the actual VP of security? Of course they do. Like they want many times surface level or checkbox. And as I mentioned, the cloud vendor could care less about, cloud vendor. We do work for you care less about checkbox.

They really want to find the crazy vulnerabilities before they get out there. And by the way, that cloud vendor, like many of the sophisticated ones out there, had internal testing and scanning in the SDLC that internal testing teams, they had external testing and then they had a bug bounty program. So like 4 or 5 levels of testing, like if you’re doing 4 or 5 levels of testing, you don’t care about checkboxes, right?

So penetration testing as another checkbox is bad. We’ve seen these programs become static and then they’re like fire and forget where they’re hey, we’ve been doing a pen test in this new company Bytewhisper that we’re doing we’re doing AI driven testing. Like the first discussion we have is, oh, is that the same as penetration testing?

And that’s what generated the whole thought behind this presentation is they’re they’re not they’re not unlike each other. There is overlap. But it’s funny because a lot of these clients have their unique pentest budget. They’ve been doing pentesting. They’ve been using the same profile. My point being is after this session, you should ask that question what do you mean by pen test and think differently?

I would argue almost all the penetration testing that we bumped into and seen does not address the incremental or additional risk that AI presents at all. At all.

Okay. So fast forward to now. New company Bytewhisperer. We’re doing, as I mentioned, AI testing, threat modeling, hard core stuff. Still doing networking app. We still see that a lot. Sometimes you see the segregations where a network test is like a standalone thing, an app test or the what. But really what happens is the AI part is part of an application test because it’s part of an application with the data below it.

And again, the constant is still varies crazy between the different ones in between industries. I mean, oil and gas is different than financial. I mean, it just is. And even within banking sectors you talk to the big banks versus the community banks, different testing approach, different appetite for risk. So here’s where it’s going now. And talk about this accelerated by oh my gosh, the craziest, fastest implementation of a new technology in the form of AI.

I was on a panel at RSA in April with a guy named Anton Chavkin from Google. A lot of people know him and we had a great quote. I loved it, said, hey, we don’t even have enough practices now to have best practices. Like everybody knows this is new, but what’s happening is because CEOs are afraid of missing out, fear of missing out.

Like everyone’s going a million miles an hour an hour. Personal hands on experiences. They’re doing so without security. As a planning consideration. It might be an afterthought, but it’s not a planning consideration. So what? What’s happening? Here we are again, this feels a lot like 2004 from an app standpoint. We’re creating an attack surface without really understanding the underlying stuff.

And if I didn’t think that was the case, we have usually about a test a week that comes in. We had one two weeks ago where, classic app sec like we got privilege escalation, we got to we rooted them within like two days and their LLM was in the cloud and public facing which which which really wasn’t a security risk.

But if you wanted to generate a like 4K 60 minute video off of their large language model, you could do that and they would not know it until they got a $10 million bill from Anthropic. So there’s new, new things when talk about that. Okay. Another key point all these things matter. There’s like a perception that we see and I hear in conversations almost weekly oh, it’s AI like easy button magic.

It’s magical auto magic right. All this stuff matters even more in a world where you don’t understand the sequences, you don’t understand what it’s doing. And one of the other things that came out of the RSA panel was the fact that we generally think the complexity of AI is making it harder for us to understand the risks. Let me say that again, there’s a bunch of people that are trying to compare AI to mobile AI to cloud migrations.

AI is different from the standpoint. Is it actually a bit more complex? But all these things that we’ve known and learned to love over time matter even more. So concepts like defense and depth, like least privilege, matter more. Okay, so with apologies to the Monty Python’s Flying Circus, for those who remember that a now for something completely different, this is where we go and veer off into the the weird world of AI and security.

So I went a little fast. There, see if I can go backwards. Okay. How is it different data data data like who cares about bias if you’re you don’t have a large language model? Who cares about all these different things? It really is a data science problem as you can do some crazy stuff with data that you couldn’t do before, and you don’t have to worry about, we’re talking about non-determinism and randomness and how that’s antithetical to our compute model from the 50s until three years ago.

We’re talking about hallucinations. Everybody understands this. I think I gave an example of one. They’re easy to understand, easy to see sometimes. But guess what? If you’re pulling code from an API call to OpenAI, you’re not going to see that hallucination, right? Okay. Unintended bias. You can manipulate the inputs also. And here’s another thing auditability explainability.

You have to say some of us do have to go to auditors and say, well, how did you get that conclusion? Oh, I don’t know. It was a LLM that doesn’t cut it. And by the way, the big LLM producers know that they’re trying to pull that through and fix that a bit. But let’s talk about Nondeterminism for a bit.

How many people are willing who wants to stand up and give me a definition here? Okay. I won’t call on James Cooper or I won’t call on Mary Dickerson or the other one. Okay, so Nondeterminism, I should say determinism is this idea that if you put an input in, you get the same result every time, right?

So Nondeterminism is when you put in an input, you get a result, you put in the same input, you get a different result, put in a prompt, you get a response back, you put it in a different the same prompt. You get in a totally different. That is anathema to our compute model. And that’s actually a bigger problem in software development than hallucinations, I would argue.

So and it varies in small ways. I give you these examples and it takes an eye to look at them. If you’re doing, training tech or if you’re doing a presentation or something for LinkedIn, it’s not a big deal. But again, think of a world where you have if statements, you understand the logic, you can go through the logic and then you have an API call out to ChatGPT or OpenAI and it comes back so understood understood logic, logic, logic, randomness, logic, logic logic.

That’s the problem that we’re talking about right now with, with Nondeterminism. So a couple of things I’m not going to I’m not going to do justice to any of this stuff, but it does, in fact, create a new attack surface that you have to consider and build into your testing plan model exploitation, data poisoning, and adversarial inputs.

And I explain each of those, and again, when I say apps that are, that are using AI to generate code. So think of the Copilots also the ones that hit an API and pull in whatever the result is to okay, so model exploitation. You can create inputs and prompts by prompts by tricking the model to reveal stuff that it shouldn’t.

There’s systems now in third party systems that prevent that. But it’s safe to say the data scientists that many times create these things don’t envision the abuse cases that are out there. So having that abuse case, thinking about the models, you can extract stuff that you weren’t supposed to personal information. You know, I think we all know there’s training on internal corporate data.

Is is a is a no no, specifically outlook O-365 because you could then extract whatever the heck you want. HR data, all of it. So this is the probably the biggest one. I mean, like five years ago, who cared about data models and data science? Now it’s the biggest thing, explaining the model itself.

Data poisoning is another one, where you can inject malicious inputs, you can start to corrupt the data. You can train it to do certain things. The, the one thing that I would just point out to everybody that we know on the, on the bad guy side is they are putting up vulnerable code all over right now to train LMS, just blindly putting it up, with the hope that all one of the LMS will train on that data and then ultimately.

So that’s just the real problem that we have is we trust the outputs from ChatGPT as if it were gospel. Right? It’s not. And this is an example where you actually have people that are out there trying to maliciously put things, throughout the world, adversarial inputs, model degradation. You can you can get it to produce biased outputs.

You can target certain bad things that occur. And, and again, it reflects poorly on the company. Is this really a security vulnerability? You could argue maybe. Maybe not. But like, it certainly looks terrible. And a starting point for those who don’t know, the OWASP top ten, the classic one is the prompt injections, getting the models to do things that it shouldn’t.

And look at LLMs six, the one that we’re using when we get into a generic AI excessive agency, we’re putting too much trust in the outputs. We’re giving too much agency to the AI to do certain things without, without it. By the way, our company does a lot of AI policies, ironically, not by choice. It’s like we get pulled into it like we to do AI and I was doing one about a year ago where I couldn’t figure out what was missing in this particular, policy.

And then it hit me, they didn’t have any human in the loop requirements for any of their critical systems. And these guys were a electrical provider. They were a utility. And their AI policy had nothing to do about the excessive agency for certain things. And, oh, by the way, to do I worry about Grammarly on the desktop.

Kind of. But do I worry about you putting in the generation and distribution system and LLM that without a human looking at the outputs and, by the way, little aside out here somewhere is the, War Games Whopper. Have you seen the the Whopper? How many people have seen War Games recently? Like, okay, I watched it with my sixteen year old daughter about six months ago.

That movie still makes complete sense. What did they do in that movie? They pulled all of the missile crews out of the silos and just plugged it in. The Whopper, that’s a human in the loop. They took the human in the out of the loop here, and they gave the Whopper excessive agency just to make decisions on nuclear strikes on its own.

All this stuff is still relevant, and it’s funny. I would recommend you go back and watch that. Yeah. Okay. So put it all in context. AI powered apps still subject to the classic security things, maybe even more so. You have to pay attention to the additional attack surface. You can’t just point a scanner at it. Not now, at least.

Maybe in the future. And I would say that penetration testing, along with actual understanding with within the SDLC, are kind of the baseline ingredients of this. Okay. Here’s a little chart kind of the difference between all of it. I would argue the tools are different. I’d say the tools on the AI side are very immature. We’re starting to see those come out.

There’s probably more for protections than there are for testing per se. The biggest thing right now is just having people that understand and understand how to build the threat models. So what we do and what we see the best practice is more whiteboarding, more threat modeling, less hitting the scanner. Less automated testing. Right. Right. Now for this.

So, I mentioned the testing that we did at my last company for the big, cloud provider. I would say, again, maybe 50% of any project was thinking about the attack surface whiteboarding, collaborating due to a threat model to find out before we do anything. Otherwise you’re just, you know, scanning and getting zero. Okay. I can’t have a presentation about this without touching on Agentic AI, and we’re not going to do just this either, for the record. But let me just talk a little bit about this. The think about what an agent can do, right? An agent can do stuff in sequence and in order to do those things, the classic one is to go make a reservation, go pay for it, go put it on your calendar, go send an email, to my spouse or loved ones, do like 6 or 7 discrete steps.

But the main thing about Agentic AI is it has to have privileges and it has to have to, obviously has to have access to everything. The challenge is, again, we see these implementations without any concept of threat modeling or even, what I call abuse testing. Like, okay, what this third step right here where you make this call over here, what does that do?

What privilege does it have on the other end? So the the lack of understanding the lack of honestly deliberate thought, is such that it makes it really interesting. So what we tell people to do is like, come up with an approach, define scope. Do you threat modeling again whiteboard. Whiteboard okay. This is what it does.

Let’s understand what it does because once we do now I’m going to know where the weakness is. This is where we’re going to spin. You know nobody has unlimited amount of time to test okay. Based upon what the threat model. Here’s what we’re going to spend most of the effort here and here, because that’s where the likely problems are going to be.

Off and off become important again, as I mentioned, input validation. That’s unsexy, but totally rude. But also, you’re looking at everything and looking at the validation of the stuff that goes into it in the supply chain. So how do you do that? Some of that’s manual right now, like malicious input tools are starting to emerge.

You can run sims and snake Synk to look at the dependencies. This is basic building block Appsync. Fuzzer has become cool again, obviously, in understanding what you’re fuzzing before you do it. And then there’s a lot of shell commands and custom stuff. Key recommendations, threat modeling. How many people do threat modeling a regular basis?

Oh, thank you back there. You made me smile. I would it’s like flossing. You probably should do it, but nobody does. But in AI, it matters even more so because again, like, here’s a question. If you’re outsourcing your pentesting, how do you even scope it? And if the vendor comes back and says that’s that’s $100,000. And no, it’s not.

Yeah it is. Here’s why. But revamping your testing approach to probably pick up more of the data poisoning, or the AI stuff, look at the emerging frameworks. I asked this question every time, and I pick on everybody. How many people have read the NIST AI risk framework? The whole thing? Is that a yes? You read half.

Okay. Two and a half. That actually tracks with what I asked. At RSA, we had probably double the amount of people in the room. We had six people for 4 to 6 people said yes. And it’s not even though it’s been out there. But the reason I say that not to pick on people. And thank you for reading half I want to what what prevent you from reading the second half?

It’s a lot better, by the way. It gets better at the end. For the record, the lack of sleep. No. If you want to go to sleep, read the first half of the notes at this rate. But the reason I. I’m saying this is everyone invokes the AI risk framework as if it were a gospel, like, oh, it’s, you know, like but nobody’s actually read it and it’s boring as hell.

It really is. But but, so right at the top, I do recommend the annexes are good, but you know what? Speed. Read it so you can say you’ve read it. And next time somebody asks the heck yeah, I read it. Or if you read it now, is it over? Is it do a refresh. Absolutely. What I like is again, probably my true north for AI right now is OWASP top ten for LLMs, which by the way, have been, adopted and also updated last couple of years.

So they’re not static. OWASP ASVS yes. How many people have heard of ASVS? Okay, half of it now. Oh, you you’d read the whole thing this time? I’m giving you a hard time. This is this is improv right here. What we’re doing? No, ASVS is the application security verification standard. And what it does is it allows you to do apples to apples.

Comparison of application testing. To answer the question, how much? Right. So ASVS is actually pretty good. Between that and the OWASP top ten for LLMS, like okay, I’ve triangulated I know what we’re talking about. And then on the adversarial side, the miter in it stands for adversarial threat landscape for Artificial Intelligence systems, which is why they have an acronym Atlas.

I don’t know if they had the acronym. And backed into that, but either way, it’s on the adversarial side. It’s actually good to think of. So when your boss is asked like, what are the bad guys do? You can read the atlas and be able to say that, yeah, there I am again. That’s pretty frightening.

Final Thoughts

“Pen test” is not a single thing anymore The term has been stretched to mean everything from a quick scanner run to a multi-week, manual red-team effort. Step one is always: “When you say pen test, what exactly do you mean, and for whom?”
The worst issues are now architectural, not just bugs The scariest findings aren’t simple SQLi or missing patches they’re design problems: overly trusted inputs, broken auth flows, and unsafe assumptions about APIs and trust zones.
You can’t scan your way out of AI risk AI features (LLMs, RAG, agents) introduce new attack surface: prompt injection, model exploitation, data poisoning, misused outputs, and runaway usage/cost. Today’s scanners barely understand any of that.
Threat modeling and whiteboarding are no longer optional For complex cloud and AI environments, the first days of good testing are spent thinking, not scanning map ingress/egress, dependencies, trust zones, and abuse cases before you decide how to test.
Business pressure is outpacing security practice Fear of missing out on AI is driving rapid deployments where security is an afterthought. That pattern looks a lot like early 2000s web app adoption, only faster and riskier.

If you have AI related questions or concerns, please don’t hesitate to reach out to contact@bytewhispersecurity.com — we’d love to discuss your own use cases, and how we can help make them secure and effective.

PREVIOUSGPT-5: Dependency Woes

NEXTWhere AI Actually Delivers in Cybersecurity (And Where It Doesn't)