EP252 The Agentic SOC Reality: Governing AI Agents, Data Fidelity, and Measuring Success
35m 53s
Transcription
7445 Words, 40143 Characters
(upbeat music) - Hi there, welcome to the Cloud Security Podcast by Google, thanks for joining us today. Your hosts here are myself, Tim Peacock, the senior PM for all kinds of stuff in Google SecOps, and of course, Anton Chuvakin, a reformed analyst and senior staff in Google Cloud's Office of the CISO. You can find and subscribe to this podcast wherever you get your podcasts, as well as at our website, cloud.withgoogle.com/cloudsecurity/podcast. You know, listeners, I've said that URL 250 times and it still sounds like a mouthful to me. I don't know how any of you remember it to type it in. If you enjoy our content and want it delivered piping hot every Monday, please do hit that subscribe button. You can follow the show, argue with us and the rest of our Cloud Security Podcast listeners on the Google Cloud Security Community page. Anton, this episode started from a conversation in a karaoke booth in Singapore. - What? I did not know that story. Of course, I wasn't at the event in Singapore; unfortunately, I missed it. But I did not know that, because the guest is from Europe, no? - He is. Both of our guests today, listeners, are from quite an old, well-established company in Europe, talking about how they're using gen AI in Google SecOps to accomplish some really hard problems. But this was born at an OCISO event in Singapore over the summer. - I mean, you made it sound like it's a customer case study, but ultimately, you would notice that even though it occasionally has a customer case study vibe, it is also really fun. And it brings up some incredibly useful bits you can use while adopting AI for security operations tasks, and also other security tasks, actually. And by the way, one thing that impressed me the most, I'm just gonna go with this amazing highlight for me. - Go on. - You occasionally hear about some startup or large-ish Bay Area company adopting AI and everybody's impressed. But this is a very old-world company that is really, really impressive with using AI for security, despite regulation, despite maybe culture in the region, despite other things. They're just really good in that context, which to me is quite amazing. That to me was a shocker. That to me was the mind-blown moment. - Well, it's good that Alex and Lars are so good in this context, because I have to say, listeners, Alex is not so good in the context of singing karaoke. So maybe with that jab at my good friend Alex, let's turn things over to today's guests. - Today we are joined by Lars and Alex. Lars is the global head of detection and response at Allianz, and Alex is the deputy group CISO at Allianz. I think a deputy group CISO might be a new title for us. We've had CISOs and BISOs, but a deputy group CISO might be new for us. So Alex, you're one of our closest partners. We love working with you on the SecOps team. And this is one of those funny episodes, listeners, where I will wear both my SecOps PM hat and my podcast host hat. You've been with us on this journey from traditional SIEM to an agentic, outcome-oriented SOC. How did that go for you? And especially, how did that go at something as heavily regulated as a German insurance company no doubt is? - Well, let's first of all say, I think it hasn't gone, it's still going, right? So it's a continuous journey. We're on that journey towards the agentic SOC. One of the key questions that I keep asking myself and asking the team is: what does done look like in an agentic SOC? Right?
So if you ask the Gartner analysts of these days, maybe Anton can shed some light on that, but the definition is still a bit fluid. Well, fluid or flexible, right? If you go to RSA, I think you've got 200 vendors selling you 300 different versions. So for us, we adopted the CrowdStrike view of this, where George Kurtz came up with his analogy using the autonomous vehicle comparison: a VW Bug is your level zero and Knight Rider's KITT is your level five. So in an agentic SOC, I guess, level zero is everyone doing everything manually, and level five is you sitting down just reading your newspaper while the autonomous SOC handles everything. At least in our view, I think we're somewhere near level two, or potentially moving into level three. So we're on that journey, we're not done with it yet. - And so how has that been, working with the Google team and going on that journey? - It's been amazing, right? So as you said before, we look at Google as one of our strategic security partners, so we work very closely with the teams, and there are two sides to that. One is, we want to build trust in our teams on what Google builds. So we're involved in some of the product development, especially on the agentic triage agent, some of the agentic malware agents, and all those things that are coming for SecOps soon. On the other hand, it's also, at least I hope it is, a benefit for Google to work with us, because we have global visibility of, A, what threats look like; B, what a corporate SOC of a reasonable size can do; and thirdly, you mentioned regulators before, we're quite heavily regulated. Around the world, I tried to count, we have over a hundred regulators that all have different fun requirements on top of that. - And it's probably around 100, yeah? So 72 countries, and then we've got 52 in the US on top of that, so that probably makes it closer to 120. Some of them are aligned, some of them are not. And then we have our own internal legal and governance requirements in the company that come on top of that. But, you know, if you buy into the agentic SOC vision, going through all those hoops is actually worth it, because the agentic SOC is going to be the only way you can defend against agentic attackers. - I think on the second point that Alex made, one of the really cool things is, Allianz obviously has a really big environment that's super diverse. And working together with Google really helps on trying out these agentic capabilities on the diverse data set that we have, because I think that's kind of important for it to work, and especially for the trust aspect that Alex already mentioned, to really make sure we try it out. Being closer with the Google team really helps here, because we get early access and we can give feedback super quickly. - With those different environments, are you having to explain this to different regulators? Are you getting multiple bites at trying to explain this to people? How's that going? - So different regulators have different views on things, obviously. - Obviously. - So yes, we do have to explain this time and time again. Luckily, or maybe not luckily, I don't know, I probably shouldn't put this on a podcast, but the regulators haven't dug that deep into the agentic SOC yet.
Their current focus, at least from what we see, is still mostly on actually being able to detect and respond in time. If you look at some of the European regulation especially, there's a lot of regulation on how quickly you have to detect, respond, and all those things. And agentic helps with that journey, right? There are, however, also other things, like the EU AI Act, that still require you to put certain safeguards around all your AI journeys. So for us, our motto at the moment is: AI is our co-pilot, not an autopilot. There's still a human in the loop for anything that's destructive or that has a chance to have a severe effect on the organization. - It basically takes a lot of the tedious work and tasks that analysts used to do before, and it does them more flexibly than a deterministic workflow. 'Cause with deterministic workflows, the second one thing changes, we have an issue: it doesn't work, it fails, a human needs to look at it. And the agentic part is just far more flexible there. So as Alex said, co-pilot: take care of things, prepare everything, give the briefing to the pilot, and then take it from there. - But a lot of this is really common sense. I mean, if you go to the crazy land of RSA, there would be a vendor who says we can remediate 77% of alerts automatically, without humans even putting their cup of coffee down. We all know it's a lie, and probably they even know it's a lie. So everybody kind of knows it's a lie, but everybody keeps repeating that message: oh yeah, we're gonna be on the journey to full auto in a year. But what you're describing is, of course, absolutely logical and coherent, and it makes sense. It's probably a very obvious point I'm making at this point, but it is logical whether or not you're regulated. - Anton called our guests logical and said they make sense. This is a high compliment, Lars and Alex. I don't know if you know that. - No, no, no. - We are honored, Anton, we're honored. - I mean, not to say I didn't mean it. But the actually interesting point is that this use of AI for SOC, AI agents for SOC, applies to a startup in the Bay Area and to a regulated company in Germany alike. It's common sense; the commonness of it is actually the point. What we're describing is how it's done today, and how it should be done in a year, probably two years; beyond that, I don't know. But if the audience is hearing this and thinking, no, no, no, they're too safe: no, they're not too safe. That's actually where you should be. It's not being too safe, it's being good. - It really depends on the risk appetite of your organization in the end, right? As a heavily regulated global financial player, and a company that, being an insurance company, is good at risk management, I would hope, we do know the amount of risk we wanna take on board, and that's why we chose that approach. It might be different if you're a two-person company that just has an agentic SOC because that's all you have. - I wanna ask specifically about governance. You've kind of explained the de facto governance that applies, of course, to agents used for security, but maybe you can spill over a little bit and talk about governance for AI agents in general, and then how it applies to SOC.
So the question is: how do you govern agents, how do you govern agents in your company? - Yeah, and that's a very good point. As I mentioned before, because we're under quite a bit of regulation, we have strong governance around that. We don't review the models themselves, but we review the use cases. What we have internally is an AI board that actually looks at whether a use case is legal and safe to do, so there are multiple functions involved: data privacy, the AI team, the data team, the security team. We have all those safeguards, and once a use case has gone through that, we put it into production with the appropriate safeguards around it. The other thing for agents: I think the joke in the industry is that the S in MCP stands for security, right? Everyone assumes you just put it out there and it works, but managing access and security of these agent-to-agent communications is a challenge. So we've talked to some other companies that have actually started giving agents employee IDs, which at first sounds like a very strange idea, but the more you think about it: well, it's an employee that does things, right? It has certain access rights. It shouldn't have certain access rights. It needs to be monitored, it needs to be managed. And that's how we think about it: what's the identity and access management stuff we need to put around these cool modern AI agents to actually be able to operate them safely? - The employee ID bit sends me down a very interesting path, because I've had a few conversations with clients about governing agents by applying IAM. And I've noticed there are actually two camps. There's one camp that thinks an agent is an employee, and should be treated and governed as you would an employee, so insider threat risks and all that. And there is also a camp that thinks agents are workloads, and there's almost nothing new; they're just less deterministic workloads. But each of these camps believes it is the only source of truth and the other cannot possibly believe otherwise, which is what struck me as interesting. It's not that there are two positions, but that each believes it has the common sense and the other cannot possibly think that. - I mean, as an ex-consultant, I have to say, the answer is yes. - Yes, yes, both, both. But then there's a little tiny camp that says both, but can't explain what it means. - Well, it means both things, right? A, the agent, when it does stuff, behaves like an agent: it does things semi-autonomously. At the same time, it's also a workload. I think what a lot of people forget for AI: you still need to secure the infrastructure, you need to secure the code on top, you need to do all the things you do for a workload, and then, at least in my view, something on top. - Yeah, I fully agree. A workload is, at least for me, a little bit more the deterministic part of it. So I see the agent more like a capability that works on the workload and then goes from there. So I would actually be the little camp in between that says, well, it is both. But in general, I would go more for the user, like it's an employee, 'cause that already has a lot of existing guardrails around it that we can just recycle. - Yes, I like that.
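To make the "agent as employee" framing concrete, here is a minimal sketch of what scoping an agent's entitlements like an employee's could look like. Everything here is a hypothetical illustration: the identity model, the permission names, and the blast_radius_limit parameter are not an API that Allianz or Google SecOps exposes.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an AI agent modeled as a managed identity, reusing
# the same least-privilege guardrails an employee identity would get.
@dataclass
class AgentIdentity:
    employee_id: str                      # issued like a regular employee ID
    role: str                             # e.g. "L1-triage-agent"
    allowed_actions: set = field(default_factory=set)
    blast_radius_limit: int = 1           # max hosts one action may touch

    def can(self, action: str, target_count: int = 1) -> bool:
        """An action is allowed only if it is in scope AND small enough."""
        return action in self.allowed_actions and target_count <= self.blast_radius_limit

triage_agent = AgentIdentity(
    employee_id="E-900001",
    role="L1-triage-agent",
    allowed_actions={"read_alerts", "query_siem", "isolate_host"},
)

# Isolating one machine is within scope; isolating half the company is not.
assert triage_agent.can("isolate_host", target_count=1)
assert not triage_agent.can("isolate_host", target_count=500)
assert not triage_agent.can("disable_user")  # never granted
```

The appeal of this framing, as the guests note, is reuse: joiner/mover/leaver processes, access reviews, and monitoring already exist for employee identities and can be recycled for agents.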
- I'm getting very superposition vibes from this discussion. And I thought we were done with quantum episodes for the year. So, I wanna move us along and ask a slightly different question: day-to-day, when you were actually building this stuff out, how did you draw the line between what the agent could do versus what the human had to be in the loop for? Because the agent's useless if it can't do anything, and it's also terrifying if it can do everything. So how did you help your team draw that line? - Yeah, super good question. I touched on it before a bit. Right now, the agent is useful even if it cannot execute things. We look at the agent as pretty much the same as a lower-tier analyst: it has similar skills, but it also has the same restrictions. An L1 analyst wouldn't be able to isolate half the company, hopefully, if you set things up right. And for the agent, we take a very similar approach. Let it figure out what it can do: if it's a single machine, we can discuss whether it can isolate it, but it shouldn't be able to do more than X. At the same time, just like a lower-level analyst, if it doesn't know what to do, it should actually put a human in the loop, or the next-level analyst in the loop. That being said, and Lars touched on this before a bit, to us, actually using AI is the second step. We start with deterministic automation where we can. If that's a reported phishing mail, that's a very deterministic flow: it's phishing, it'll get blocked, we pull it from the rest of the mailboxes, et cetera, et cetera. AI, the AI triage agent especially, or even the AI malware analysis agent, comes in when the deterministic automation doesn't have an answer. - And that's pretty much exactly how our playbooks are built. We try to get to a pretty high percentage of everything being done fully deterministically, because that's great for audit, and all of that fun part. And then there's where the deterministic part falls short. In the different phases of an incident in our playbooks, we always ask a question at the end: do we know if this is a true positive or a false positive? Do we know which systems are involved, which users are involved? And if we're really sure, because the deterministic playbook did everything fine, it just continues as it is. If it falls short, we continue to the other tiers, and that's where I see this all going: it then goes to the AI tier, and eventually, if that isn't sure, it goes to the human tier. And I think that way we can scale greatly, because obviously these models need a crazy amount of compute anyway, and we can do a lot of the deterministic parts that we also want to do. The other really big part, and this is where the legal part always comes in: the second we involve investigations against another employee, so another human in that sense, we always have to take another human in the loop anyway. That's just from legal regulations, as well as internal regulations. But I think it's also a good idea in general.
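The playbook gating Lars describes can be pictured as a three-tier escalation: deterministic first, AI second, human last, with a hard override for employee investigations. Below is a minimal sketch under those assumptions; the case fields, the confidence threshold, and run_ai_triage are hypothetical stand-ins, not the actual Allianz playbook logic.

```python
from enum import Enum

class Tier(Enum):
    DETERMINISTIC = "deterministic"
    AI_AGENT = "ai_agent"
    HUMAN = "human"

def run_ai_triage(case: dict) -> float:
    """Stand-in for the AI triage agent; returns its confidence."""
    return case.get("ai_confidence", 0.0)

def route_case(case: dict) -> Tier:
    """Sketch of 'deterministic first, AI second, human last'."""
    # Legal/internal rule: investigating another employee always needs a human.
    if case.get("involves_employee_investigation"):
        return Tier.HUMAN
    # Phase-end gate: did the deterministic playbook answer everything?
    if case.get("verdict_known") and case.get("scope_known"):
        return Tier.DETERMINISTIC
    # Otherwise the AI tier takes a pass; if it is not confident, escalate.
    return Tier.AI_AGENT if run_ai_triage(case) >= 0.9 else Tier.HUMAN

# A reported phishing mail handled fully deterministically never reaches AI.
print(route_case({"verdict_known": True, "scope_known": True}))  # Tier.DETERMINISTIC
print(route_case({"ai_confidence": 0.95}))                       # Tier.AI_AGENT
print(route_case({"involves_employee_investigation": True}))     # Tier.HUMAN
```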
And I said, "But you're matching to threaten tell lists, why didn't you wanna use ML?" "Oh, we extrapolate ML and build these things." No, you should just match against the list, and the list is the genuine bad list you wanna match, and why are you doing this strange thing on the side? So to me, the same ghost came here to hunt us, is that people over AI in some areas, where the deterministic answer is a right answer. - I agree. - Yeah. - So lately I had an obsession, and my obsession was kind of like, "It's so ready for AI." And the point about data quality comes up quite a bit. Data availability, data quality, people have been making agents to scrape logs through terminal windows from mainframes. That's a real example. I mentioned it in my blog, and I thought it's idiotic, but it actually happened. So that's why it's bizarre, but it happened. So the data quality and availability and fidelity is kind of a challenge for this. So from your perspective, how was it solved? How are you solving it? Like, I don't know. I don't want to presume you solved it, but how are you solving it? - Exactly, fully agree. So I think that's one of the big parts that we looked into really early on, that it's giggle, it's garbage and garbage out. So you really need to make sure that everything you get in works. And for us, a big part comes down to understanding the data flows. So where does data come from? How do we parse it? How do we change it? And where do we store it? And how long do we store it in our central sim to really make sure we know what the fields mean? Like, surprisingly enough, if you actually get something in your sim, that log already or usually gets parsed multiple times before it ends there. So I use this source IP is actually the source IP, and it's not called something else in there. So that's super, super, super important. And then the other part is we really need to understand, we have to give the models capabilities to understand what data we have and what we don't have. 'Cause that's one of the things where the human analyst back in the day is really good. Like, they've done a lot of these investigations and they saw, well, you know, if I search for this, I don't get a result. Then I know this log doesn't exist. Well, how does the AI agent know that? And especially if we look at the, you know, and I like the reference that you made on one of your earlier podcasts that an AI agent is like a human with a really small window. So it only sees the world from a really, really small part of it. And I think that's exactly how we see it as well. And it only knows this case at the moment. It doesn't know how the case before look maybe we can give him that information, but eventually, you know, context windows get too small for this. So for us, it's super important to give the models all the knowledge that our humans have, like the business knowledge, the context knowledge, and so on. I think that's where we are working a lot also with the Google team to be able to give them that information. I think one really interesting part of this is, for example, if you use a top level domain internally, that is also use externally, then the agent will think somebody, you know, you access something on the internet or you don't. And the equivalent with like API addresses, ranges, host names, all that. Sorry, I had to shudder briefly because I had it happen many years ago when I was in a very bad PM at the Simvender. Oh no. And it was the IP addresses internal use of external IPs. Yup. 
- Exactly. And you know what the interesting thing is? If you have an analyst in your SOC, the analyst knows that, because the analyst shadowed the other analysts before and eventually got it from experience. We aren't there yet; we can't let our LLMs actively shadow and learn stuff at the moment. It might get there eventually, to learn all of that business context information. And then on the data thing, there's one more thing we're looking into that we think is super interesting. With these agents, we've basically built a time machine. We can do so much more analysis in a much smaller time frame than before, which means we can do the analysis in much, much more detail than before. And if we think of our analysts doing in-depth analysis, they need a lot of data, and they eventually pull data from other systems and so on. So I'm pretty convinced we will not have enough data available in our SIEMs to let all of our agent power actually run and do all that it needs to do. From our view, the next generation of this is agents that have the ability to actually pull data from systems. You don't only let them query stuff; you eventually let them actively pull data from a system, the same way a real analyst would do. And I can see the industry already going there. We have multiple things we're trying out, and that's super, super neat, because that will eventually move us from what we see at the moment, more of a triage stage, to an actual investigation stage, where the agent actually goes and pulls things further. And I'm super excited about this, because this is where we get closer to what I think agentic actually is.
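The "empty result versus missing log source" trap Lars mentions is worth making concrete: an agent needs to distinguish "no hits" from "we never collected that telemetry". A minimal sketch, assuming a hypothetical coverage map maintained as part of the data pipeline:

```python
# Hypothetical coverage map: which log sources actually feed the SIEM.
# A human analyst learns this by shadowing; an agent must be told.
LOG_SOURCE_COVERAGE = {
    "edr": True,
    "dns": True,
    "mainframe_auth": False,  # not onboarded: absence of hits proves nothing
}

def interpret_empty_result(source: str) -> str:
    """Turn an empty query result into the right conclusion."""
    if not LOG_SOURCE_COVERAGE.get(source, False):
        return f"inconclusive: '{source}' telemetry is not collected"
    return f"no activity found in '{source}' (source is collected)"

print(interpret_empty_result("dns"))             # safe to treat as negative
print(interpret_empty_result("mainframe_auth"))  # must NOT be treated as negative
```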
- That strikes me as a "the audience had better take notes" moment, because this is super useful, and you've been through that journey. I feel like people who just plop some kind of agentic AI product on their SIEM, push the button, crank the handle, suddenly find it blows up because of data quality. Oh, sorry, Tim, your question. - No, no, no, it's a good discussion. I wanted to touch on: how do you know if this is working? What are you tracking? What's getting better so far, and what do you expect to get better in the future? We'll get to Anton's question after this, but let's start with that: how do you know if this is working? - I think one step before that is what you actually want to achieve with SOC automation and using AI in the SOC. If we take one step back: well, it depends, right? It could be, say, cost; it could be better outcomes. But then "better" depends on what you want to do. And I think it's super important to understand that the metrics you measure are the ones you're going to optimize for. That's true in any case, but especially in this one. - Yes. - And for us, it was really what I call turbo-charging the SOC. Sorry, lots of car references; you're talking to the Germans, right? But that's what we're going to do: I want the SOC to be better and faster. The other thing, and we were discussing this before: we both sail a lot, and in sailing, if you go across the ocean, you end up having radios. One thing you have on a radio is what's called a squelch button; it's the thing that does the noise suppression. The beautiful thing is, once you have AI agents in your SOC and they work, you can actually play with that squelch, or volume, button. We can turn up the noise in the system and just let AI filter out what's actually needed. Anything that we would now automatically dismiss as a false positive because the fidelity is too low, if we give that to an AI agent, it can actually do some more triage on it and verify whether it's actually a false positive or not. And that's something that I'm excited about. But back to measuring, and that's one for Lars, because he's been doing that for a while. - Absolutely. So we basically recycled the metrics we had before from the normal automation part, and one of my favorites is the amount of time we save by automation, and eventually also by the AI agents: in a quarter, we save about 68 years of our analysts' time, every single quarter, just by using that. - That's amazing. - And that's super cool, because the SOC analysts can then actually do other things than clicking buttons or querying databases; because we save that time, they can work on other items. The other thing is to convince our team, and also our management: how much do we trust this agent? And I think that's the one big, big question. The trust in the triage agent was built by running the same requests, the same incidents, through both the agent and the human analyst. We basically did the same thing twice: we let the agents run in parallel to our normal analysts on our normal cases, and then at the end did some statistics on how often they agree and how often they disagree. And on the ones where they disagreed, we let a more senior person look over it. From the statistics we look at, we are, at the moment, at the level of our analysts, if not, in some cases, beyond, especially on the false negative rate, where the agentic AI spotted something that the analyst had dismissed with "ah, this is likely a false positive." The agent did more analysis on it, ran more queries, got more data, because that's usually the time sink the normal analyst skips, thinking, "ah, it's going to be a false positive, right?" And it actually turned out not to be a false positive. So that builds the confidence, and it's actually really neat. And it still runs in parallel to all of our cases at the moment.
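A minimal sketch of the parallel evaluation Lars describes: run agent and human on the same cases, compute the agreement rate, and surface every disagreement for senior review. The verdict labels and data shape are hypothetical.

```python
# Hypothetical verdicts from running agent and human on the same cases.
cases = [
    {"id": 1, "human": "false_positive", "agent": "false_positive"},
    {"id": 2, "human": "false_positive", "agent": "true_positive"},  # disagreement
    {"id": 3, "human": "true_positive",  "agent": "true_positive"},
]

agreements = [c for c in cases if c["human"] == c["agent"]]
disagreements = [c for c in cases if c["human"] != c["agent"]]

print(f"agreement rate: {len(agreements) / len(cases):.0%}")

# Every disagreement is adjudicated by a senior analyst; a case where the
# agent found a true positive the human dismissed counts in the agent's favor.
for c in disagreements:
    print(f"case {c['id']}: human={c['human']} agent={c['agent']} -> senior review")
```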
- I'm very curious: if you're measuring the agreement rate between the agent and the analysts, are you also, from time to time, measuring the agreement rate between two tier-one analysts looking at the same data? - We're not yet. I like the idea. What we do is spot checks on quality. We obviously have to guarantee that the quality holds, so we have level two, level three, higher-tier analysts randomly take cases and go through them, and make sure the case was actually handled at the quality we want. And that's one of the things we want to use with the agentic agents as well: let the agents run, and basically don't treat an agentic case differently from a case that a human handled. Randomly pick cases, let a senior-tier analyst look through them and say: this worked, this didn't work, this was right, this was wrong, and then fix it from there. That way we can measure agentic versus human quite nicely. - And so, is it better? Is the machine... the question that's been bubbling up in my head is whether it's getting better. - I actually think things are getting better on our side. One of the great examples for this is actually the malware analysis agent that you guys have. Having people that can reverse engineer malware takes time, and you need the people. - Yeah. - And we get a lot of phishing and a lot of malware delivered. Now, using that agent, every analyst can basically just dump everything into the agent, and the agent does the analysis. One of my favorite examples: a lot of malware has multiple C2 channels it tries to reach out to. If you detonate it in a sandbox, you usually only get the first, because the sandbox only observes the network connectivity the thing actually exhibits in dynamic analysis. Well, it turns out the analysis agent takes the dumped file and figures out all five, six C2s. And it's really surprising if you compare that with the TI knowledge we see: usually the first one is already blocked early on, but even proper TI feeds out there take longer to get the second, third, and fourth, because they don't all do that. And that gives us an advantage, because we can start blocking those C2 channels going out and prevent people from getting infected or downloading the second stage. That would not be possible with humans alone. I could have a whole fleet of people doing malware analysis; well, now I just plug it into the analysis agent and the result comes out. - Back to the squelch button, right? We can now do stuff at scale that before wasn't feasible, from a cost perspective, or even from a finding-people-who-can-do-it perspective. - I think malware analysis is a great example, because it's low cost of error, high cost of human labor. My ex-analyst brain is thinking of some framework we should build for use case judgment, and malware analysis is a very obvious fit for that. Because even if the agent is completely wrong, a human will probably just redo the labor, and nothing horrible happens. But if the same agent gives you IAM recommendations and automatically implements them, you are shooting a pretty massive torpedo into your business, right? That can be clearly harmful. But malware analysis mistakes are kind of, okay, whatever. - Well, and the cool thing is, especially with IOCs: before blocking an IOC, we have guardrails in place. We check, you know, is this a business domain? Are we about to block google.com or not? And that is really, really neat. So even if the analysis makes a mistake, it will get caught in a different layer before it makes business impact. So exactly as you said, Anton, it's a huge advantage for something that doesn't really cost us much. I'm really, really happy about that.
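A minimal sketch of that guardrail layer, assuming a hypothetical allowlist of business-critical domains: the point is that any blocking action proposed by an agent passes through a deterministic check before enforcement.

```python
# Hypothetical guardrail: deterministic allowlist checked before any
# agent-proposed IOC block is pushed to enforcement.
BUSINESS_CRITICAL = {"google.com", "example-insurer.com"}  # illustrative entries

def approve_block(ioc: str) -> bool:
    """Reject blocks on business-critical domains (and their subdomains)."""
    domain = ioc.lower().rstrip(".")
    return not any(domain == d or domain.endswith("." + d) for d in BUSINESS_CRITICAL)

for ioc in ["evil-c2.example", "google.com", "mail.google.com"]:
    action = "block" if approve_block(ioc) else "escalate to human"
    print(f"{ioc}: {action}")
```

This matches the use-case-judgment point Anton raises: cheap-to-reverse actions can run autonomously, while anything with business impact gets a deterministic safety net or a human.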
- I just want to go back like five minutes to a particular word you used. You used the word squelch, and I have a squelch knob in my airplane for exactly the same purpose as you have on your sailboat. I used that word not too long ago in a PRD internally, and somebody said I would get bonus points if I got that word into the product. Well, today I've gotten it into the podcast. So, Spencer, we are squelching a lot of noise in our product. And I actually think it's a really good example of showing that you can really turn the noise up. - One really cool example of this is user accounts, or service accounts, NHIs, whatever you want to call them. If one of them logs into a new system, that by itself is probably not malicious. But how cool would it be if, on every single login, you have an agent actually look at it and ask: do we expect this? Is there anything else suspicious around it? Should I run some queries? And then eventually even interact with the owner of the NHI, the owner of the service account: do you expect this, yes, no, maybe, with some explanation back and forth. That's stuff that takes a lot of time for an analyst to do. And the value is also huge for us if we start detecting, for example, red teams that abuse these NHIs. So: turning up the squelch. - I still want a slightly more formal answer about the "I think it's getting better" part, because on the adoption journey of some of this tech, the malware agent is a great example of where the value is obvious, the risk is low, the advantage is dramatic, and examples like that overshadow the rigorous measuring process. And I know that's a good thing that happened to you: value's obvious, great. But for the triage agent, are you doing the measurement of human labor savings? How do you measure quality? The easy example is like an elephant in the room: yeah, it's beautiful, it's magical, rainbow-colored, it's great. But what about the other stuff, like the triage agent? How do you measure that it's working? - So for the triage agent, we also measured the amount of time that it saves. That goes into the, whatever, 68 years every quarter. It's not only that; the automation is part of that too, it's a combination. Which is really neat, because we turn up the squelch and we get far more alerts, and now, suddenly, we don't have to look at every single one of them, because we just check how confident the agent is that something is actually malicious, and then continue from there for the stuff that is very obvious. - There is obviously also another advantage to this. I think the term used to be boreout: a lot of L1 analysts looking at, I don't know, the tenth failed login by Alex because he can't type his password, and it just pushes them over the edge, and they leave to do something else. Now, we haven't been using the triage agent long enough yet to have a definite answer on this, but my hope would be that it will ease up some of the burnout, or boreout, that we see in analysts, and actually let them do more interesting things. - I like that, because to me, this is, again, the obvious, common sense answer.
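The squelch knob discussed above maps naturally onto a confidence threshold: turn up the ingested noise, let the agent score everything, and only surface what clears the bar. A minimal sketch with hypothetical alert names and scores:

```python
# Hypothetical agent verdicts: (alert, probability the alert is malicious).
scored_alerts = [
    ("failed_login_alex_10th", 0.02),
    ("new_c2_beacon", 0.97),
    ("nhi_login_to_new_system", 0.55),
]

SQUELCH = 0.50  # raise to suppress more noise, lower to hear more static

for alert, p_malicious in scored_alerts:
    if p_malicious >= SQUELCH:
        print(f"{alert}: surfaced to analyst (score {p_malicious:.2f})")
    else:
        print(f"{alert}: auto-triaged as noise (score {p_malicious:.2f})")
```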
- But I just want to caution you against measuring only time, because this has come up in a few discussions before. People say, well, obviously things will go faster. And I say, you're right, obviously things will go faster. But guess what else is faster? Just deleting all the alerts; that's even faster. So if you measure speed only, like MTTD, you will immediately end up in a ditch. And Tim made a wise point in one of the other podcasts: of course automation makes things faster, what do you expect? But if things become three times faster and ten times worse, that doesn't work. - You've probably lost. Yeah, it doesn't work. Or even if they get 10% faster and three times worse, you still lose. - So keep that in mind. And I'm happy to do another episode with you on SOC metrics; I still haven't found the perfect one yet. - Oh, yes, that would be really fun. - So I really hate to do this, especially as we were just getting into what I think is a slightly different, interesting territory: what are the human beings going to do in the future? This is something we talked about on the Minesha episode that I thought was so fascinating: now we're unlocking humans to do things that they're actually good at, rather than the painful things that have burned them out, or bored them out, for decades. But I actually have to close this, because we're way over time, listeners, and I hope you've hung on for a longer-than-normal episode. For both of you, you both get our traditional closing questions. One, do you have a tip to help people adopt agentic capabilities in their SOCs? And two, do you have recommended reading? And that recommended reading could be anything from a beginner's introduction to sailing theory to a history of German cars, whatever you want to share that you think people might like to read. - I have one item, and it's definitely: fix the data first. One of the things we started looking into was our data quality, before we started working with the Google team and also while working with them. It turns out that however good the agent can be, if the data quality isn't there, you are going to get into trouble. And one really important part of this is to continuously monitor your data quality. It's nothing you can do only once or twice; you actually have to build automation around monitoring it, because, as I said before, the human analyst will notice that something looks different from yesterday. The digital analyst, the agentic agent, doesn't yet. I think it might get there, but at the moment it doesn't detect it, and you might end up in exactly that "three times worse" situation, because the automation or the agent doesn't find what it needs.
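A minimal sketch of the continuous monitoring Lars recommends: track per-source volume and field completeness over time and alert on drift. The thresholds and the shape of the stats are hypothetical.

```python
# Hypothetical daily pipeline stats per log source.
yesterday = {"edr": {"events": 1_000_000, "null_src_ip": 0.01}}
today     = {"edr": {"events":   400_000, "null_src_ip": 0.35}}

VOLUME_DROP_LIMIT = 0.5  # alert if volume falls below 50% of yesterday
NULL_RATE_LIMIT = 0.10   # alert if a key field goes missing too often

def check_drift(source: str) -> list:
    """Flag the silent pipeline breakages a human would eventually 'feel'."""
    findings = []
    y, t = yesterday[source], today[source]
    if t["events"] < y["events"] * VOLUME_DROP_LIMIT:
        findings.append(f"{source}: event volume dropped to {t['events']}")
    if t["null_src_ip"] > NULL_RATE_LIMIT:
        findings.append(f"{source}: source IP missing in {t['null_src_ip']:.0%} of events")
    return findings

for finding in check_drift("edr"):
    print("DATA QUALITY ALERT:", finding)
```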
- As for recommended reading, one of my picks is a bit cheeky: start with your company's annual report, because that tells you where the risks are and what the priorities are for the company. It sounds stupid, but that really helps you set everything up, and it also normally gives you the risk appetite, to a certain extent. And then two more books. One is How to Win Friends and Influence People, a classic. Now, don't do everything that's in that book, because it'll make you look like a robot, but it helps, because you can't go on this journey alone. The SOC alone can't do this. You need data from other people, you need to interact with other people, you need to build allies in the organization, and that's important. And the third one, just because I can, is Will It Make the Boat Go Faster? That's another shared love: we both used to row. It's a book from the British Olympic rowing team that challenged every decision with: will this make the boat go faster? If not, we don't do it. - I didn't know that you two also rowed. I rowed throughout high school. There we go. It is a terrible, miserable sport. - Or you can consider it fun, I don't know. - It's an elite sport though, right? It's not like I'm trying to punch down. - You're talking to Tim, who has an airplane. - Yeah. Well, fine. No, I mean, some of the greatest highs of my life have been at the end of a sprint. What a great sport. - I agree, I agree. - On that note of reminiscing about victories and losses in rowing boats: Lars and Alex, thank you so much for joining us today. - Thanks for having us. - And now we are at time. Thank you very much for listening, and of course for subscribing. Please subscribe so you can get new episodes piping hot. And if you love our content, please drop us a review on your platform of choice. You can find this podcast on YouTube, Apple Podcasts, Spotify, or wherever you get your podcasts. Also, you can find us at our website, cloud.withgoogle.com/cloudsecurity/podcast. You can argue with us on the Google Cloud Security community site, googlecloudcommunity.com. You can follow us on X, x.com/cloudsecpodcast: tweet at us, email us, argue with us, and if you like or hate what you hear, we can invite you to the next episode. See you in the next episode of the Cloud Security Podcast by Google.
Key Points:
Allianz, a heavily regulated global insurance company, is adopting AI agents in its security operations (SecOps) as a "co-pilot" to enhance efficiency while maintaining human oversight for critical decisions.
The company views its AI adoption journey as progressing toward "agentic SecOps," currently at an intermediate stage (roughly level two of a five-level autonomy scale), where deterministic automation handles clear-cut work, AI handles the tedious but flexible tasks, and humans stay involved for complex or high-risk actions.
Governance of AI agents involves treating them similarly to employees, with strict identity and access management, legal reviews, and safeguards to ensure compliance with diverse global regulations.
A key strategy is prioritizing deterministic automation for clear-cut tasks and using AI agents only when deterministic methods are insufficient, ensuring reliability and auditability.
High-quality, well-understood data is critical for effective AI agent performance, requiring careful management of data flows, parsing, and storage within the security infrastructure.
Summary:
In this podcast episode, hosts from Google Cloud Security interview Lars and Alex from Allianz about their company's adoption of AI agents in security operations. Allianz, as a heavily regulated global insurance firm, emphasizes a cautious, governed approach, using AI as a "co-pilot" rather than an autopilot. The AI agents assist with tasks like triage and malware analysis, handling routine work to free up human analysts, who remain in the loop for high-risk decisions, especially those involving employee investigations or potential severe impacts. The journey toward "agentic SecOps" is ongoing, with the company currently at an intermediate stage, leveraging both deterministic automation and AI where flexibility is needed. Governance is strict, involving reviews by an internal AI board and managing agents with principles similar to employee identity and access controls. A significant point is prioritizing deterministic solutions for clear scenarios and relying on AI only when necessary. The discussion also highlights the critical importance of data quality and understanding data pipelines to ensure AI agents operate effectively, underscoring that successful AI integration in SecOps requires a balanced, pragmatic approach tailored to organizational risk and regulatory demands.
FAQs
What is the Cloud Security Podcast by Google?
The Cloud Security Podcast by Google is a show hosted by Tim Peacock and Anton Chuvakin, focusing on cloud security topics. You can subscribe wherever you get your podcasts or visit cloud.withgoogle.com/cloudsecurity/podcast.
What is an agentic SOC?
An agentic SOC uses AI agents to automate and enhance security operations, moving from manual processes to more autonomous, flexible workflows. It is often described in levels, similar to autonomous vehicles, with the goal of improving detection and response efficiency.
How does Allianz approach AI in its security operations?
Allianz adopts a cautious, governance-driven approach, treating AI as a co-pilot rather than an autopilot. It keeps human oversight for critical actions, complies with regulations like the EU AI Act, and uses AI to handle tedious tasks while maintaining control.
How are AI agents governed at Allianz?
Governance includes reviewing AI use cases through an internal board involving privacy, security, and legal teams, applying identity and access management (IAM) controls, and monitoring agents like employees. Safeguards ensure agents operate safely and within regulatory boundaries.
When should deterministic automation be used instead of AI?
Organizations should prioritize deterministic automation for clear, repeatable tasks (e.g., blocking phishing emails) and reserve AI for scenarios where deterministic methods fall short. This hybrid approach improves scalability and auditability while leveraging AI's flexibility.
Why does data quality matter for AI agents in the SOC?
Data quality is critical because AI agents rely on accurate, well-parsed data to function effectively. Poor data leads to unreliable outcomes, so understanding data flows, parsing, and storage ensures agents can perform investigations correctly and avoid errors.