Open Policy Agent with Torin Sandall

Torin Sandall & Adam Hawkins discuss shifting left on security with Open Policy Agent (OPA), conftest, and Rego. Plus Small Batches housekeeping.

[00:00:00] Hello and welcome. I'm your host Adam Hawkins. In each episode, I present a small batch with theory and practices behind building a high-velocity software organization. Topics include DevOps, lean, software architecture, continuous delivery and conversations with industry leaders. Now let's begin today's episode.

[00:00:26] Hello there. Let's start this episode with some housekeeping and some updates on the podcast. First off, you've likely noticed that there are more episodes than usual. Remember when I said I would start doing interviews on the show? Well, that has gone much better than I ever expected.

[00:00:46] As a result, I have recorded many more interviews than planned. I could keep releasing interviews on the normal biweekly schedule, but if I did that, the show would be exclusively interviews well into 2021. Admittedly, I did consider just letting the podcast run on cruise control for a bit, but I decided against it, mainly because there are more topics I'd like to discuss and more solo episodes I'd like to do.

[00:01:18] Sticking to the biweekly schedule ties my hands a bit with regard to the high-level narrative thread on the podcast. That brings me to the next point: pacing. Doing interviews brings different perspectives and new context on things previously discussed on the podcast.

[00:01:35] Here's an example. I did an episode a while back on the four types of work that introduced the Flow Framework. Then a few months later, Mik Kersten came on the show to discuss the full Flow Framework and really build upon the previous episode. I think that's wonderful and something special to Small Batches that I haven't heard on other podcasts.

[00:01:55] Now, my problem is trying to keep these episodes closer together so that you, as a listener, don't lose all that context between them. Right now, there are so many interviews already recorded, and an equal number currently scheduled, that the timelines get pushed out. That stretches the context, which makes it harder to plan and ultimately more challenging for me to even produce this podcast.

[00:02:21] So for the time being, I will release episodes more frequently until the interview backlog has burned down a bit. After that, it will be much easier for me to maintain and drive the narrative thread on the podcast. So enjoy the assortment of guests coming your way. I assure you there are many more great ones in the pipeline.

[00:02:42] All right. So after all that, we will return to the regular bi-weekly schedule.

[00:02:47] I think that's really a good pace for the podcast. I'm toying with the idea of alternating solo and interview episodes on a certain topic. For example, a month might be Flow Framework month, with one solo episode on the topic from me and then an interview to discuss it more in depth.

[00:03:07] I'm also toying with the idea of doing a week-long series on a particular topic. I've seen this format on other podcasts, and I really like it because it allows you to go much deeper into the topic. If you think back to the previous episodes on the Twelve-Factor App, I think those would have worked really well in that format. In fact, I had already written all of them before recording the first one. Then maybe we could end the week with a guest interview related to that topic. That has already kind of happened by having Joe Kutner on the show to discuss the Twelve-Factor App. We'll see, though. I'm always thinking about the format and the best way to share software delivery education with all of you.

[00:03:49] Just a quick recap of everything so far before I move on to the announcement. There are a lot of interviews to burn down before we can get back to the regular scheduled programming. So expect more frequent interview episodes in the short term, before we get back to a mix of solo and interview episodes.

[00:04:06] All right. It's announcement time, and I'm stoked about this one. I'm doing a five-part solo series called The Saltside Chronicles. This series covers my time at Saltside, where we completed a ground-up rewrite of the entire product just to launch a mobile app. There is so much I could say about Saltside on this podcast that I couldn't leave this story on the table, because it's a story about software architecture, technical debt, business agility, and success in the market.

[00:04:37] All the things that I try to tie together on Small Batches. It will be a five-part series with a new episode Monday through Friday. I'll cover the high-level story, the business and tech factors that led up to the rewrite, splitting a monolith into microservices, and a reflection on that entire effort through the lens of what I know now.

[00:04:58] If you know me, then you've probably heard me talk about Saltside. If you haven't yet, then you'll get the full story with this series. I'm excited to share it with all of you. And if you're listening to this and you were part of the Saltside Chronicles, and I know that some of you are, then consider it my tribute to what we were able to accomplish together. The release date is still TBD, but I assure you it will be in 2020. I think it will be a great way to end the year.

[00:05:26] All right. Now that the housekeeping is out of the way, it's time to introduce today's guest. Today, I'm speaking with Torin Sandall. Torin is a maintainer on the Open Policy Agent project. Let me read off a bit from the official project description.

[00:05:42] The Open Policy Agent (OPA, pronounced "oh-pa") is an open source, general-purpose policy engine that unifies policy enforcement across the stack. OPA provides a high-level declarative language that lets you specify policy as code and simple APIs to offload policy decision-making from your software. You can use OPA to enforce policies in microservices, Kubernetes, CI/CD pipelines, API gateways, and more.

[00:06:11] All right. I found Torin when I was teaching myself Rego, the language behind OPA. To make a long story short, I fell in love with Rego and the associated tool called conftest, because I could write pre-commit hooks and preflight checks for Kubernetes manifests, as well as any other data structure I could think of.

[00:06:32] I invited Torin on the show to discuss the origin of the project, why write a custom language in the first place, and how OPA fits into DevSecOps. And now, finally, I give you my conversation with Torin Sandall.

[00:06:50] Adam Hawkins: Torin. Welcome to the show. So I've already introduced you in my own words. Why don't you introduce yourself in your own?

[00:06:57] Torin Sandall: Yeah, thanks for having me on the show. So my name is Torin. I am the vice president of open source at Styra and I'm also one of the co-creators of open policy agent or OPA as we like to call it.

[00:07:07] Torin Sandall: I spend most of my time kind of leading design and development around open policy agent. And I love kind of engaging with community and chatting with users and talking about how people are using OPA in the field.

[00:07:17] Adam Hawkins: Cool. So how long have you been working on OPA, and what was the genesis for this project? I know it's now an incubating project in the Cloud Native Computing Foundation, so it seems like it's had a good run.

[00:07:32] Torin Sandall: Yeah, so Open Policy Agent is part of the Cloud Native Computing Foundation; we donated it in 2018. We started the project at Styra, the company that I work for, back in early 2016. Without going into too much detail, Styra was started to essentially rethink policy and authorization in the enterprise.

[00:07:51] Torin Sandall: Open Policy Agent is part of that overall vision. We started the project almost five years ago now, and over those five years the project grew: adopters showed up, other contributors showed up, and eventually we made the decision to donate it to the Cloud Native Computing Foundation (CNCF).

[00:08:07] Torin Sandall: That was in 2018. Then, fast forward a year to 2019, it was promoted to the incubating stage within the CNCF, and going forward, we're looking towards graduation and moving ahead.

[00:08:20] Adam Hawkins: For listeners who are unfamiliar with the problem domain of policy and authorization, can you give the elevator pitch for OPA's use cases and the problem you are trying to solve?

[00:08:30] Torin Sandall: OPA basically lets you control who can do what at any layer of the stack. Whether you're talking about API authorization inside a microservice or an application, putting down safeguards and guardrails over your Kubernetes cluster, or enforcing checks and even linting rules in a CI/CD pipeline, Open Policy Agent gives you one unified way of expressing policy or authorization across the stack.

[00:08:57] Torin Sandall: Typically, in the past, before something like OPA, a lot of these policies would just get written down on wikis or tracked in spreadsheets. The spreadsheet would say these machines are in this category of network, and they should only talk to machines in that category of network.

[00:09:11] Torin Sandall: It was up to Alice, or whoever, to look at that spreadsheet every couple of months, double check that things were okay, and go from there. That's the approach a lot of organizations take. Obviously, you can start to imagine ways where that falls over. It doesn't scale very well, and you have no guarantee that things are being enforced. Operating based on spreadsheets and tribal knowledge is risky. So what Open Policy Agent does is take those same policies that people might write down in PDFs or wikis or whatever.

[00:09:39] Torin Sandall: It allows you to write them in a high-level declarative language that both humans and machines can understand, and it gives you a runtime for having them enforced inside the system.

[00:09:48] Adam Hawkins: You mentioned you could use OPA for authorization between microservices, to enforce things in a Kubernetes cluster, or even as part of linting in your CI pipeline.

[00:09:58] Adam Hawkins: So can you explain briefly about a high level architecture for how that would work with, say authorization between two microservices? So if I want to do something like this, do I need to have a running OPA server or service? I mean, what does this look like in practice for somebody who wants to use these tools?

[00:10:17] Torin Sandall: Yeah. Depending on the use case, what kinds of policies you're enforcing and where those policies need to be enforced, it's a little bit different, but the way we think about OPA is that it's essentially a host-local cache for policy decision-making.

[00:10:30] Torin Sandall: So ideally OPA is running next to the piece of software that it's policy enabling, or that policy queries are coming from.

[00:10:39] Torin Sandall: If you're talking about API authorization in a microservice environment, authorization queries or decisions need to happen at the microservice level, on the server where the microservice is running. You don't want the microservice to call out across the network every time it has to make a policy decision, because if you do that, things are going to get slow or you're going to suffer downtime. Think about a large application with many microservices.

[00:11:02] Torin Sandall: The processing of an individual request to the application may have to flow through a bunch of different microservices, and if at every single hop you're calling over the network, you're going to kill yourself. What you want is for that decision-making to be local. What that typically looks like is people take OPA and deploy it on the same server where the microservice is running. In Kubernetes, you'd use what's called the sidecar pattern: your microservice container runs in what's called a pod, and OPA runs in that same pod as a separate container, so they're joined together. But you're not limited to Kubernetes with OPA. You can also run it in bare-metal environments or in virtualized environments like EC2 instances, where OPA just runs as another process on the host, and you can also embed it as a library.
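A minimal sketch of the host-local pattern Torin describes, written in Python for illustration (it is not part of OPA itself). It assumes an OPA sidecar listening on localhost at OPA's default port 8181 and a hypothetical policy package `httpapi.authz` that defines a boolean `allow` rule; the input fields are also hypothetical. The service queries OPA's Data API (`POST /v1/data/<path>` with an `input` document) and treats a missing `result` as a denial.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: OPA runs as a sidecar on the same host on its default port
# (8181) and has loaded a hypothetical package "httpapi.authz" with an
# "allow" rule. The input shape below is also hypothetical.
OPA_URL = "http://localhost:8181/v1/data/httpapi/authz/allow"


def build_query(method: str, path: str, user: str) -> urllib.request.Request:
    """Build the POST request carrying the authorization input for OPA."""
    payload = {
        "input": {
            "method": method,
            "path": path.strip("/").split("/"),  # "/a/b" -> ["a", "b"]
            "user": user,
        }
    }
    return urllib.request.Request(
        OPA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(method: str, path: str, user: str, timeout: float = 0.5) -> bool:
    """Ask the local OPA for a decision; anything but an explicit true is a deny."""
    request = build_query(method, path, user)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        decision = json.load(response)
    # OPA omits "result" when the queried rule is undefined.
    return decision.get("result") is True
```

A call such as `is_allowed("GET", "/finance/salary/alice", "alice")` stays on the loopback interface, which is the point of running OPA next to the service. A real service would also need to decide what happens when the sidecar is unreachable (`urlopen` raises in that case); failing closed is the conservative choice.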

[00:11:44] Torin Sandall: It's ready to go, so it's intended to be pretty flexible. You can drop it in, but ideally it's running as close to the software as possible.

Adam Hawkins: Okay, I got it. That makes sense. So what is the most common use case you've seen for OPA?

[00:11:58] Torin Sandall: It's hard to say at this point. We released a user survey back in April where we had something like 200 respondents from a lot of organizations. It was actually interesting because it showed that a lot of the respondents were using OPA across multiple use cases, and it was almost evenly split between a couple of them.

[00:12:15] Torin Sandall: The first of the two big use cases we see people running OPA for is API authorization in microservice environments: service A can talk to service B but not service C, or internal users can access these APIs but not those other APIs, that kind of stuff.
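To make the two API authorization examples concrete, here is a small default-deny decision function in Python. It only illustrates the logic a Rego policy would encode; the service names, the `Request` shape, and the path prefix are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allow-list of (calling service, target service) pairs.
ALLOWED_SERVICE_CALLS = {("service-a", "service-b")}

# Hypothetical API path prefixes that internal users may call.
INTERNAL_USER_PREFIXES = ("/api/reports/",)


@dataclass(frozen=True)
class Request:
    caller: str                 # e.g. "service-a" or "user:alice"
    target: str                 # service handling the request
    path: str                   # requested API path
    is_internal_user: bool = False


def allow(req: Request) -> bool:
    """Default deny: return True only when an explicit rule matches."""
    # Rule 1: service-to-service calls on the allow-list.
    if (req.caller, req.target) in ALLOWED_SERVICE_CALLS:
        return True
    # Rule 2: internal users may reach only the listed API paths.
    if req.is_internal_user and req.path.startswith(INTERNAL_USER_PREFIXES):
        return True
    return False
```

The default-deny shape matters: a request that matches no rule, such as `service-a` calling `service-c`, is refused rather than allowed by omission.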

[00:12:30] Torin Sandall: The other broad category of use cases is what I call config validation, and specifically configuration validation in Kubernetes environments. When you're talking about config validation in Kubernetes, it's often about admission control: taking OPA, putting it on a Kubernetes cluster, and then protecting access to sensitive resources in the cluster, the compute, network, and storage resources like load balancers and all the different resources that make up your Kubernetes environments.

[00:12:57] Adam Hawkins: Okay, let's take a step back for a minute, because I'm not sure that everybody listening is familiar with the concept of admission controllers in Kubernetes. Inside Kubernetes, you have all these manifests that say run these containers with these resources, memory, CPU, security permissions like whether you can run as root, special flags, all that stuff.

[00:13:17] Adam Hawkins: The idea with an admission controller is that you can use it to validate whether a certain resource should be allowed on the cluster. You can write rules that say don't allow images from this registry, or require that all of the containers in a deployment have a certain security flag, or that resources like CPU and memory are set like this and not like that.
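The kinds of rules Adam lists can be sketched as a function that collects violation messages for a Deployment object, loosely mirroring the `deny` rules that conftest-style Rego policies commonly use. This is plain Python, not Rego and not an admission webhook. The allowed registry prefix is a hypothetical value, only `spec.template.spec.containers` is inspected (init containers and pod-level `securityContext` are ignored), and the three rules are examples rather than a recommended baseline.

```python
from __future__ import annotations

# Hypothetical allow-list of registry prefixes.
ALLOWED_REGISTRIES = ("registry.example.com/",)


def deny(deployment: dict) -> list[str]:
    """Collect violation messages; an empty list means the Deployment passes."""
    messages = []
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    for container in containers:
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        # Rule 1: images must come from an approved registry.
        if not image.startswith(ALLOWED_REGISTRIES):
            messages.append(f"{name}: image {image!r} is not from an allowed registry")
        # Rule 2: containers must explicitly refuse to run as root.
        if container.get("securityContext", {}).get("runAsNonRoot") is not True:
            messages.append(f"{name}: securityContext.runAsNonRoot must be true")
        # Rule 3: CPU and memory limits must be set.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                messages.append(f"{name}: resources.limits.{resource} must be set")
    return messages
```

The `for container in containers` loop plays the role that Rego's `_` iteration placeholder plays in an expression like `input.spec.template.spec.containers[_]`.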

[00:13:41] Adam Hawkins: So, if I understand correctly, you can use OPA in combination with an admission controller to make these decisions.

[00:13:49] Torin Sandall: Yeah, exactly. Like you described, Kubernetes allows developers to provision or control, through desired state, the compute, network, and storage resources that make up their applications: what containers are going to run, where they're going to run, how they're going to be exposed on the internet, how they can talk to each other, what storage resources they're going to use, and so on. All of that is specified through what they call desired state in Kubernetes. It's basically just configuration objects.

[00:14:13] Torin Sandall: They say, run this container. Admission control is sort of the last stage in the API server, which is the central point through which all Kubernetes API requests come in. It's where you can enforce any kind of validation, guardrails, or security policies on those resources.

[00:14:30] Torin Sandall: After the request has been authenticated and some really coarse-grained, high-level authorization has been applied, the admission controllers kick in. That's where things like quotas, image policies, and all other kinds of admission control rules are enforced, and that's where OPA hooks into Kubernetes.

[00:14:47] Adam Hawkins: Okay. Then you mentioned the second use case, which, as you described it, was static config validation. This is how I came to OPA. A common problem for people who work in Kubernetes, and something I have encountered for sure, is that once your manifests get sufficiently complex, there are semantic issues you can introduce that will cause problems, and validation issues that might not be caught by something like kubectl's --validate flag, or even kubeval. For example, I had written a manifest for a Kubernetes Deployment and specified the matchLabels, but those matchLabels didn't match the actual labels in the Deployment. kubectl will gladly apply that, but the API server will throw an error.
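The selector problem Adam describes can be caught before anything reaches the API server. The Python sketch below (the function name is hypothetical) reports every `spec.selector.matchLabels` entry that the pod template's `metadata.labels` do not carry with the same value; `matchExpressions` are ignored to keep it short.

```python
from __future__ import annotations


def selector_mismatches(deployment: dict) -> list[str]:
    """Report matchLabels entries missing, or different, in the pod template labels."""
    spec = deployment.get("spec", {})
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    template_labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
    return [
        f"selector {key}={value} has no matching pod template label "
        f"(template has {key}={template_labels.get(key)!r})"
        for key, value in match_labels.items()
        if template_labels.get(key) != value
    ]
```

Extra labels on the template are fine; only selector entries the template fails to satisfy are reported.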

[00:15:33] Adam Hawkins: After a while, you start to think, hey, there's got to be a better way to do some higher-level validation on these things before we move them to the deployment pipeline and throw them at the API. That scenario also expands out to other kinds of config validation. It could be functional stuff, or, as you mentioned, policies like whether a certain security flag is set, or whether you're allowed to use this or that. But it applies way more broadly than just Kubernetes. For example, let's say you're working in an organization that publishes internal npm packages.

[00:16:08] Adam Hawkins: Then maybe you want to test that the private flag inside package.json is always set to true, so those packages can never be published to the public npm registry. There are all kinds of ways that static config validation can enter the picture.
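The package.json rule is small enough to show in full. This Python sketch (function names are hypothetical) flags any manifest whose `private` field is not the JSON boolean `true`; npm refuses to publish packages marked private, which is what makes the rule useful as a pre-commit or CI check.

```python
from __future__ import annotations

import json


def package_violations(pkg: dict) -> list[str]:
    """Return a message unless the manifest sets "private" to the boolean true."""
    if pkg.get("private") is not True:
        name = pkg.get("name", "<unnamed>")
        return [f'package {name!r} must set "private": true']
    return []


def check_files(paths: list[str]) -> list[str]:
    """Load each package.json path and collect violations, prefixed with the path."""
    messages = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            manifest = json.load(f)
        messages.extend(f"{path}: {msg}" for msg in package_violations(manifest))
    return messages
```

A pre-commit hook could call `check_files` on the staged package.json paths and fail the commit when the returned list is non-empty. The check deliberately requires a real boolean rather than the string `"true"`.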

[00:16:22] Adam Hawkins: This is actually my favorite part of the Open Policy Agent project: the hand-rolled, custom-made language called Rego.

[00:16:31] Adam Hawkins: So can you tell us a little bit about Rego and what you can do with it? I'm also curious: why make a new language? Why not just reuse something?

[00:16:40] Torin Sandall: Open Policy Agent and Rego go hand in hand. They were created at the same time to serve the same purpose. Rego is basically a high-level declarative language that borrows its semantics from some old logic-based programming systems.

[00:16:54] Torin Sandall: It adapts those language semantics to deal with arbitrary, deeply nested structured data, AKA JSON or YAML. The reason we built Rego for Open Policy Agent was that if you look at the world today, every API that a developer publishes is based on deeply nested, hierarchical structured data. Every API serves JSON or something similar, and configuration files all follow that format these days.

[00:17:22] Torin Sandall: With all these systems coming out that are based on the desired-state concept, like Kubernetes, what you need from a policy system today is a language that lets you write down rules that validate all kinds of things: basic semantic validation like you alluded to, but also other kinds of settings that affect your uptime, your cost, your security, your best practices, or whatever. You need a way to write those kinds of rules against deeply nested hierarchical structures. When we started Open Policy Agent and looked around, there just weren't any viable alternatives at the time.

[00:17:54] Torin Sandall: If you talk to people who build DSLs and programming languages all the time, one of the things they'll tell you is: don't do it. It's a lot of work. You've got to create a tool chain, you've got to optimize it, and there's a whole bunch of work that goes into that process. So the first thing you'd want is to avoid that problem and use something that already exists.

[00:18:10] Torin Sandall: So we tried that at the beginning. I say we; it was actually Tim, one of the co-creators. He tried using SQL to write these kinds of policies, to say things like, for example, no host should be exposed on the public internet if it's exposing port 80 or Telnet or whatever you want. We tried to do that using a cloud provider's APIs, and it was a nightmare, because you had to take the data coming from the cloud provider and flatten it out into that flat relational model. I think at the time there were something like 250 synthetic tables that got introduced, and looking at them as a SQL author,

[00:18:48] Torin Sandall: I had no idea what they meant. And what you found when you were writing the SQL was that you basically had to reconstruct all these joins just to get back to the hierarchical data structure. What you really want is to just be able to dot down through the JSON that comes back from the API and say this thing is exposing port 80, or whatever.

[00:19:04] Torin Sandall: So we tried other things. We tried SQL, we tried some other languages, and it just wasn't working. So we figured, okay, what we need is something new that's really purpose-built for this kind of problem. That's why we created the language.

[00:19:16] Torin Sandall: That's why I say the two are very tightly joined together. Open Policy Agent is basically a runtime for Rego, a runtime for getting those policies enforced, and it's designed with a bunch of ideas in mind to make it a good fit for policy enforcement. It's designed to run as a host-local cache, and it's designed to be as lightweight as possible.

[00:19:34] Torin Sandall: So you can drop it into things like a CI/CD pipeline. It's designed to be performant. There's a whole bunch of different design considerations that influenced it.

[00:19:42] Adam Hawkins: Yeah, for sure. I can echo that with my own experience using Rego. For the listeners, I came to Rego trying to replace some hand-rolled Kubernetes validation I wrote in Node.js, which was kind of a recurring problem for me. Anyway, I came to Rego and thought, hey, let's give this a try. One thing I want to clarify is that Rego is part of the Open Policy Agent project, but you can use Rego and that whole tool chain independently of admission controllers, integrations with other things, and all of that.

[00:20:16] Adam Hawkins: So you can do both, but you don't have to. Torin, you mentioned speed, and I can tell you that compared to some of the other ways you might validate stuff, using Rego is really fast. The thing is, upon initial inspection, the language definitely looks weird, but once you start to understand it, it's like, yeah, okay. One of the things that threw me off in the beginning was, hey, there are these underscores, what are those? But now it's like, okay, that's just a placeholder for an iteration variable I don't care about. Just loop over this thing. I don't care. Just do it for me.

[00:20:51] Adam Hawkins: And then you write assertions: deny rules, pass rules, however you want to think about it. So one thing I'm curious about: with a tool like Rego, you can write pretty much any validation rule you could come up with on structured data in one file, or even combine structured data across multiple files that reference structures in other files.

[00:21:13] Adam Hawkins: You can express all kinds of rules in Rego. How does it relate to something like JSON Schema? JSON Schema only applies to JSON, whereas you can use Rego for any structured data. Is there an overlap, or a point where you would use one and not the other?

[00:21:33] Torin Sandall: JSON Schema lets you define the structure of the data you're working with. It's really good at letting you say this field must be a string, or this field must exist and must be a number greater than seven, or something like that. I think you can have JSON Schemas that refer to other schemas and so on.

[00:21:50] Torin Sandall: But fundamentally, what it lets you do is put assertions or constraints at the individual field level in a JSON document, and the same goes for schemas in other systems. Where that starts to fall over is when you need to express things that are more relational. If you want to say that this field must be set if this other field isn't set, or this field must be greater than seven if this other field is undefined, that's where it starts to get tricky, and that's where you need something more expressive. The way I would look at it is that JSON Schema is almost like a language. It's a way of putting constraints on some data, but it's not as expressive as a full-blown programming language. There's a place for that.
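Cross-field rules like the ones Torin mentions are straightforward once you have a more expressive language. The Python sketch below stands in for what a Rego rule would express (it is not Rego) and uses hypothetical field names (`primary`, `fallback`, `override`, `threshold`) to encode two conditional constraints: one field must be set when another is not, and a number must exceed seven when another field is absent.

```python
from __future__ import annotations


def relational_violations(doc: dict) -> list[str]:
    """Check two cross-field rules over hypothetical field names."""
    errors = []
    # Rule 1: if "primary" is not set, "fallback" must be set.
    if "primary" not in doc and "fallback" not in doc:
        errors.append("'fallback' must be set when 'primary' is not set")
    # Rule 2: when "override" is absent, "threshold" must be a number greater than 7.
    threshold = doc.get("threshold")
    if "override" not in doc and not (
        isinstance(threshold, (int, float)) and threshold > 7
    ):
        errors.append("'threshold' must be greater than 7 when 'override' is absent")
    return errors
```

Each rule reads one field's presence to decide what another field must look like, which is exactly the relational shape that per-field constraints struggle with.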

[00:22:29] Torin Sandall: Rego lives in between, on that continuum of expressiveness with JSON Schema all the way over on one side and a full-blown programming language on the other.

[00:22:39] Adam Hawkins: Yeah, that's a good explanation, because that's how I came to write my own hand-rolled validations. I had been using JSON Schema, and you could express presence or absence, but you couldn't do any logic on top of that. If you have this field, then you need to have that one, or this value has to match a value in some other object, or any kind of conditional stuff. Once you have logic and you exceed the capacity of JSON Schema, you need something else.

[00:23:05] Adam Hawkins: So beyond Kubernetes validation, how have you seen Rego used?

[00:23:12] Torin Sandall: We're always surprised by the use cases people come up with for Rego and OPA. I like to use the term config validation, not just for Kubernetes, because a lot of the people who start using OPA and Rego for Kubernetes quickly graduate to using it for something else in their infrastructure or platform.

[00:23:30] Torin Sandall: We see lots of folks using it for systems like Terraform, and for things like CloudFormation. Any time you have some desired state that describes the platform you run your workloads on, Rego is a good fit for writing down safeguards and putting in guardrails.

[00:23:46] Torin Sandall: There's lots of interest around using it for different kinds of cloud provider resources that today are often represented in YAML or JSON.

[00:23:55] Adam Hawkins: Yeah. Now that you mention it, I went through the same progression myself. Once I started to gain some competence with Rego, I thought, hey, I can use this for so much more stuff, and that's what got me so excited about it. Personally, I'm really keen on linting and any kind of static analysis we can add, ideally in the pre-commit phase or somewhere in the pipeline. If we can have fast static analysis that guarantees we're free of certain regressions, that's wonderful.

[00:24:22] Adam Hawkins: And as you said, we write a lot of these configuration files. They could be Terraform, they could be YAML, they could be JSON. Think of how many .json or .yaml files exist at the root of project directories these days.

[00:24:35] Adam Hawkins: So learning Rego was sort of an on-ramp for me: hey, I can use this one tool to solve all these different problems. Plus, it's fast as hell compared to all the other stuff, so there's no worry that it's going to be too slow for a given stage. I can add it to the pre-commit hook with no fear, and then put it in the pipeline.

[00:24:56] Adam Hawkins: To bring the conversation back to JSON Schema: at a past company I worked for, we had designed a whole JSON API, and we used JSON Schema to set the contract between the providers of the API and its consumers.

[00:25:12] Adam Hawkins: We validated incoming requests using the JSON Schema, and once we had generated the response, we used the outgoing contract to verify it. That gave us an easy way to maintain contract correctness, but that was with JSON Schema.

[00:25:28] Adam Hawkins: Now, granted, that code wasn't written in Go, but if it were, I could potentially use Rego to quickly verify the incoming request and outgoing response contracts. Like you say, once you start to grok these tools, you keep finding new uses for them. Is that something you have seen in your own experience or working with your company's clients?

[00:25:49] Torin Sandall: Yeah, definitely. The funniest one I remember: one user who has given some talks about using OPA for API authorization and other bread-and-butter use cases wrote a blog post about how you could use OPA as a rule engine for an RPG. I guess this person was really into RPGs a long time ago, so he codified all the rules about what happens when you get hit by a bronze sword versus an iron sword and stuff like that.

[00:26:17] Torin Sandall: So you can take it to a crazy extreme, but generally this is the story we see play out in organizations that adopt OPA and Rego. They come across it because it solves some very concrete, particular problem they have. They need to stop conflicting Ingress resources from being instantiated inside Kubernetes. So they find that tutorial, they run it, and they're like, cool, I'm good.

[00:26:43] Torin Sandall: Then they come back a week later and realize, oh, hey, the developers at my company can create network policies in Kubernetes that allow egress traffic to any IP address in the world. Well, I'm in cybersecurity at a large financial institution, and I'm a little bit worried about my data being exfiltrated from my network. So I'm going to use OPA to put some guardrails in place over network configs in Kubernetes. And so it slowly expands over time, and we see folks that are extremely productive with it.
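The egress guardrail Torin describes might look like the following Python sketch over a Kubernetes NetworkPolicy object (Python standing in for the Rego policy; the function name is hypothetical). It flags `ipBlock` peers whose CIDR covers every address (`0.0.0.0/0` or `::/0`) and egress rules with no `to` list, since Kubernetes treats an empty or missing `to` as matching all destinations. `except` carve-outs inside `ipBlock` are ignored in this simplified version.

```python
from __future__ import annotations

import ipaddress


def egress_violations(policy: dict) -> list[str]:
    """Flag NetworkPolicy egress rules that allow traffic to any IP address."""
    name = policy.get("metadata", {}).get("name", "<unnamed>")
    problems = []
    egress_rules = policy.get("spec", {}).get("egress") or []
    for index, rule in enumerate(egress_rules):
        peers = rule.get("to") or []
        if not peers:
            # An empty or missing "to" matches all destinations.
            problems.append(f"{name}: egress rule {index} has no 'to' and allows all destinations")
            continue
        for peer in peers:
            cidr = peer.get("ipBlock", {}).get("cidr")
            # A prefix length of 0 covers the whole IPv4 or IPv6 address space.
            if cidr and ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
                problems.append(f"{name}: egress rule {index} allows {cidr}")
    return problems
```

Peers that use `podSelector` or `namespaceSelector` instead of `ipBlock` are left alone here, since they stay inside the cluster.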

[00:27:08] Torin Sandall: I've talked to folks who haven't developed in 20 years, who think about things like security and policy at very high levels. And when they get down to it and start using Rego, they find themselves to be really productive with it.

[00:27:17] Torin Sandall: So I think the fact that it is domain agnostic and not tied to any project- or domain-specific data model makes it extremely powerful. Once you grok it, you find all kinds of places where you can use it.

[00:27:30] Adam Hawkins: Yeah, I think that speaks to the high-quality architecture of the project overall, in that there are discrete bits of this tool that you can use for discrete problems. You don't have to buy into the whole thing, right? You can use Rego, or you can use OPA and just use the validation part to write policies and even write tests for those policies.

[00:27:48] Adam Hawkins: So I've got to give a shout-out to anything designed with TDD in mind, where you can write the policy and then write a test for it. That is such a big thing, and honestly, it surprises me that it's just skipped in some places. So congratulations, and thank you so much for that.

[00:28:03] Adam Hawkins: For the listener, there's another project. I'm not sure if it's part of the whole OPA project, but it's called Conftest. Conftest is a tool around Rego that allows you to write Rego policies outside the scope of OPA. It's designed to do static config validation. The idea with Conftest is that you can create your policies, commit them to your repo, and then test the data in your repo against those policies.

[00:28:27] Adam Hawkins: So is Conftest part of the OPA umbrella, or is it separate? Because I know it's maintained by, was it Instrumenta, or some different folks, right?

[00:28:37] Torin Sandall: Yeah. So Conftest was a project that a fellow called Gareth Rushgrove created, and it was essentially a way of taking Open Policy Agent policies written in Rego and running them against arbitrary config files, regardless of the format.

[00:28:52] Torin Sandall: Right. So it has a bunch of parsers inside, for not just YAML and JSON, which OPA supports out of the box, but Dockerfiles and, you know, Varnish config files and stuff like that. Just arbitrary config files. It takes those config files and converts them into essentially JSON; it's not literally JSON, but it's a JSON-like structure internally. And then it loads that into OPA and runs your policies against it. So initially it provided a test runner with a really nice developer experience for somebody who wants to write policy over config files. And I think that was about a year and a half ago now.
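The flow Torin describes (parse a config file into a JSON-like structure, then evaluate policies against that structure) can be sketched in plain Python. This is not Conftest's parser, data shape, or API: the toy Dockerfile parser, the `{"cmd", "value"}` record layout, and the "pin your base image" rule are assumptions chosen for illustration.

```python
import json


def parse_dockerfile(text: str) -> list[dict]:
    """Toy parser: one JSON-like record per instruction (no line continuations)."""
    instructions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cmd, _, rest = line.partition(" ")
        instructions.append({"cmd": cmd.lower(), "value": rest.split()})
    return instructions


def deny_unpinned_base_image(instructions: list[dict]) -> list[str]:
    """Policy: every FROM must name an explicit tag other than 'latest'."""
    denials = []
    for ins in instructions:
        if ins["cmd"] != "from":
            continue
        args = [a for a in ins["value"] if not a.startswith("--")]  # drop flags like --platform
        if not args:
            continue
        image = args[0]
        name = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
        tag = name.split(":", 1)[1] if ":" in name else "latest"
        if tag == "latest":
            denials.append(f"base image {image} must be pinned to a specific tag")
    return denials


dockerfile = """
# build stage
FROM python:latest
RUN pip install flask
"""
data = parse_dockerfile(dockerfile)
print(json.dumps(data))  # the structured form the policy actually sees
print(deny_unpinned_base_image(data))
# ['base image python:latest must be pinned to a specific tag']
```

The useful design point is the split: once every format is normalized into plain data, the policy function never needs to know which file type it came from.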

[00:29:25] Torin Sandall: And so over the last year and a half, we were chatting a lot and watching the project grow, and it was really cool to see that adoption happened pretty much organically. I think there were some talks about it, but it just kept getting momentum and traction.

[00:29:38] Torin Sandall: And so earlier this year, the OPA and Conftest maintainers got together, and when we were chatting, we figured that it made sense for Conftest to become part of the overall Open Policy Agent project.

[00:29:48] Torin Sandall: A couple of months ago there were some press releases about it, but basically the Conftest project is now part of OPA proper. It's hosted under the open-policy-agent organization on GitHub, so it's a first-class OPA subproject along with Gatekeeper. It's totally part of OPA now. And it's nice because it means that we'll have better communication and collaboration between the two different groups of developers.

[00:30:10] Torin Sandall: And, you know, we also plan to take some of the ideas that were essentially incubated inside of Conftest and bring them into OPA proper in the near future.

[00:30:19] Adam Hawkins: Oh yeah, like what?

[00:30:20] Torin Sandall: So there are a couple of things I already mentioned. One of them is just having broader support for different file formats, right?

[00:30:26] Torin Sandall: The fact that you can throw arbitrary structured files at it and get a policy decision out is super useful from a developer experience point of view. So bringing in support for formats other than JSON and YAML is one thing.

[00:30:39] Torin Sandall: Another thing that it introduced was the notion of making it really easy to test data on the command line, right? OPA has a test subcommand, but that test subcommand is for testing your policies. It's for running the unit tests for your policies. What Conftest is, is basically a framework for testing data, for testing config files. That will probably look like another subcommand in OPA eventually. So having some way of easily validating data files using policy was one thing. And then the ability to push, pull, and share policies was something else Conftest introduced: the ability to take an OPA bundle, push it up to a container registry, and even have it pulled from there is also interesting.
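The difference between testing a policy and testing data with a policy can be shown with one rule used both ways. The sketch below is plain Python standing in for Rego, so it only mirrors the idea behind `opa test` versus Conftest, not either tool's interface; the "at least two replicas" rule and the sample documents are hypothetical. The only external fact it relies on is that Kubernetes defaults a Deployment's `spec.replicas` to 1.

```python
def deny_single_replica(deployment: dict) -> list[str]:
    """Hypothetical policy: Deployments must run at least two replicas."""
    if deployment.get("kind") != "Deployment":
        return []
    replicas = deployment.get("spec", {}).get("replicas", 1)  # Kubernetes defaults to 1
    if replicas < 2:
        name = deployment.get("metadata", {}).get("name", "<unnamed>")
        return [f"{name}: replicas is {replicas}, expected at least 2"]
    return []


# 1) Testing the policy: unit tests with hand-written inputs check the rule itself.
def test_policy() -> None:
    assert deny_single_replica({"kind": "Deployment", "spec": {"replicas": 3}}) == []
    assert deny_single_replica({"kind": "Deployment", "metadata": {"name": "web"}}) == [
        "web: replicas is 1, expected at least 2"
    ]
    assert deny_single_replica({"kind": "Service"}) == []


# 2) Testing data: run the already-trusted policy over real config documents.
def validate_documents(documents: list[dict]) -> list[str]:
    failures = []
    for doc in documents:
        failures.extend(deny_single_replica(doc))
    return failures


test_policy()
print(validate_documents([
    {"kind": "Deployment", "metadata": {"name": "api"}, "spec": {"replicas": 1}},
    {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 2}},
]))
# ['api: replicas is 1, expected at least 2']
```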

[00:31:22] Torin Sandall: Yeah, there are a bunch of interesting ideas that were really nicely packaged together in Conftest.

[00:31:27] Adam Hawkins: Yeah, that's true. So for the listener, when Torin mentioned configuration validation on the command line: say you're using something like CircleCI or Codefresh, or any of these hosted CI systems. They probably have some CLI you can download, and one of the commands will be lint or validate, which checks that your pipeline definition is valid.

[00:31:44] Adam Hawkins: Conftest gives you a way to do that against arbitrary structured data anywhere on your file system. And he also mentioned something really cool: the capacity to bundle and share policies.

[00:31:56] Adam Hawkins: So, for example, you could hypothetically create a policy for all of the engineers or all of the services in your organization. Say here's a security policy, and things should validate against it, or some sort of Kubernetes manifest validation. Conftest gives you a way to push those policies, to publish them in a way that they can be consumed. So that's good news about Conftest. I cannot overstate how much I like this thing now.

[00:32:22] Adam Hawkins: I'm actually excited to find ways that I can use Conftest to improve different parts of the overall workflow that had been covered by, you know, a mishmash of different tools.

[00:32:32] Adam Hawkins: I like to keep the amount of tools I use very minimal and make sure that they're really good. Like if I can use them in different places and they work well in different places, that's great.

[00:32:41] Adam Hawkins: It's really nice, too, that you can use it for things like Terraform or Dockerfiles, which brings me back to the other aspect of this and how it connects to the topics of continuous delivery and DevOps: now we have the idea of DevSecOps, and security through automation with OPA, Rego, and Conftest.

[00:33:01] Adam Hawkins: They give you a structure to say, hey, I know that I have these sorts of security concerns; as you mentioned, maybe no app should open up port 22 or port 80 or something like that. Instead of relying on Alice or other humans to go in and verify all these things, you can leverage automation tools to bake that into your process, either at the cluster or in your CI/CD process, as static validation on config to meet these high-level objectives.

[00:33:31] Adam Hawkins: So when the project started, what was the perspective on the initial use cases? Was it in the vein of automation or security or, you know, how did it look at the beginning?

[00:33:42] Torin Sandall: That's a good question. I think we had a lot of different ideas in mind about how the technology could be applied. I mean, I think policy is sufficiently general to cover just about anything. Right.

[00:33:53] Torin Sandall: And I think that, at the end of the day, stopping bad things from happening in the system is really what it's about a lot of the time, right? A security breach is really bad, but runaway resource utilization that brings down the website, so you're no longer selling, you know, dog food or whatever, is also really bad. Right.

[00:34:11] Torin Sandall: Personally, I'm interested in security, but I also think that the more general problem of how you manage distributed systems at scale, reliably and effectively, is the interesting problem. That's kind of where I come in. And a lot of that comes back to what you just said, which is not having 50 different ways of saying the same thing across the 50 different systems you have to manage.

[00:34:32] Torin Sandall: Instead, you have one way of writing down the rules that control who can do what across those 50 different systems. So it was motivated by security, obviously, but there were other drivers. We also looked at other kinds of use cases, not just around validation but also essentially mutation, right?

[00:34:48] Torin Sandall: So, for example, workload placement: you might want to control where a workload goes based on attributes of that workload, right? Maybe it has to run on a cluster that's PCI certified, or maybe it has to run in Europe because it's for a European client or something like that. So there are lots of other applications for these policies beyond just validation, and those were things we did lots of work on early on. Now, obviously, we've seen a lot of people take it and run with it for validation. That's what you hear about mostly, but there's a long tail of other use cases.
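Placement policy differs from validation in that the answer is a set of allowed destinations rather than a yes or no. A minimal plain-Python sketch of the two constraints Torin mentions, assuming hypothetical cluster attributes (`region`, `pci_certified`) and workload fields (`requires_pci`, `data_residency`) that do not come from any real scheduler or OPA API:

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    region: str          # hypothetical attribute, e.g. "eu-west" or "us-east"
    pci_certified: bool  # hypothetical attribute


def eligible_clusters(workload: dict, clusters: list[Cluster]) -> list[str]:
    """Return the names of clusters where the workload is allowed to run."""
    result = []
    for c in clusters:
        if workload.get("requires_pci") and not c.pci_certified:
            continue  # PCI workloads only go to PCI-certified clusters
        if workload.get("data_residency") == "eu" and not c.region.startswith("eu-"):
            continue  # EU-resident workloads stay in European regions
        result.append(c.name)
    return result


clusters = [
    Cluster("us-pci", "us-east", True),
    Cluster("eu-pci", "eu-west", True),
    Cluster("eu-general", "eu-central", False),
]
print(eligible_clusters({"requires_pci": True, "data_residency": "eu"}, clusters))
# ['eu-pci']
```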

[00:35:17] Torin Sandall: I was chatting with somebody the other day who had basically built a little VPN gateway for their company out of it. They would define the rules, and the iptables rules would get generated out of a policy and then pushed down, but it was all specified in this nice high-level declarative format. So there are a ton of different ways that you can use it. That's what gets me excited about it.
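The gateway example, where low-level firewall rules are generated from a high-level declarative policy, might look something like the sketch below. The listener's actual setup wasn't described, so the `Allow` rule shape and the group-to-subnet mapping are hypothetical; only the `iptables` flags themselves (`-P`, `-A`, `-s`, `-d`, `-p tcp --dport`, `-j ACCEPT`) are standard. The commands are only printed, never applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allow:
    group: str        # hypothetical VPN user group
    destination: str  # CIDR the group may reach
    port: int         # TCP port


# Hypothetical mapping from user groups to the VPN address pools they are assigned.
GROUP_SUBNETS = {"engineering": "10.8.1.0/24", "finance": "10.8.2.0/24"}


def to_iptables(rules: list[Allow]) -> list[str]:
    """Render declarative allow rules as iptables commands on a default-deny chain."""
    commands = ["iptables -P FORWARD DROP"]  # anything not explicitly allowed is dropped
    for rule in rules:
        src = GROUP_SUBNETS[rule.group]
        commands.append(
            f"iptables -A FORWARD -s {src} -d {rule.destination} "
            f"-p tcp --dport {rule.port} -j ACCEPT"
        )
    return commands


policy = [Allow("engineering", "10.20.0.0/16", 22), Allow("finance", "10.30.0.5/32", 443)]
for command in to_iptables(policy):
    print(command)
```

A real gateway would also need to accept return traffic (for example, with a conntrack ESTABLISHED rule); the sketch leaves that out to keep the policy-to-rules mapping visible.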

[00:35:39] Adam Hawkins: Yeah, and I think it also matters for the scale at which you can operate if you adopt these technologies. It's always important to consider this in terms of organization size, or the number of services one has, or anything like that. Because, as you mentioned, we're coming back to this person named Alice who's, you know, looking at things.

[00:35:57] Adam Hawkins: It's one thing if you have, say, one application and you can have one human being look at it and say, yes, it's free of this. But at some point you're going to have more than one. You might have ten, you might have a hundred, you might have a thousand. And at some point you have to rely on automation as a way to enforce all these things.

[00:36:16] Adam Hawkins: But also to make sure that they're actually there for all of the stuff you're going to put into production. So it's just really important, especially at an enterprise.

[00:36:24] Adam Hawkins: I actually have another question, about a component you mentioned that I'm not familiar with. So far we've talked about OPA, we've talked about Rego, and we've talked about Conftest, but you brought up Gatekeeper. So what's Gatekeeper?

[00:36:38] Torin Sandall: So Gatekeeper is the evolution of OPA in the Kubernetes space for admission control. Gatekeeper provides a first-class integration between Open Policy Agent and Kubernetes. It gives you a way of managing all of your policies, controlling what policies are applied on the cluster, through a native Kubernetes interface: CRDs.

[00:36:58] Torin Sandall: And then it brings in some other really nice functionality in Kubernetes, like auditing, for example. One of the really important design criteria we had for OPA and Rego was that the policies you write ought to be, must be, portable, and that you should be able to take them and run them in different locations for different reasons, right?

[00:37:15] Torin Sandall: So when you write down a policy that says, you know, all containers must specify CPU and memory limits, you obviously want to take that and have it enforced, let's say, at the cluster level, when new containers are being instantiated. But ideally you can take that exact same policy and run it in other places.
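Torin's example rule, "all containers must specify CPU and memory limits", is small enough to sketch. The point is portability: one pure function over a Pod manifest that an admission hook, a CI job, or a pre-merge check could all call. This is plain Python standing in for Rego, not Gatekeeper's constraint format; the field paths follow the Kubernetes Pod spec (`spec.containers[].resources.limits`), and the function name and sample Pod are hypothetical.

```python
def missing_limits(pod: dict) -> list[str]:
    """Return one message per container that lacks a CPU or memory limit."""
    messages = []
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for c in containers:
        limits = c.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                messages.append(f"container {c.get('name', '<unnamed>')} has no {resource} limit")
    return messages


pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
            {"name": "sidecar", "resources": {"limits": {"cpu": "100m"}}},
        ]
    },
}
print(missing_limits(pod))
# ['container sidecar has no memory limit']
```

Because the function takes plain data and returns messages, the same code could be run over manifest files in a repository before merge or over the object an admission webhook receives at the cluster.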

[00:37:30] Torin Sandall: So you want to run it ahead of time during CI/CD or as a pre-merge hook, right? That way you tell your developer, when they open the PR, that they didn't set the configuration correctly. They don't want to wait for however long it takes for that configuration to flow through the pipeline and then get rejected at the cluster level.

[00:37:48] Torin Sandall: Because they might not even know where the cluster is. They might not know how to debug that. Right. And there goes your day. Now you've lost a day debugging something that you could have just found out with a pre-merge check.

[00:37:58] Torin Sandall: So you want to have that portability. And similarly, you want to be able to take the same policy and ask after the fact, you know, what resources in my cluster would violate, or do violate, this new policy that I'm going to be introducing, right before I roll it out.

[00:38:11] Torin Sandall: So one of the things that Gatekeeper brings is audit controller functionality, where you can configure it to periodically go out and scan all of the resources in your cluster and then give you back a report that says, you know, these are the resources that violate these rules.
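The audit idea, running the same rule over everything already in the cluster and reporting violations, can be sketched as a loop. Here the cluster's resources are a hard-coded list (in practice they would come from the Kubernetes API), and the rule is a compact, hypothetical version of the CPU and memory limits check; none of this is Gatekeeper's actual implementation or report format.

```python
def missing_limits(pod: dict) -> list[str]:
    """Hypothetical rule, same as at admission time: CPU and memory limits required."""
    out = []
    for c in pod.get("spec", {}).get("containers", []):
        limits = c.get("resources", {}).get("limits", {})
        out += [f"{c.get('name', '<unnamed>')}: no {r} limit" for r in ("cpu", "memory") if r not in limits]
    return out


def audit(resources: list[dict]) -> dict[str, list[str]]:
    """Evaluate the rule against existing resources; report violators by namespace/name."""
    report = {}
    for res in resources:
        if res.get("kind") != "Pod":
            continue  # this rule only applies to Pods
        violations = missing_limits(res)
        if violations:
            meta = res.get("metadata", {})
            report[f"{meta.get('namespace', 'default')}/{meta.get('name', '<unnamed>')}"] = violations
    return report


existing = [
    {"kind": "Pod", "metadata": {"name": "web", "namespace": "shop"},
     "spec": {"containers": [{"name": "web", "resources": {"limits": {"cpu": "1", "memory": "1Gi"}}}]}},
    {"kind": "Pod", "metadata": {"name": "batch", "namespace": "jobs"},
     "spec": {"containers": [{"name": "worker"}]}},
    {"kind": "Service", "metadata": {"name": "web", "namespace": "shop"}},
]
for name, violations in audit(existing).items():
    print(name, violations)
# jobs/batch ['worker: no cpu limit', 'worker: no memory limit']
```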

[00:38:25] Torin Sandall: So Gatekeeper is a really great integration between OPA and Kubernetes. It's a joint effort of us at Styra, as well as Microsoft, Google, and others. So it's a great project, and actually, just yesterday or the day before, it went GA. So it's stable now.

[00:38:40] Adam Hawkins: Oh, well, congratulations on that big milestone. Okay, I've got one last question for you before we go. So I understand Gatekeeper is a purpose-built way to integrate OPA and Kubernetes, because, of course, a large number of people use Kubernetes and it's a good platform, and there's a hook in the API server for admission controllers, as you mentioned. But there's more out there than just Kubernetes.

[00:39:04] Adam Hawkins: So is there anything on the horizon for OPA in terms of Gatekeeper-type integrations for other systems?

[00:39:12] Torin Sandall: That's a good question. Right now, I mean, Kubernetes is sucking the oxygen out of a lot of discussions, I think. Right. And for good reason: it helps unify your platforms, your workload platforms.

[00:39:24] Torin Sandall: So I think it's definitely the dominant thing that we see in that kind of like platform infrastructure space in terms of whether you're talking about managing containers or servers or cloud accounts, like that's where we see people kind of focusing right now.

[00:39:35] Torin Sandall: But in terms of other interesting integrations, one of the things that we've worked on for a little while now, and that we're continuing to develop all the time, is the ability to take your OPA policies and compile them, through OPA, into WebAssembly, so that you can take those WebAssembly-compiled policies and run them in new kinds of places.

[00:39:53] Torin Sandall: So, obviously, it's potentially useful for doing some kind of pre-checks in the browser, but it's also useful for a number of other applications, for things like optimization and running in other kinds of environments where OPA doesn't run today, right? If you look at a CDN, for example, they don't run OPA in their PoPs today.

[00:40:12] Torin Sandall: And maybe they won't, but they do have WebAssembly runtimes. So, for example, if you wanted to do some enforcement at the edge of the network, that's one application for it. There are a bunch of interesting applications where you could take a WebAssembly-compiled policy and have it executed in these standardized environments.

[00:40:27] Torin Sandall: So that's something that we've been working on for a little while now. We have full support now: you can compile any OPA policy into Wasm. And what we're looking at now is different ways that we can use that. So we're building out SDKs for different languages and runtimes, so that you can have OPA policies evaluated, without an out-of-process call, inside of any language or framework that has a WebAssembly runtime, which is pretty much all of them. It's also useful just for optimization purposes.

[00:40:49] Torin Sandall: So while OPA does have some pretty sophisticated optimizations implemented in it, it's still an interpreter implemented in Go. So there's some overhead during policy evaluation just by that fact: there's interpretive overhead that comes from that. And WebAssembly allows us to get rid of a lot of that.

[00:41:04] Torin Sandall: So for certain use cases that require very, very fast response times, very low latency on the order of microseconds or even nanoseconds, OPA isn't always a good fit, but WebAssembly can be.

[00:41:14] Torin Sandall: So we're also in the process of integrating it into the OPA runtime proper, so that it can be used as an evaluation path there. We're excited by it; it touches a bunch of different, interesting places, and it's something that we're working on right now.

[00:41:27] Adam Hawkins: Yeah. Okay. So when you mentioned WebAssembly, I was kind of taken aback, like, WebAssembly? What are you doing in the browser? But then you mentioned that you can use this in any language that has a WebAssembly runtime, and, oh, okay. Now this is a bridge into using these policies and executing them locally in any particular language.

[00:41:45] Adam Hawkins: So, coming back to my example of doing contract validation on incoming requests and outgoing responses: if there was a WebAssembly runtime, hey, I could use these policies for this. And as you mentioned, you could maybe get microseconds or nanoseconds, which is going to be way faster than what you're going to get from JavaScript or pretty much any other interpreted language. You'd have to go compiled for that for sure.
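Adam's contract-validation idea amounts to evaluating a policy over request and response data in-process. The sketch below is plain Python, not Rego and not one of OPA's WebAssembly SDKs, and the `POST /orders` contract, field names, and messages are all hypothetical; it only shows what such checks over request and response data look like.

```python
def check_request(request: dict) -> list[str]:
    """Hypothetical contract for POST /orders request bodies."""
    if request.get("method") != "POST" or request.get("path") != "/orders":
        return []  # this contract only covers one endpoint
    body = request.get("body", {})
    errors = []
    quantity = body.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity < 1:
        errors.append("quantity must be a positive integer")
    if not isinstance(body.get("sku"), str):
        errors.append("sku must be a string")
    return errors


def check_response(response: dict) -> list[str]:
    """Hypothetical contract: a 201 response must include the new order id."""
    if response.get("status") == 201 and "id" not in response.get("body", {}):
        return ["201 responses must include an order id"]
    return []


print(check_request({"method": "POST", "path": "/orders", "body": {"quantity": 0}}))
# ['quantity must be a positive integer', 'sku must be a string']
print(check_response({"status": 201, "body": {}}))
# ['201 responses must include an order id']
```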

[00:42:08] Adam Hawkins: All right, well, Torin, it was my pleasure to talk to you about OPA, Rego, Conftest, and the whole project. Really great work. I'm really happy to get a feel for these tools, use them, and try to share them with the listeners and everyone out there, because I think these are really amazing tools. So great work to you and everybody over there, and pass my well-wishes along to everybody.

[00:42:28] Torin Sandall: Okay, cool. Thanks a lot.

[00:42:29] Adam Hawkins: So is there anything you'd like to leave the listeners with before we go?

[00:42:33] Torin Sandall: No, I mean, thanks for inviting me on. I guess the one thing I'll mention is that OPA is very flexible, right? It's very domain agnostic. It's general purpose, but it doesn't do anything unless you integrate it into other systems: unless you put it in your CI/CD system, or you build an integration with Kubernetes, or with whatever it is that you need it to be integrated with.

[00:42:50] Torin Sandall: And we really love to see new integrations between OPA and systems that people care about and use, and want better policy support for.

[00:42:57] Torin Sandall: So if you have an idea for an integration, feel free to post an issue on GitHub or come on Slack; there's a Slack organization for the project. And feel free to contribute it. We love contributions in the form of integrations. So, yeah.

[00:43:09] Adam Hawkins: And if you are curious about how to use these things or need help, the Slack channel is a great way to get feedback from the people who actually maintain the project. When I was learning Rego, Torin was gracious enough to answer my questions and point me in the right direction, like, hey, you should do this, or you should not do this, or, hey, what you're doing here is just not going to work. Like, oh, okay.

[00:43:28] Adam Hawkins: Because, at least for me, learning Rego was sufficiently different from the other languages I'd been exposed to that I had to learn to think about it differently, because the domain is different. So there are a lot of helpful people out there in the project if you want to learn.

[00:43:41] Adam Hawkins: So thanks for listening everybody. And thank you once again for coming on the show.

[00:43:46] Adam Hawkins: That wraps up this batch. Visit smallbatches.fm for the show notes. Also find Small Batches FM on Twitter and leave your comments in the thread for this episode. More importantly, subscribe to this podcast for more episodes just like this one.

[00:44:00] Adam Hawkins: If you enjoyed this episode, then tweet it or post it to your team's Slack. Rate this show on iTunes. It all supports the show and helps me produce more Small Batches. Well, I hope to have you back again for the next episode. So until then, happy shipping.

[00:44:19] Adam Hawkins: Want to learn more about DevOps without wasting your time? Sign up for my free email course at freedevopscourse.com. My course combines the best from The DevOps Handbook, Accelerate, and years of software delivery experience. You'll learn the three ways of DevOps and the four KPIs of software delivery performance.

[00:44:38] Adam Hawkins: More importantly, I'll show you how to put that theory into practice. That means shipping better software faster. Sign up today at freedevopscourse.com.

[00:44:52] Adam Hawkins: Like the sound of Small Batches? This episode was produced by Podsworth Media. That's podsworth.com.

Creators and Guests

Torin Sandall
Guest
quality / good abstractions / @StyraInc / @OpenPolicyAgent