Deploying GRR to Kubernetes for Incident Response
Kubernetes (k8s) is being used to run more and more infrastructure in the cloud, but what happens when there’s a security incident, such as a coin miner running in your cluster? Many security organisations are well equipped to deal with incidents in their local environment, but still struggle to adapt to incident response in cloud environments.
GRR Rapid Response (GRR) is an incident response framework focused on remote live data collection and analysis. You may already use GRR for incident response in your local environment, but GRR can also be used in containerized Kubernetes environments.
This guide is based on research conducted by Bart Inglot and me, and describes how to deploy the GRR client in a Kubernetes environment. It assumes you already have the GRR server running; if not, you can follow the GRR documentation to set it up. In our example, we’re using Fleetspeak for agent communication, but the legacy GRR agent communication protocol should also work. Fleetspeak can be enabled during deployment of the GRR server.
At a high level, the approach is:
- Install the GRR agent (and Fleetspeak) into a Docker image
- Push the Docker image to a Container Registry
- Configure the Docker image as a Kubernetes DaemonSet
Building the Docker Image
The GRR client Debian package can be downloaded from the Admin UI of your GRR server. If the server was configured with Fleetspeak enabled, the GRR client Debian package will also include the Fleetspeak client.
For this example, we’ve placed the installation package in the folder grr/packages.
Once you have the GRR client installation package, create a Dockerfile like the example below (grr/Dockerfile):
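A minimal sketch of such a Dockerfile, assuming the client package sits under packages/ and a nanny script named nanny.sh sits next to the Dockerfile (the package filename pattern and script name are assumptions; adjust them to match your downloaded package):

```dockerfile
FROM ubuntu:22.04

# Copy the GRR client package downloaded from the GRR Admin UI.
COPY packages/grr_*_amd64.deb /tmp/

# Install the GRR client (includes the Fleetspeak client if the
# server was configured with Fleetspeak enabled).
RUN apt-get update && \
    apt-get install -y /tmp/grr_*_amd64.deb && \
    rm -rf /var/lib/apt/lists/* /tmp/*.deb

# Nanny script that keeps the client process running.
COPY nanny.sh /nanny.sh
RUN chmod +x /nanny.sh

CMD ["/nanny.sh"]
```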
The default command in the Dockerfile above is a nanny script to ensure the Fleetspeak client is always running. Fleetspeak in turn will ensure that the GRR client is running. If you’re not using Fleetspeak, you could include a similar script to ensure that the standalone GRR client is always running. Here’s an example script:
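A sketch of such a nanny script for the standalone GRR client; the binary and config paths are assumptions, so check where your GRR client package actually installs them:

```shell
#!/bin/bash
# Keep the standalone GRR client running: if it ever exits,
# restart it after a short delay.
# NOTE: these paths are assumptions -- verify them against your
# installed GRR client package.
GRR_BIN="/usr/sbin/grrd"
GRR_CONFIG="/usr/lib/grr/grrd.yaml"

while true; do
  "${GRR_BIN}" --config "${GRR_CONFIG}"
  echo "GRR client exited; restarting in 5 seconds..." >&2
  sleep 5
done
```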
We can then use Docker to build the image from the directory where the Dockerfile is located (grr).
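For example (the image name grr-client is an assumption; use whatever name suits your environment):

```shell
# Build the image from the grr/ directory, where the Dockerfile lives.
cd grr
docker build -t grr-client .
```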
Pushing the Docker Image to a Container Registry
In this example, we’re using Google Container Registry (GCR) to host our GRR image, but any other container registry should also work.
Tagging the Image
We first need to use Docker to add a tag to the image, then push the image to the Container Registry. The tag is in the format <container-registry>/<project-id>/<image-name>:<tag>. If a tag is not supplied, Docker will apply the latest tag.
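For example, assuming the image was built locally as grr-client:

```shell
# Tag the local image for GCR in the format
# <container-registry>/<project-id>/<image-name>:<tag>
docker tag grr-client gcr.io/<project-id>/grr-client:v1
```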
Pushing the Image
In order to push the image to the Container Registry, Docker needs to be authenticated. For GCR, you can follow this guide to authenticate Docker.
Once authenticated, you can push the tagged image into the Container Registry.
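Continuing the example tag from above:

```shell
docker push gcr.io/<project-id>/grr-client:v1
```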
Configure the DaemonSet
Create a manifest file for the DaemonSet (grr-daemonset.yaml). The configuration below only deploys the image to Kubernetes nodes labeled with grr=installed:
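A sketch of such a manifest, under stated assumptions: the image path and names are placeholders, while hostPID and the read-only host mount at /hostroot are what enable the host-level access used later in this guide:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grr
spec:
  selector:
    matchLabels:
      name: grr
  template:
    metadata:
      labels:
        name: grr
    spec:
      # Only schedule onto nodes labeled grr=installed.
      nodeSelector:
        grr: installed
      # Share the host PID namespace so GRR can see host processes.
      hostPID: true
      containers:
      - name: grr
        image: gcr.io/<project-id>/grr-client:v1
        securityContext:
          privileged: true
        volumeMounts:
        # Read-only view of the node's root file system.
        - name: hostroot
          mountPath: /hostroot
          readOnly: true
      volumes:
      - name: hostroot
        hostPath:
          path: /
```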
If your Kubernetes cluster is not in the same cloud project as your Container Registry, you may also need to provide credentials for the Container Registry in the DaemonSet YAML. This can be done with Kubernetes Secrets and the imagePullSecrets field.
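For example, assuming a registry secret named gcr-credentials has already been created (e.g. with kubectl create secret docker-registry), the reference goes in the pod spec of the DaemonSet:

```yaml
    spec:
      imagePullSecrets:
      - name: gcr-credentials
```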
The DaemonSet can also be deleted with kubectl:
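Assuming the DaemonSet is named grr:

```shell
kubectl delete daemonset grr
```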
Deploying the Container
Now that we have a DaemonSet, it’s easy to deploy to a specific node, or to all nodes in the cluster using kubectl:
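For example:

```shell
# Create the DaemonSet. No pods are scheduled until
# at least one node carries the grr=installed label.
kubectl apply -f grr-daemonset.yaml

# Deploy GRR to a single node...
kubectl label nodes <node-name> grr=installed

# ...or to every node in the cluster.
kubectl label nodes --all grr=installed
```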
Using the Container
Once deployed to a node, the node should appear in your GRR UI. You should be able to access the node file system and all running processes within the node through regular GRR flows.
Let’s assume we have a cluster with three nginx instances.
To deploy GRR to the nodes running nginx:
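A sketch of the commands involved; the app=nginx label selector is an assumption about how the nginx pods are labeled:

```shell
# Find which nodes the nginx pods are running on...
kubectl get pods -o wide -l app=nginx

# ...then label those nodes so the DaemonSet schedules GRR onto them.
kubectl label nodes <node-1> <node-2> <node-3> grr=installed
```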
Now we can see all three nodes in the GRR UI. I’ll run a quick ListProcesses flow on each.
This example is somewhat oversimplified compared to a production Kubernetes installation. In a much larger cluster, it may be worth regularly collecting process listings from all nodes and using frequency-of-occurrence (stacking) analysis to surface outlying or anomalous processes for further investigation.
We can also use GRR to collect the process memory of a suspicious process.
The Process Dump flow above returned the following:
Connecting to a Shell
Since we’re running GRR in a privileged container with read-only access to the host file system and process memory, we can also connect to a shell to perform live forensics using other tools. In this case, we’ve built docker-explorer into the container.
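For example, using kubectl to open a shell in the GRR pod on the node of interest (the pod name is a placeholder; find it with kubectl get pods -o wide):

```shell
kubectl exec -it <grr-pod-name> -- /bin/bash
```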
From here we can get a better view of which processes are running in which containers on the node.
We can also run tools like docker-explorer against the node file system through the /hostroot mount point.
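A hypothetical invocation, assuming Docker's data directory lives at /var/lib/docker on the host (and is therefore visible at /hostroot/var/lib/docker inside the GRR container):

```shell
de.py -r /hostroot/var/lib/docker list running_containers
```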
Docker explorer gives us the upper_dir for the compromised nginx container. From the GRR container we can take a look at the files directly.
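For example, assuming the overlay2 storage driver, with the directory ID being a placeholder for the upper_dir returned by docker-explorer:

```shell
ls -la /hostroot/var/lib/docker/overlay2/<upper-dir-id>/diff
```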
Using Kubernetes node labels, we’re able to deploy the GRR agent (and other forensic tools) within a privileged container on demand. In this way, we can label the nodes we want to target, and perform live triage and analysis. It’s also easy to remove the tools afterwards by unlabeling the nodes.
Hopefully this article has provided some insight into using the GRR agent in Kubernetes environments. If you have questions or want to discuss more, please reach out on the Open Source DFIR Slack community.