Deploying GRR to Kubernetes for Incident Response

Kubernetes (k8s) is being used to run more and more infrastructure in the cloud, but what happens when there’s a security incident, such as a coin miner running in your cluster? Many security organisations are well equipped to deal with incidents in their local environment, but still struggle to adapt to incident response in cloud environments.

GRR Rapid Response (GRR) is an incident response framework focused on remote live data collection and analysis. You may already use GRR for incident response in your local environment, but GRR can also be used in containerized Kubernetes environments.

This article is based on research conducted by Bart Inglot and me, and describes how to deploy the GRR client in a Kubernetes environment. The guide assumes you already have a GRR server running; if not, you can follow the GRR documentation to set one up. In our example, we use Fleetspeak for agent communication, but GRR's older built-in agent communication should also work. Fleetspeak can be enabled when deploying the GRR server.

At a high level, the approach is:

  • Install the GRR agent (and Fleetspeak) into a Docker image
  • Push the Docker image to a Container Registry
  • Configure the Docker image as a Kubernetes DaemonSet

Building the Docker Image

The GRR client Debian package can be downloaded from the Admin UI of your GRR server. If your server was configured with Fleetspeak enabled, the GRR client Debian package also includes the Fleetspeak client.

For this example, we've placed the installation package in the folder grr/packages. Once you have the GRR client package, create a Dockerfile like our example below (grr/Dockerfile):

# Using Ubuntu 18.04 as the base image
FROM ubuntu:18.04

# Update the base image
RUN apt update && apt -y upgrade && apt -y dist-upgrade

# You can also install other tools into the container here.
# As an example, we'll install docker-explorer
RUN apt -y install software-properties-common
RUN add-apt-repository ppa:gift/stable && apt update
RUN apt -y install docker-explorer-tools

# Copy the GRR installation package into the image
RUN mkdir /tmp/packages
COPY "packages/grr_3.4.2.3_amd64.deb" /tmp/packages

# Install the package
RUN dpkg -i --no-triggers "/tmp/packages/grr_3.4.2.3_amd64.deb"

# Remove the package after installation. The .deb still exists in the
# earlier COPY layer, so this step does not reduce the final image size.
RUN rm -Rf /tmp/packages

# fleetspeakd.nanny is a script that continually loops to ensure the Fleetspeak client is running
CMD ["/usr/lib/fleetspeakd/fleetspeakd.nanny","/usr/sbin/fleetspeakd","--log_dir=/var/log"]

The default command in the Dockerfile above is a nanny script to ensure the Fleetspeak client is always running. Fleetspeak in turn will ensure that the GRR client is running. If you’re not using Fleetspeak, you could include a similar script to ensure that the standalone GRR client is always running. Here’s an example script:

#!/bin/bash
# Run the command passed as arguments, and respawn it whenever it exits.
MOREARGS=("$@")

while true; do
  "${MOREARGS[@]}"
  /usr/bin/logger --tag grr "GRR client exited... Waiting 120 seconds before respawn." || true
  sleep 120
done
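The nanny loop above respawns the client forever, with a fixed 120-second delay. You may want to test it locally before baking it into the image. The sketch below is a variant that reads the delay and an optional run limit from environment variables. RESPAWN_DELAY and MAX_RUNS are hypothetical names introduced here. They are not GRR or Fleetspeak settings.

```shell
#!/bin/bash
# Sketch of a configurable nanny. RESPAWN_DELAY and MAX_RUNS are hypothetical
# knobs for illustration; they are not GRR or Fleetspeak settings.

# nanny CMD [ARGS...]: run CMD, and respawn it after RESPAWN_DELAY seconds
# whenever it exits. MAX_RUNS=0 (the default) means respawn forever.
nanny() {
  local delay="${RESPAWN_DELAY:-120}" max_runs="${MAX_RUNS:-0}"
  local runs=0 status=0
  while true; do
    "$@"
    status=$?
    runs=$((runs + 1))
    # Log to syslog if possible; never let a logging failure stop the loop.
    logger --tag grr "Client exited with status ${status}; respawning in ${delay}s" 2>/dev/null || true
    if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
      return "$status"
    fi
    sleep "$delay"
  done
}

# Only start the loop when executed directly with a command to supervise.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  nanny "$@"
fi
```

For example, RESPAWN_DELAY=5 MAX_RUNS=3 ./nanny.sh /bin/false runs /bin/false three times, five seconds apart, and then exits with its status. The script name nanny.sh is a placeholder. In the image you would leave MAX_RUNS unset so the loop never ends.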

We can then use Docker to build the image from the directory where the Dockerfile is located (grr).

$ docker build -t grr ./

Pushing the Docker Image to a Container Registry

In this example, we're using Google Container Registry (GCR) to host our GRR image, but any other Docker registry should work.

Tagging the Image

We first use Docker to tag the image, then push the image to the Container Registry. For GCR, the image reference takes the form <container-registry>/<project-id>/<image-name>:<tag>. If no tag is supplied, Docker applies the tag latest.

$ docker tag grr gcr.io/project-id/grr

Pushing the Image

Docker needs to be authenticated before it can push to the Container Registry. For GCR, you can follow Google's guide to authenticating Docker. Once authenticated, you can push the tagged image to the Container Registry.

$ docker push gcr.io/project-id/grr:latest
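The build, tag and push steps above can be wrapped in a small script so the image reference is assembled in one place. This is a sketch, not part of GRR. REGISTRY, PROJECT_ID, IMAGE, TAG and DRY_RUN are hypothetical variables. Setting DRY_RUN=1 prints the Docker commands instead of running them. For GCR, running gcloud auth configure-docker once is one way to set up Docker's credential helper before pushing.

```shell
#!/bin/bash
# Sketch: build, tag and push the GRR image. REGISTRY, PROJECT_ID, IMAGE, TAG
# and DRY_RUN are hypothetical variables introduced for this example.

# image_ref REGISTRY PROJECT_ID IMAGE [TAG] prints
# <container-registry>/<project-id>/<image-name>:<tag>, defaulting TAG to "latest".
image_ref() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "${4:-latest}"
}

# run CMD [ARGS...]: execute the command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# publish_grr_image: build from the current directory (grr/), tag, and push.
publish_grr_image() {
  local image="${IMAGE:-grr}" ref
  if [ -z "${PROJECT_ID:-}" ]; then
    echo "PROJECT_ID is not set" >&2
    return 1
  fi
  ref=$(image_ref "${REGISTRY:-gcr.io}" "$PROJECT_ID" "$image" "${TAG:-latest}")
  run docker build -t "$image" ./ &&
    run docker tag "$image" "$ref" &&
    run docker push "$ref"
}
```

After sourcing the script, DRY_RUN=1 PROJECT_ID=project-id publish_grr_image prints the three commands used in this section. The only difference is that the latest tag is spelled out in the docker tag step.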

Configure the DaemonSet

Create a DaemonSet manifest for our image (grr-daemonset.yaml). The configuration below deploys the image only to Kubernetes nodes labeled grr=installed:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grr
  labels:
    app: grr
spec:
  selector:
    matchLabels:
      app: grr
  template:
    metadata:
      labels:
        app: grr
    spec:
      # Only deploying GRR to nodes with the label 'grr=installed'.
      nodeSelector:
        grr: installed
      # Uses the host network rather than the container network. This way
      # the hostname that appears in GRR is the node's hostname, which
      # contains the cluster name.
      hostNetwork: true
      # Shares the host's PID namespace, so we can list all processes on the
      # node rather than just those from the container.
      hostPID: true
      # Defines a volume backed by the node's root file system so it can be
      # exposed to our container.
      volumes:
      - name: root
        hostPath:
          path: /
      # Specify our GRR container image in GCR
      containers:
      - name: grr
        image: gcr.io/project-id/grr:latest
        # Making it a privileged container. This way the processes within
        # the container get almost the same privileges as those outside the
        # container (e.g. manipulating the network stack or accessing devices).
        securityContext:
          privileged: true
        # Exposing the node file system to the GRR container (read-only).
        volumeMounts:
        - mountPath: /hostroot
          name: root
          readOnly: true

If your Kubernetes cluster is not in the same cloud project as your Container Registry, you may also need to provide credentials for the Container Registry in the DaemonSet YAML. This can be done with Kubernetes Secrets and the imagePullSecrets field.
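As a sketch of that approach, the functions below do two things. First, they create a docker-registry Secret from a GCR service-account JSON key. Second, they patch the grr DaemonSet (once it has been created) so its pods reference that Secret. The Secret name and key file path are placeholders. The username _json_key is the one GCR expects for JSON-key authentication. Check your registry's documentation if you use something other than GCR.

```shell
#!/bin/bash
# Sketch: let the grr DaemonSet pull from a registry in another project.
# The Secret name and key file used with these functions are placeholders.

# create_pull_secret NAME KEY_FILE: store a GCR JSON key as a docker-registry Secret.
create_pull_secret() {
  local name="$1" key_file="$2"
  kubectl create secret docker-registry "$name" \
    --docker-server=gcr.io \
    --docker-username=_json_key \
    --docker-password="$(cat "$key_file")"
}

# pull_secret_patch NAME: strategic merge patch that adds imagePullSecrets
# to the DaemonSet's pod template.
pull_secret_patch() {
  printf '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"%s"}]}}}}\n' "$1"
}

# attach_pull_secret NAME [DAEMONSET]: apply the patch (defaults to the grr DaemonSet).
attach_pull_secret() {
  kubectl patch daemonset "${2:-grr}" --patch "$(pull_secret_patch "$1")"
}
```

For example, run create_pull_secret gcr-pull-secret key.json and then attach_pull_secret gcr-pull-secret. Adding the same imagePullSecrets entry to the pod template in grr-daemonset.yaml achieves this declaratively.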

Create the DaemonSet by applying the manifest with kubectl apply -f grr-daemonset.yaml. The DaemonSet can also be deleted with kubectl:

$ kubectl delete daemonset grr

Deploying the Container

Now that we have a DaemonSet, it's easy to deploy GRR to a specific node, or to all nodes in the cluster, by labeling nodes with kubectl:

# List all nodes:
$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
gke-k8sgrr-testing-default-pool-174f74c5-313n   Ready    <none>   93m   v1.17.9-gke.1504
[...]
gke-k8sgrr-testing-default-pool-9622920c-thqf   Ready    <none>   93m   v1.17.9-gke.1504

# List only labeled nodes:
$ kubectl get nodes -l 'grr=installed'

# Label a node
$ kubectl label nodes gke-k8sgrr-testing-default-pool-174f74c5-313n grr=installed

# Label all nodes
$ kubectl label nodes --all grr=installed

# Remove the label from a node
$ kubectl label nodes gke-k8sgrr-testing-default-pool-174f74c5-313n grr-

# Remove the label from all nodes
$ kubectl label nodes --all grr-

Using the Container

Once GRR is deployed to a node, the node should appear in your GRR UI. You should be able to access the node's file system and all processes running on the node through regular GRR flows.

Example Usage

Let’s assume we have a cluster with three nginx instances.

$ kubectl get pods -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP           NODE                                            NOMINATED NODE   READINESS GATES
nginx-1-9c9488bdb-2hz2w   1/1     Running   0          35m   10.108.7.4   gke-k8sgrr-testing-default-pool-4dbd44be-0rs7   <none>           <none>
nginx-1-9c9488bdb-p2bm9   1/1     Running   0          35m   10.108.5.4   gke-k8sgrr-testing-default-pool-174f74c5-313n   <none>           <none>
nginx-1-9c9488bdb-rcflm   1/1     Running   0          35m   10.108.1.4   gke-k8sgrr-testing-default-pool-9622920c-thqf   <none>           <none>

To deploy GRR to the nodes running nginx:

$ kubectl label nodes gke-k8sgrr-testing-default-pool-4dbd44be-0rs7 grr=installed
node/gke-k8sgrr-testing-default-pool-4dbd44be-0rs7 labeled
$ kubectl label nodes gke-k8sgrr-testing-default-pool-174f74c5-313n grr=installed
node/gke-k8sgrr-testing-default-pool-174f74c5-313n labeled
$ kubectl label nodes gke-k8sgrr-testing-default-pool-9622920c-thqf grr=installed
node/gke-k8sgrr-testing-default-pool-9622920c-thqf labeled
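Labeling nodes one at a time works for three pods. For more, a short pipeline can find the nodes hosting matching pods and label them. This is a sketch with a few assumptions. It reads kubectl's custom-columns output (pod name and node name), which is less fragile to parse than the wide table. pod_nodes, label_for_grr and DRY_RUN are hypothetical names introduced here.

```shell
#!/bin/bash
# Sketch: label every node that runs a pod whose name starts with a prefix.
# pod_nodes, label_for_grr and DRY_RUN are hypothetical names for this example.

# pod_nodes PREFIX: read "<pod-name> <node-name>" lines on stdin and print the
# unique nodes hosting pods whose name starts with PREFIX (unscheduled pods,
# shown with node "<none>", are skipped).
pod_nodes() {
  awk -v prefix="$1" \
    'index($1, prefix) == 1 && $2 != "" && $2 != "<none>" { print $2 }' | sort -u
}

# label_for_grr: label each node name read from stdin with grr=installed.
# With DRY_RUN set, print the kubectl commands instead of running them.
label_for_grr() {
  local node
  while read -r node; do
    [ -n "$node" ] || continue
    if [ -n "${DRY_RUN:-}" ]; then
      echo "kubectl label nodes $node grr=installed"
    else
      kubectl label nodes "$node" grr=installed
    fi
  done
}
```

Usage: kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers | pod_nodes nginx-1- | label_for_grr. Run it with DRY_RUN=1 first to review which nodes would be labeled.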

GRR UI

Now we can see all three nodes in the GRR UI. I’ll run a quick ListProcesses flow on each.

[Screenshots: ListProcesses results for gke-k8sgrr-testing-default-pool-4dbd44be-0rs7, gke-k8sgrr-testing-default-pool-9622920c-thqf and gke-k8sgrr-testing-default-pool-174f74c5-313n]

This example is simplified compared to a production Kubernetes installation. In a much larger cluster, it may be worth regularly collecting process listings from all nodes. You can then use frequency-of-occurrence analysis to identify outlying or anomalous processes for further investigation.
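Below is a minimal sketch of that frequency-of-occurrence analysis. It assumes the process names from each node's listing have been exported to one text file per node, with one name per line. That layout and the rare_processes name are hypothetical. The function counts how many nodes each process name appears on and prints the names seen on at most a given number of nodes. Those are the candidates worth a closer look.

```shell
#!/bin/bash
# Sketch: frequency-of-occurrence ("stacking") analysis of process names.
# Assumed (hypothetical) input layout: DIR contains one file per node, each
# listing one process name per line.

# rare_processes DIR [MAX_NODES]: print "<node-count> <process>" for every
# process name seen on at most MAX_NODES nodes (default 1), rarest first.
rare_processes() {
  local dir="$1" max="${2:-1}" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    sort -u "$f"              # count each name at most once per node
  done | sort | uniq -c | sort -n |
    awk -v max="$max" '$1 <= max { n = $1; sub(/^ *[0-9]+ /, ""); print n, $0 }'
}
```

The threshold is a judgment call. Workloads scheduled everywhere, such as DaemonSets, appear on every node. A process seen on only one or two nodes of a large cluster stands out.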

We can also use GRR to collect the process memory of a suspicious process.

Running the Process Dump flow against the suspicious process returned the following:

$ ls -1
evil-process_43793_56043c521000_56043c52b000.tmp
evil-process_43793_56043d357000_56043d378000.tmp
evil-process_43793_7fb39e03f000_7fb39e048000.tmp
evil-process_43793_7fb39e205000_7fb39e209000.tmp
evil-process_43793_7fb39e23c000_7fb39e23e000.tmp
evil-process_43793_7fb39e26a000_7fb39e26b000.tmp
evil-process_43793_7fff2838f000_7fff283b0000.tmp
evil-process_43793_7fff283dd000_7fff283df000.tmp
evil-process_43793_ffffffffff600000_ffffffffff601000.tmp

$ xxd evil-process_43793_7fff2838f000_7fff283b0000.tmp
[...]
0001ecd0: 68d2 3d28 ff7f 0000 30a7 269e b37f 0000  h.=(....0.&.....
0001ece0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0001ecf0: 2321 2f62 696e 2f62 6173 680a 7768 696c  #!/bin/bash.whil
0001ed00: 6520 7472 7565 3b20 646f 2065 6368 6f20  e true; do echo 
0001ed10: 446f 696e 6720 616c 6c20 7468 6520 6576  Doing all the ev
0001ed20: 696c 2e2e 2e3b 2073 6c65 6570 2033 3630  il...; sleep 360
0001ed30: 303b 2064 6f6e 650a f5aa 0e9e b37f 0000  0; done.........
0001ed40: 0000 0000 0000 0000 0020 baa4 b54e 0727  ......... ...N.'
0001ed50: 3005 259e b37f 0000 0000 0000 0000 0000  0.%.............
[...]
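The dump file names appear to encode four fields: the process name, the PID, and the start and end addresses of each memory region. This naming convention is inferred from the listing above, not taken from GRR documentation. Under that assumption, a small helper can report each region's size. It can also translate an xxd offset back to a virtual address. For example, the shebang at offset 0x1ecf0 in the 7fff2838f000 dump corresponds to address 0x7fff283adcf0.

```shell
#!/bin/bash
# Sketch: decode dump file names of the assumed form
#   <process>_<pid>_<start-hex>_<end-hex>.tmp
# (inferred from the listing above). Process names may contain underscores,
# so fields are peeled off from the right.

# dump_region FILE: print "<process> pid=<pid> start=0x.. end=0x.. size=<bytes>".
dump_region() {
  local base="${1##*/}"
  base="${base%.tmp}"
  local end="${base##*_}";   base="${base%_*}"
  local start="${base##*_}"; base="${base%_*}"
  local pid="${base##*_}"    name="${base%_*}"
  printf '%s pid=%s start=0x%s end=0x%s size=%d\n' \
    "$name" "$pid" "$start" "$end" $(( 16#$end - 16#$start ))
}

# dump_offset_to_addr FILE OFFSET: convert an offset within the dump into the
# corresponding virtual address. OFFSET needs a 0x prefix: a bare xxd offset
# such as 0001ecf0 would be parsed by bash as an (invalid) octal number.
dump_offset_to_addr() {
  local base="${1##*/}"
  base="${base%.tmp}"
  base="${base%_*}"                  # drop the end address
  printf '0x%x\n' $(( 16#${base##*_} + $2 ))
}
```

Usage: for f in evil-process_*.tmp; do dump_region "$f"; done lists every collected region with its size.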

Connecting to a Shell

Since we're running GRR in a privileged container that shares the host's process namespace and has read-only access to the host file system, we can also open a shell in the container to perform live forensics with other tools. In this case, we've built docker-explorer into the image.

$ kubectl get pods -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE                                            NOMINATED NODE   READINESS GATES
grr-584tt                 1/1     Running   0          54m   10.128.0.3    gke-k8sgrr-testing-default-pool-9622920c-thqf   <none>           <none>
grr-7lc4n                 1/1     Running   0          55m   10.128.0.10   gke-k8sgrr-testing-default-pool-4dbd44be-0rs7   <none>           <none>
grr-wnfwm                 1/1     Running   0          54m   10.128.0.8    gke-k8sgrr-testing-default-pool-174f74c5-313n   <none>           <none>
nginx-1-9c9488bdb-2hz2w   1/1     Running   0          97m   10.108.7.4    gke-k8sgrr-testing-default-pool-4dbd44be-0rs7   <none>           <none>
nginx-1-9c9488bdb-p2bm9   1/1     Running   0          97m   10.108.5.4    gke-k8sgrr-testing-default-pool-174f74c5-313n   <none>           <none>
nginx-1-9c9488bdb-rcflm   1/1     Running   0          97m   10.108.1.4    gke-k8sgrr-testing-default-pool-9622920c-thqf   <none>           <none>
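You can script the lookup of the GRR pod that shares a node with a given pod instead of reading it off the table above. The sketch below reads custom-columns output (pod name, node name). It assumes GRR pods are the only ones whose names start with grr-, as in the listing above. grr_pod_for is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: print the GRR DaemonSet pod running on the same node as TARGET_POD.
# Input on stdin: "<pod-name> <node-name>" lines, e.g. from
#   kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers
# Assumes GRR pods are the only pods whose names start with "grr-".
# Returns non-zero if TARGET_POD is not in the input.
grr_pod_for() {
  awk -v target="$1" '
    NF >= 2 { node[$1] = $2; names[++n] = $1 }
    END {
      want = node[target]
      if (want == "") exit 1
      for (i = 1; i <= n; i++)
        if (index(names[i], "grr-") == 1 && node[names[i]] == want)
          print names[i]
    }'
}
```

Combined with kubectl exec, this opens a shell next to a suspicious pod, for example: kubectl exec -it "$(kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers | grr_pod_for nginx-1-9c9488bdb-p2bm9)" -- /bin/bash. If pod names are unreliable, you can select by the DaemonSet's app=grr label instead, with kubectl get pods -l app=grr --field-selector spec.nodeName=<node> -o name.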


# Connecting to our GRR container on the same node as the compromised nginx
$ kubectl exec -it grr-wnfwm -- /bin/bash
root@gke-k8sgrr-testing-default-pool-174f74c5-313n:/#

From here we can get a better view of which processes are running in which containers on the node.

root@gke-k8sgrr-testing-default-pool-174f74c5-313n:/# chroot /hostroot docker ps
CONTAINER ID        IMAGE                                  COMMAND                  CREATED
4a6b936ed006        gcr.io/project-id/grr                  "/usr/lib/fleetspeak…"
4cf7bab820df        nginx                                  "/docker-entrypoint.…"
6218c1804181        gke.gcr.io/fluent-bit-gke-exporter     "/fluent-bit-gke-exp…"
92f44a898596        gke.gcr.io/proxy-agent-amd64           "/proxy-agent --logt…"
d81805c00904        gke.gcr.io/fluent-bit                  "/fluent-bit/bin/flu…"
014c31adc3b2        gcr.io/gke-release/gke-metrics-agent   "/otelsvc --config=/…"
bd809a0333f1        k8s.gcr.io/prometheus-to-sd            "/monitor --source=k…"
bf7af26d3ac2        3ea926dd1033                           "/bin/sh -c 'exec ku…"
[...]
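Docker attaches Kubernetes labels to each container, as the docker-explorer output below shows. These labels can be queried through the chroot to map a pod to its container and overlay2 upper directory using only the Docker CLI. This sketch assumes the node runs Docker with the overlay2 storage driver, as in this example. HOSTROOT and the function names are hypothetical. docker inspect reports the path as seen on the host, so the helper prefixes it with the /hostroot mount point.

```shell
#!/bin/bash
# Sketch: map a pod to its Docker container(s) and overlay2 upper directory
# from inside the GRR container. Assumes Docker with the overlay2 storage
# driver; HOSTROOT and the function names are hypothetical.
HOSTROOT="${HOSTROOT:-/hostroot}"

# pod_container_ids POD_NAME: IDs of the pod's application containers
# (the io.kubernetes.docker.type=container filter skips the sandbox container).
pod_container_ids() {
  chroot "$HOSTROOT" docker ps --quiet \
    --filter "label=io.kubernetes.pod.name=$1" \
    --filter "label=io.kubernetes.docker.type=container"
}

# container_upper_dir CONTAINER_ID: the container's writable layer, as a path
# reachable from inside the GRR container.
container_upper_dir() {
  local dir
  dir=$(chroot "$HOSTROOT" docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' "$1") || return
  printf '%s%s\n' "$HOSTROOT" "$dir"
}
```

For a pod with a single container, container_upper_dir "$(pod_container_ids nginx-1-9c9488bdb-p2bm9)" should print the same upper_dir that docker-explorer reports.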

We can also run tools like docker-explorer against the node file system through the /hostroot mount point.

root@gke-k8sgrr-testing-default-pool-174f74c5-313n:/# de.py -r /hostroot/var/lib/docker list running_containers
[...]
    {
        "image_name": "nginx@sha256:4949aa7259aa6f827450207db5ad94cabaa9248277c6d736d5e1975d200c7e43",
        "container_id": "4cf7bab820dfdb3c4244da5385a320e2b095e379efdc080e8197fe851a00def3",
        "image_id": "f35646e83998b844c3f067e5a2cff84cdf0967627031aeda3042d78996b68d35",
        "labels": {
            "annotation.io.kubernetes.container.hash": "e1027825",
            "annotation.io.kubernetes.container.restartCount": "0",
            "annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
            "annotation.io.kubernetes.container.terminationMessagePolicy": "File",
            "annotation.io.kubernetes.pod.terminationGracePeriod": "30",
            "io.kubernetes.container.logpath": "/var/log/pods/default_nginx-1-9c9488bdb-p2bm9_04a41ec2-e1c2-4699-af76-401f06ef75a9/nginx-1/0.log",
            "io.kubernetes.container.name": "nginx-1",
            "io.kubernetes.docker.type": "container",
            "io.kubernetes.pod.name": "nginx-1-9c9488bdb-p2bm9",
            "io.kubernetes.pod.namespace": "default",
            "io.kubernetes.pod.uid": "04a41ec2-e1c2-4699-af76-401f06ef75a9",
            "io.kubernetes.sandbox.id": "847e1ef6ac2a6878158f52801254a235b3c83b114290fd1f5912d4168cd95713",
            "maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
        },
        "start_date": "2020-10-14T02:07:02.176722",
        "mount_id": "4907289a58b769b6654212c86318a2a16c0a305587d64cfb3e0834a2f22ce39d",
        "mount_points": [
            {
                "source": "/hostroot/var/lib/docker/var/lib/kubelet/pods/04a41ec2-e1c2-4699-af76-401f06ef75a9/containers/nginx-1/6bcc97ed",
                "destination": "/dev/termination-log"
            },
            {
                "source": "/hostroot/var/lib/docker/var/lib/kubelet/pods/04a41ec2-e1c2-4699-af76-401f06ef75a9/etc-hosts",
                "destination": "/etc/hosts"
            },
            {
                "source": "/hostroot/var/lib/docker/var/lib/kubelet/pods/04a41ec2-e1c2-4699-af76-401f06ef75a9/volumes/kubernetes.io~secret/default-token-l66rr",
                "destination": "/var/run/secrets/kubernetes.io/serviceaccount"
            }
        ],
        "upper_dir": "/hostroot/var/lib/docker/overlay2/4907289a58b769b6654212c86318a2a16c0a305587d64cfb3e0834a2f22ce39d/diff",
        "log_path": "/var/lib/docker/containers/4cf7bab820dfdb3c4244da5385a320e2b095e379efdc080e8197fe851a00def3/4cf7bab820dfdb3c4244da5385a320e2b095e379efdc080e8197fe851a00def3-json.log"
    },
[...]

docker-explorer gives us the upper_dir (the container's writable overlay2 layer) for the compromised nginx container. From the GRR container, we can examine the files directly.

root@gke-k8sgrr-testing-default-pool-174f74c5-313n:/# cd /hostroot/var/lib/docker/overlay2/4907289a58b769b6654212c86318a2a16c0a305587d64cfb3e0834a2f22ce39d/diff

root@gke-k8sgrr-testing-default-pool-174f74c5-313n:diff# ls -l tmp/
total 4
-rwxr-xr-x 1 root root 72 Oct 14 02:34 evil-process

root@gke-k8sgrr-testing-default-pool-174f74c5-313n:diff# cat tmp/evil-process
#!/bin/bash
while true; do echo Doing all the evil...; sleep 3600; done
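The overlay2 upper directory only holds what the container has created, modified or deleted relative to its image. Listing everything in it is therefore a quick way to surface artefacts like tmp/evil-process. The sketch below assumes GNU find (for -printf) is available in the GRR container. changed_files is a hypothetical helper. Character devices in its output are overlayfs whiteouts, which mark files deleted from the image's lower layers.

```shell
#!/bin/bash
# Sketch: list everything in a container's overlay2 upper dir, oldest first.
# Requires GNU find for -printf; changed_files is a hypothetical helper.

# changed_files UPPER_DIR: print "<mtime> <mode> <size> <path relative to UPPER_DIR>"
# for regular files, symlinks and character devices (overlayfs whiteouts,
# i.e. files deleted from the image's lower layers).
changed_files() {
  find "$1" \( -type f -o -type l -o -type c \) \
    -printf '%TY-%Tm-%Td %TH:%TM %M %s %P\n' | sort
}
```

Usage: changed_files /hostroot/var/lib/docker/overlay2/4907289a58b769b6654212c86318a2a16c0a305587d64cfb3e0834a2f22ce39d/diff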

Conclusion

Using Kubernetes node labels, we’re able to deploy the GRR agent (and other forensic tools) within a privileged container on demand. In this way, we can label the nodes we want to target, and perform live triage and analysis. It’s also easy to remove the tools afterwards by unlabeling the nodes.

Hopefully this article has provided some insight into using the GRR agent in Kubernetes environments. If you have questions or want to discuss more, please reach out on the Open Source DFIR Slack community.
