Introduction

Linux container security has been covered in a number of blog posts and conference presentations, including our previous post about Container Forensics with Docker Explorer. However, when we came across Windows containers during an investigation, we noticed that their implementation was quite different and not well documented from a forensics perspective. Although Maxim Suhanov’s blog, dfir.ru, provides some details about containerised Windows Registry hives, not much had been written about how Windows implemented container filesystems.

This post will detail the research process and useful findings about Windows containers. It primarily focuses on the filesystem layers and does not cover containerised registry hives.

Windows Containers

One of the most informative resources we found on how Windows containers work was a DockerCon talk from 2016 titled "Windows Server & Docker - The Internals Behind Bringing Docker & Containers to Windows - Black Belt". To summarise, containers traditionally rely on Linux features such as namespaces and cgroups, which are not present in Windows. To work around this, some changes were made to Windows kernel-space components to create similar functionality, such as:

Extending job objects to include a "silo" concept to provide resource isolation.
Namespace virtualisation including separate object namespaces.

One of the crucial differences from a forensics perspective is how Windows containers handle filesystems. The DockerCon talk mentions that it is difficult to build a full union filesystem like those used for Linux containers because Windows applications expect certain NTFS features to be present. Instead, Microsoft came up with a hybrid model involving a virtual block device and an NTFS partition per container.

While the Windows container API is not publicly documented, Microsoft has provided language bindings in:

Go - under the hcsshim project. This is how the Docker project interfaces with the Windows container API.
C# - under the dotnet-computevirtualization project.

Container types

Windows containers also offer two different "isolation" modes:

Process isolation, where container processes run on the host kernel.
Hyper-V isolation, where containers run in a minimal Hyper-V virtual machine.

Microsoft also makes a distinction between Windows Containers on Windows (WCOW) and Linux Containers on Windows (LCOW), which involves running Linux containers under a Hyper-V VM or Windows Subsystem for Linux (WSL). This post focuses mostly on WCOW.

Inspecting Docker Artifacts

Installing Docker and pulling an image

For research purposes, we started with a fresh Windows Server 2019 VM, installed Docker following Microsoft's "Get started - Prep Windows for containers" instructions (see References), and then pulled down a nanoserver container image:

PS C:\ProgramData\docker> docker pull mcr.microsoft.com/windows/nanoserver:1809

On Windows, the Docker root is under c:\ProgramData\docker, shown below with irrelevant directories omitted:

[...]
├── image
│   └── windowsfilter
│       ├── distribution
│       │   ├── diffid-by-digest
│       │   │   └── sha256
│       │   │       └── b9043d31610e0[...]
│       │   └── v2metadata-by-diffid
│       │       └── sha256
│       │           └── 8c4dee97552f5[...]
│       ├── imagedb
│       │   ├── content
│       │   │   └── sha256
│       │   │       └── ad675c9cb2d58[...]
│       │   └── metadata
│       │       └── sha256
│       ├── layerdb
│       │   ├── sha256
│       │   │   └── 8c4dee97552f[...]
│       │   │       ├── cache-id
│       │   │       ├── descriptor.json
│       │   │       ├── diff
│       │   │       ├── os
│       │   │       ├── size
│       │   │       └── tar-split.json.gz
│       │   └── tmp
│       └── repositories.json
[...]
│
└── windowsfilter
    └── ebf46384a2e8[...]
        ├── Files
        │   ├── License.txt
        │   ├── ProgramData
        │   │   ├── [OMITTED]
        │   ├── Users
        │   │   ├── [OMITTED]
        │   └── Windows
        │       └── [OMITTED]
        ├── Hives
        │   ├── DEFAULTUSER_BASE
        │   ├── SAM_BASE
        │   ├── SECURITY_BASE
        │   ├── SOFTWARE_BASE
        │   └── SYSTEM_BASE
        ├── UtilityVM
        │   ├── Files
        │   │   ├── [OMITTED]
        │   ├── SystemTemplate.vhdx
        │   └── SystemTemplateBase.vhdx
        ├── bcd.bak
        ├── bcd.log.bak
        ├── bcd.log1.bak
        ├── bcd.log2.bak
        ├── blank-base.vhdx
        ├── blank.vhdx
        ├── layerchain.json
        └── layout

The windowsfilter directory contains container filesystems and will be of interest for forensics. Looking at the directory layout above:

Files: Contains the read-only files for the image layer.
Hives: Contains the base registry hives used for containerised registry hives.
UtilityVM: Files related to the VM used by Hyper-V isolated containers.
blank-base.vhdx/blank.vhdx: These are related to the "virtual block device per container" alluded to in the DockerCon presentation.
layerchain.json: Contains null for this base image layer; for a container, it references the next (parent) layer.

Running a container

Next, we ran a container and created a file for later inspection:

PS C:\ProgramData\docker> docker run -it ad675c9cb2d5
Microsoft Windows [Version 10.0.17763.1935]
C:\>echo filecontent > C:\Users\ContainerUser\filename.txt
C:\>exit

Reviewing the windowsfilter directory:

└── windowsfilter
    ├── 5da330568248b011aae9ba466dc20f208d75982308e45e0479863959a20f3406
    │   ├── layerchain.json
    │   └── sandbox.vhdx
    └── ebf46384a2e816f7695cb48e0368e6077de5d06985a1a516a775c892132c6dd7
        ├── [...]

As expected, there is a new subdirectory, 5da3305682..., for the created container. It contains the newly created files sandbox.vhdx and layerchain.json, the latter referencing the parent layer directory:

$ cat windowsfilter/5da330568248[...]/layerchain.json
["C:\\ProgramData\\docker\\windowsfilter\\ebf46384a2e8[...]"]
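The layout seen so far suggests a simple way to triage a windowsfilter directory copied off a host: subdirectories holding sandbox.vhdx look like containers, those holding blank-base.vhdx look like image layers, and a layerchain.json names the parent layer. The Python sketch below encodes that heuristic. The function names are hypothetical, it is not part of Docker Explorer, and it assumes the parent layer directories sit alongside the container in the same windowsfilter directory.

```python
import json
from pathlib import Path, PureWindowsPath


def classify_windowsfilter(root):
    """Label each subdirectory of a windowsfilter directory (heuristic only)."""
    results = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue
        if (entry / "sandbox.vhdx").is_file():
            results[entry.name] = "container"
        elif (entry / "blank-base.vhdx").is_file():
            results[entry.name] = "image layer"
        else:
            results[entry.name] = "unknown"
    return results


def read_layerchain(layer_dir):
    """Return the Windows paths listed in layerchain.json ([] if it is null)."""
    # utf-8-sig also accepts a leading BOM, in case one is present.
    with open(Path(layer_dir) / "layerchain.json", encoding="utf-8-sig") as f:
        chain = json.load(f)
    return chain or []


def parent_layer_dirs(layer_dir):
    """Map layerchain.json entries to directories next to layer_dir.

    Assumption: parent layers live in the same windowsfilter directory, so
    only the last component of each Windows path (the layer ID) is used.
    """
    layer_dir = Path(layer_dir)
    return [layer_dir.parent / PureWindowsPath(p).name
            for p in read_layerchain(layer_dir)]
```

PureWindowsPath is used so that the backslash-separated paths stored on the Windows host are split correctly even when the analysis runs on Linux.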

Block device layer

Based on the DockerCon talk and what we know so far, it appears that Windows containers use differencing vhdx disks to manage the writable "scratch" layer for containers with:

Each container having a writable differencing disk, sandbox.vhdx
Its parent disk set to the image layer's blank-base.vhdx

blank-base.vhdx

blank-base.vhdx contains only an NTFS volume with an empty, hidden WcSandboxState directory. Because this disk contains no layer-related files, the parent/child relationship between these files appears to exist mainly to reduce the size of each per-container sandbox.vhdx, rather than to manage any kind of container/image layer relationship.

PS C:\ProgramData\docker> Mount-DiskImage -Access ReadOnly -ImagePath C:\ProgramData\docker\windowsfilter\ebf46384a2e8[...]\blank-base.vhdx
PS C:\ProgramData\docker> Add-PartitionAccessPath -DiskNumber 1 -PartitionNumber 2 -AccessPath z:
PS C:\ProgramData\docker> gci -Hidden z:

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d--hs-          6/8/2021   9:41 AM                WcSandboxState

blank.vhdx/sandbox.vhdx

blank.vhdx is an empty differencing vhdx disk whose parent is set to blank-base.vhdx. When a new container is created from this image, this file is copied into the container's directory and renamed to sandbox.vhdx. We confirmed this by creating a new container and checking that the hash of its sandbox.vhdx matches the hash of blank.vhdx in the image layer:

PS C:\ProgramData\docker> docker create ad675c9cb2d5
d438d794f472[...]
PS C:\ProgramData\docker> Get-FileHash -Algorithm SHA256 .\windowsfilter\ebf46384a2e8[...]\blank.vhdx
SHA256 70DABAEEDA01D94E[...]
PS C:\ProgramData\docker> Get-FileHash -Algorithm SHA256 .\windowsfilter\d438d794f472[...]\sandbox.vhdx
SHA256 70DABAEEDA01D94E[...]
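The same comparison can be made on a Linux analysis machine. This sketch streams both files through SHA-256 using Python's hashlib; the helper names are our own. A match fits the copy-on-create behaviour shown above, while a mismatch only tells you that the container's sandbox.vhdx has changed since it was copied, not what changed.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large vhdx files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified_copy(sandbox_vhdx, blank_vhdx):
    """True if sandbox.vhdx is still byte-identical to the image's blank.vhdx."""
    return sha256_file(sandbox_vhdx) == sha256_file(blank_vhdx)
```

Note that hexdigest() returns lowercase hex, whereas Get-FileHash prints uppercase, so compare case-insensitively when matching against hashes recorded on the host.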

According to the vhdx specification, the parent disk locator is stored in the disk's metadata region. Running strings against the disk confirms that sandbox.vhdx's parent is set to blank-base.vhdx from the parent image:

$ strings -e l sandbox.vhdx
absolute_win32_pathC:\ProgramData\docker\windowsfilter\ebf46384a2e8[...]\blank-base.vhdx
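The check can also be scripted. The sketch below is a rough Python equivalent of strings -e l (printable ASCII stored as UTF-16LE, at least four characters long) plus a pattern that extracts the path following absolute_win32_path. It is a string search rather than a parser for the VHDX parent locator structure, and the assumption that the value ends at the first ".vhdx" is ours, based on the output above.

```python
import re

# Printable ASCII characters stored as UTF-16LE (character byte, then 0x00),
# at least 4 in a row: roughly what `strings -e l` prints.
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")

# In the output above the parent path directly follows this key; we assume
# it ends at the first ".vhdx".
PARENT_PATH = re.compile(r"absolute_win32_path(.+?\.vhdx)", re.IGNORECASE)


def utf16le_strings(data):
    """Return all UTF-16LE printable-ASCII runs found in `data` (bytes)."""
    return [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]


def find_parent_vhdx(data):
    """Return the first absolute_win32_path value found, or None."""
    for s in utf16le_strings(data):
        match = PARENT_PATH.search(s)
        if match:
            return match.group(1)
    return None
```

For a freshly created sandbox.vhdx, passing the whole file's bytes is fine; for a large, heavily used disk, reading it entirely into memory may be impractical.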

Mounting vhdx files

Adding support to Docker Explorer for mounting these container filesystems requires a way to mount differencing vhdx files. Although this is trivial on Windows using built-in tools, at the time of writing it was not supported on Linux.
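Conceptually, reading from a differencing disk is a child-first lookup: a sector written in the child (sandbox.vhdx) is returned from the child, otherwise the read falls through to the parent (blank-base.vhdx), and sectors allocated in neither read as zeros. Flattening the chain into a raw image means resolving every sector this way. The sketch below models only that lookup, with each disk reduced to a sparse sector map. The real VHDX format tracks allocation through a block allocation table (BAT) and, for differencing disks, per-sector bitmaps, and this is not necessarily how merge_vhdx.py is implemented.

```python
from typing import Dict, List

SECTOR_SIZE = 512

# Simplified model, not the VHDX on-disk format: each disk in the chain is a
# sparse map from sector number to that sector's contents.
Layer = Dict[int, bytes]


def read_sector(chain: List[Layer], lba: int) -> bytes:
    """Child-first lookup: chain[0] is the child (e.g. sandbox.vhdx) and
    chain[-1] the base (e.g. blank-base.vhdx)."""
    for layer in chain:
        if lba in layer:
            return layer[lba]
    # Not allocated in any disk of the chain: reads as zeros.
    return bytes(SECTOR_SIZE)


def flatten(chain: List[Layer], total_sectors: int) -> bytes:
    """Build a raw image by resolving every sector through the chain."""
    return b"".join(read_sector(chain, lba) for lba in range(total_sectors))
```

A practical tool would stream resolved sectors to an output file rather than building the image in memory, since the raw images involved can be many gigabytes.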

The short-term, hacky solution was a Python script that merges the two vhdx files and outputs a raw image that can be mounted. A standalone tool, merge_vhdx.py, has been added to the docker-explorer GitHub repository; docker-explorer invokes it to output a raw container image:

$ de.py -r docker/ mount 5da330568248[...] UNUSED_MOUNT_POINT
Warning: Due to differences in the Windows container implementation this
command will not actually mount the given container FS but will create a
mountable raw image:
5da330568248b011aae9ba466dc20f208d75982308e45e0479863959a20f3406.raw
Which can then be mounted using standard tools.
This command will create a new disk image of size 20480MiB.
Please confirm (y/n): y

$ sudo mount -o ro,offset=$((264192*512)) 5da330568248[...].raw mnt
$ ls -l mnt/
total 4
lrwxrwxrwx 1 root root 25 May 7 22:37 License.txt -> 'unsupported reparse point'
dr-xr-xr-x 1 root root 0 May 7 21:41 ProgramData
dr-xr-xr-x 1 root root 0 Jun 9 10:51 Users
drwxrwxrwx 1 root root 0 Jun 9 10:51 WcSandboxState
dr-xr-xr-x 1 root root 4096 Jun 9 10:51 Windows
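The offset passed to mount is in bytes: 264192 sectors × 512 bytes = 135,266,304 bytes, the start of the NTFS partition inside the raw image. Rather than hard-coding it, the start sector can be read from the partition table. The sketch below assumes the raw image uses a GPT partition table with 512-byte sectors (the earlier Add-PartitionAccessPath call used partition 2, which suggests the data volume is not the only partition); the function name is our own, and mmls from The Sleuth Kit reports the same layout.

```python
import struct

SECTOR = 512  # assumption: 512-byte sectors, as in the $((264192*512)) above


def gpt_partitions(f):
    """Yield (number, first_lba, last_lba, name) for each used GPT entry.

    `f` is a binary file object for a raw disk image.
    """
    f.seek(1 * SECTOR)  # the primary GPT header is stored in LBA 1
    header = f.read(92)
    if header[:8] != b"EFI PART":
        raise ValueError("no GPT header found at LBA 1")
    # PartitionEntryLBA, NumberOfPartitionEntries, SizeOfPartitionEntry
    entries_lba, num_entries, entry_size = struct.unpack_from("<QII", header, 72)
    f.seek(entries_lba * SECTOR)
    table = f.read(num_entries * entry_size)
    for i in range(num_entries):
        entry = table[i * entry_size:(i + 1) * entry_size]
        if entry[:16] == bytes(16):  # all-zero partition type GUID: unused slot
            continue
        first_lba, last_lba = struct.unpack_from("<QQ", entry, 32)
        name = entry[56:128].decode("utf-16-le").rstrip("\x00")
        yield i + 1, first_lba, last_lba, name
```

Opening the raw image in binary mode and multiplying the data partition's first_lba by SECTOR gives the value to pass as mount -o offset=....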

Filesystem layer

So far, we've established that Windows containers use differencing vhdx files to manage the writable container layer, and that the sandbox.vhdx -> blank-base.vhdx relationship exists primarily to reduce disk usage rather than to manage container layers.

There is still something missing from our understanding, namely how the filesystem layer relationship works. The previous section hinted at this with the “unsupported reparse point” message. Viewing a few more files in our mounted image shows more of these reparse points:

$ ls -l mnt1/Windows/System32
lrwxrwxrwx 1 root root 25 May 7 21:40 adtschema.dll -> unsupported reparse point
lrwxrwxrwx 1 root root 25 May 7 21:40 advapi32legacy.dll -> unsupported reparse point
lrwxrwxrwx 1 root root 25 May 7 21:40 aepic.dll -> unsupported reparse point
lrwxrwxrwx 1 root root 25 May 7 21:40 apisetschema.dll -> unsupported reparse point
[...]

These appear to be files from the parent layer that have not been modified within our container. Testing this with an unmodified file:

$ ls -l mnt1/Windows/System32/drivers/etc/hosts

lrwxrwxrwx 1 root root 25 May 7 21:41 mnt1/Windows/System32/drivers/etc/hosts -> 'unsupported reparse point'

For a container in which the same file has been modified, it is present as a regular file, without a reparse point:

$ ls -l mnt2/Windows/System32/drivers/etc/hosts

-rwxrwxrwx 1 root root 836 Jun 15 18:40 mnt2/Windows/System32/drivers/etc/hosts
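Because unmodified files show up as unresolvable links, a mounted image can be split into files stored in the container's writable layer and placeholders for the parent layer. The sketch below relies purely on how the Linux NTFS driver used here renders reparse points it cannot interpret (a symlink whose target is the text "unsupported reparse point"), so other reparse point types would also be counted as placeholders. The function name is hypothetical.

```python
import os

# How the Linux NTFS driver used in this post renders reparse points it cannot
# interpret: a symlink whose target is this text (see the `ls -l` output).
PLACEHOLDER = "unsupported reparse point"


def split_layers(mount_root):
    """Return (local, placeholders) as sorted paths relative to mount_root.

    local:        regular files stored in the container's writable layer
    placeholders: links rendered as PLACEHOLDER, i.e. reparse points that
                  (in the cases examined here) refer to the parent layer
    """
    local, placeholders = [], []
    for dirpath, dirnames, filenames in os.walk(mount_root):
        # os.walk does not follow symlinks by default; dangling links are
        # listed in filenames, links that resolve to a directory in dirnames.
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, mount_root)
            if os.path.islink(path):
                if os.readlink(path) == PLACEHOLDER:
                    placeholders.append(rel)
            elif os.path.isfile(path):
                local.append(rel)
    return sorted(local), sorted(placeholders)
```

The local list is a quick starting point for spotting files created or modified inside the container, such as the filename.txt written earlier.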

Reviewing the $REPARSE_POINT attribute (type 192, attribute ID 3) in the MFT entry for this file (inode 353):

$ icat -f ntfs -o 264192 5da330568248[...].raw 353-192-3 | xxd
00000000: 1800 0080 5e00 0005 0100 0000 0000 0000 ....^...........
00000010: 9321 3ce3 628a 1c5c 8fca 0cef 35b5 c279 .!<.b..\....5..y
00000020: 4400 5700 6900 6e00 6400 6f00 7700 7300 D.W.i.n.d.o.w.s.
00000030: 5c00 5300 7900 7300 7400 6500 6d00 3300 \.S.y.s.t.e.m.3.
00000040: 3200 5c00 6400 7200 6900 7600 6500 7200 2.\.d.r.i.v.e.r.
00000050: 7300 5c00 6500 7400 6300 5c00 6800 6f00 s.\.e.t.c.\.h.o.
00000060: 7300 7400 7300 s.t.s.

According to Microsoft documentation, this is a REPARSE_DATA_BUFFER structure whose first four bytes hold the reparse tag, 0x80000018 (stored little-endian as 18 00 00 80). This tag corresponds to IO_REPARSE_TAG_WCI: "Used by the Windows Container Isolation filter. Server-side interpretation only, not meaningful over the wire."

While the format of the IO_REPARSE_TAG_WCI reparse data appears to be undocumented by Microsoft, Ladislav Zezula has provided a definition in their FileTest tool. Applying this definition to our attribute gives:

LookupGuid: 93213ce3628a1c5c8fca0cef35b5c279
WciName: Windows\System32\drivers\etc\hosts

WciName corresponds to the file's path in the parent image, and LookupGuid is an identifier for the next layer, which in this case is expected to be ebf46384a2e8[...]. Further review of the hcsshim project suggests that this GUID is generated by calling vmcompute!nametoguid with the layer name as an argument.
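The decoding above can be reproduced with a small parser. The sketch follows the field layout of the FileTest definition as we read it (reparse tag, data length, reserved, version, reserved, LookupGuid, WciName length in bytes, then the UTF-16LE name); treat that layout as an assumption taken from a third-party header rather than a Microsoft specification. It returns LookupGuid both as raw hex, as printed above, and in the usual Windows GUID string form.

```python
import struct
import uuid

IO_REPARSE_TAG_WCI = 0x80000018


def parse_wci_reparse(buf):
    """Decode a WCI reparse buffer (layout assumed from FileTest's WinSDK.h)."""
    # Header: ReparseTag (u32), ReparseDataLength (u16), Reserved (u16),
    # then Version (u32) and Reserved2 (u32); all little-endian.
    tag, data_len, _reserved, version, _reserved2 = struct.unpack_from("<IHHII", buf, 0)
    if tag != IO_REPARSE_TAG_WCI:
        raise ValueError(f"not a WCI reparse point (tag 0x{tag:08x})")
    if len(buf) < 8 + data_len:
        raise ValueError("reparse buffer is truncated")
    guid_bytes = bytes(buf[16:32])
    (name_len,) = struct.unpack_from("<H", buf, 32)  # length in bytes
    wci_name = bytes(buf[34:34 + name_len]).decode("utf-16-le")
    return {
        "version": version,
        "lookup_guid_raw": guid_bytes.hex(),
        # bytes_le: the first three GUID fields are stored little-endian.
        "lookup_guid": str(uuid.UUID(bytes_le=guid_bytes)),
        "wci_name": wci_name,
    }
```

Fed the 102 bytes from the xxd dump above, it returns version 1, WciName Windows\System32\drivers\etc\hosts, and LookupGuid 93213ce3628a1c5c8fca0cef35b5c279, which is e33c2193-8a62-5c1c-8fca-0cef35b5c279 in Windows string form.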

Conclusion

Windows containers are more complex than their Linux counterparts because they must provide the features that Windows applications expect. Rather than a simple union filesystem, they use a combination of virtual block devices (vhdx files) and NTFS reparse points to manage container layers.

Currently, Docker Explorer can mount the virtual block device but does not properly handle the NTFS reparse points. This still allows the writable layer of a container, which is likely to be of most interest during an investigation, to be mounted; any unmodified files can then be found manually in the parent image.

References

dfir.ru blog: "Containerized registry hives in Windows" https://dfir.ru/2020/08/15/containerized-registry-hives-in-windows/
DockerCon '16 Presentation: "Windows Server & Docker - The Internals Behind Bringing Docker & Containers to Windows" https://www.youtube.com/watch?v=85nCF5S8Qok
GitHub: Docker Explorer Tool https://github.com/google/docker-explorer
GitHub: FileTest - IO_REPARSE_TAG_WCI definition https://github.com/ladislav-zezula/FileTest/blob/master/WinSDK.h#L658
GitHub: Microsoft hcsshim Project https://github.com/microsoft/hcsshim
GitHub: Moby Project Windows Graph Driver https://github.com/moby/moby/blob/master/daemon/graphdriver/windows/windows.go
Microsoft: Containers on Windows documentation https://aka.ms/containers
Microsoft: Get started - Prep Windows for containers https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/set-up-environment?tabs=Windows-Server
Microsoft: REPARSE_DATA_BUFFER definition https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c3a420cb-8a72-4adf-87e8-eee95379d78f
Microsoft: Technet Article - "Introducing the Host Compute Service (HCS)" https://techcommunity.microsoft.com/t5/containers/introducing-the-host-compute-service-hcs/ba-p/382332

Open Source DFIR

Windows Container Forensics
