Parsing the $MFT NTFS metadata file



Background

Numerous articles have been written about parsing the $MFT NTFS metadata file. Some explain the technique itself [1], while others mostly focus on how to use tools. Articles rarely discuss the pros and cons of the technique from a digital forensics perspective [2].

What is $MFT parsing? 

For readers not familiar with the NTFS file system, MFT stands for Master File Table. This table is stored in a file system metadata file: as far as the file system is concerned it is a file, but one that contains file system metadata rather than “regular” content. The name of this file is “$MFT”.

The MFT consists of a sequence of fixed-size entries (MFT entries). These entries are typically 1024 bytes in size; the actual size is defined in the NTFS volume header and in each “regular” MFT entry itself.

The first 4 bytes of a “regular” MFT entry (or record) contain the signature “FILE”. An MFT entry can also be filled with 0-byte values, indicating that it is empty (or unused), or have a special-purpose signature such as “BAAD”, which presumably indicates a bad MFT entry [3].

The $MFT contains a significant part of the file system metadata. It also contains the content of very small files (those that fit within the MFT entry). Hence it is considered a useful resource for determining the contents of the file system, and changes made to it, without having to obtain a full copy of the file system.

To reconstruct the file system, a $MFT parser typically has to:
  1. Determine the size of an MFT entry.
  2. Check if the MFT entry contains information it can extract, typically by checking for the “FILE” signature.
  3. Apply the fix-up values [4] if needed (also see the note below).
  4. Extract relevant information from the NTFS attributes, such as the name of the file and the MACB or MACE timestamps [5].
  5. Reconstruct “full paths”.

Note that different versions of NTFS have different ways of storing the fix-up values [6]. A $MFT parser must be able to handle the different NTFS versions, since older versions of the data format have been observed in use. For example, prior to Windows 10 version 1903, NTFS file systems with a 64k cluster block size used version 1.2 (introduced in Windows NT 3.51) instead of the current version 3.1 (introduced in Windows XP).
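
Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not taken from any particular tool. The function names are made up for this example, and the sketch assumes the commonly documented MFT entry header layout: the fix-up values offset at byte 4, the number of fix-up values at byte 6, the total entry size at byte 28, and fix-up values that protect 512-byte blocks.

```python
import struct

SECTOR_SIZE = 512  # assumption: each fix-up value protects a 512-byte block


def classify_entry(data: bytes) -> str:
    """Classifies an MFT entry by its first 4 bytes."""
    signature = data[:4]
    if signature == b"FILE":
        return "regular"
    if signature == b"BAAD":
        return "bad"
    if not any(data):
        return "empty"
    return "unknown"


def entry_size(data: bytes) -> int:
    """Reads the total MFT entry size stored in a "FILE" entry header."""
    return struct.unpack_from("<I", data, 28)[0]


def apply_fixups(data: bytes) -> bytes:
    """Returns a copy of a "FILE" entry with its fix-up values applied."""
    # The offset and count are read from the header instead of being
    # hard-coded, since the location of the values differs per NTFS version.
    fixup_offset, fixup_count = struct.unpack_from("<HH", data, 4)
    if fixup_count < 1:
        return data
    values = struct.unpack_from(f"<{fixup_count}H", data, fixup_offset)
    placeholder, originals = values[0], values[1:]

    entry = bytearray(data)
    for index, original in enumerate(originals):
        end = (index + 1) * SECTOR_SIZE
        if end > len(entry):
            break
        # The last 2 bytes of every block must hold the placeholder value.
        (stored,) = struct.unpack_from("<H", entry, end - 2)
        if stored != placeholder:
            raise ValueError(f"fix-up mismatch in block {index}")
        struct.pack_into("<H", entry, end - 2, original)
    return bytes(entry)
```

Raising an error on a placeholder mismatch is a design choice of this sketch; a forensic tool may prefer to flag the entry as possibly corrupted and continue.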

Pros and cons of $MFT parsing 

Let’s assume a $MFT parser is able to properly determine the MFT entry size and handle the fix-up values. What are some of the pros and cons of only parsing the $MFT instead of using all the file system information? 

Not all file system data is stored in the $MFT 

One of the main advantages of only parsing the $MFT is that there is no need to obtain a full copy of the volume. A modern $MFT NTFS metadata file is in the order of several gigabytes in size, whereas the full volume can be tens or hundreds of gigabytes.

The information in the $MFT can be used to quickly assess file system contents and changes. This can be effective for triage in intrusion cases, especially when dealing with many systems, but can be insufficient for a criminal case, since various parts of NTFS are stored outside the $MFT:
  • the contents of directories, which are stored in $I30 indexes;
  • the contents of attribute lists;
  • several other file system metadata files, such as the USN change journal ($UsnJrnl:$J).

Attribute lists 

Consider the following MFT entry:
MFT entry: 9 information:
        Is allocated                    : true
        File reference                  : 9-9
        Base record file reference      : Not set (0)
        Journal sequence number         : 0
        Number of attributes            : 10

Attribute: 1
        Type                            : $STANDARD_INFORMATION (0x00000010)
        Creation time                   : Jul 30, 2017 19:40:03.548784300 UTC
        Modification time               : Jul 30, 2017 19:40:03.548784300 UTC
        Access time                     : Jul 30, 2017 19:40:03.548784300 UTC
        Entry modification time         : Jul 30, 2017 19:40:03.548784300 UTC
        Owner identifier                : 0
        Security descriptor identifier  : 257
        Update sequence number          : 0
        File attribute flags            : 0x20000006
                Is hidden (FILE_ATTRIBUTE_HIDDEN)
                Is system (FILE_ATTRIBUTE_SYSTEM)
                Is index view (0x20000000)

Attribute: 2
        Type                            : $ATTRIBUTE_LIST (0x00000020)
        Data VCN                        : 0
        Data size                       : 576 bytes
...

As previously stated, the data of this $ATTRIBUTE_LIST attribute is stored outside the $MFT and is therefore not available to a $MFT parser.

The $ATTRIBUTE_LIST attribute data describes which MFT entries (file references) store the attributes of the file, in this example 1242-1 and 4903-1. Each corresponding “list data” MFT entry sets its “base record file reference” value and can contain $FILE_NAME attributes.

    MFT entry: 1242 information:
            Is allocated                    : true
            File reference                  : 1242-1
            Base record file reference      : 9-9
            Journal sequence number         : 0
            Number of attributes            : 1

    Attribute: 1
            Type                            : $DATA (0x00000080)
            Name                            : $SDS
            Data VCN range                  : 0 - 1519
            Data size                       : 6222584 bytes
            Data flags                      : 0x0000

The attribute list of an MFT entry can be reconstructed based on the base record file reference. However, not every $MFT parser supports this, and some $MFT parsers are known to skip $FILE_NAME attributes stored in $ATTRIBUTE_LIST [7].
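
As a rough illustration of this reconstruction, the sketch below groups the attributes of “list data” MFT entries under their base entry. The input format (tuples of 64-bit file reference, 64-bit base record file reference and a list of attribute names) and the function names are assumptions made for this example, not the interface of any existing parser. It relies on a file reference consisting of a 48-bit MFT entry number and a 16-bit sequence number.

```python
from typing import NamedTuple


class FileReference(NamedTuple):
    entry: int     # MFT entry number (lower 48 bits)
    sequence: int  # sequence number (upper 16 bits)

    @classmethod
    def from_int(cls, value: int) -> "FileReference":
        return cls(value & 0xFFFFFFFFFFFF, value >> 48)

    def __str__(self) -> str:
        return f"{self.entry}-{self.sequence}"


def group_by_base_record(entries):
    """Maps each base MFT entry number to all of its attributes.

    `entries` is a hypothetical iterable of (file reference, base record
    file reference, attributes) tuples that a parser has already decoded.
    A base record file reference of 0 means "not set".
    """
    entries = list(entries)
    sequences = {}
    merged = {}
    # First pass: collect the base entries and their sequence numbers.
    for reference, base_reference, attributes in entries:
        if base_reference == 0:
            own = FileReference.from_int(reference)
            sequences[own.entry] = own.sequence
            merged[own.entry] = list(attributes)
    # Second pass: attach entries whose base record reference still matches.
    for reference, base_reference, attributes in entries:
        if base_reference != 0:
            base = FileReference.from_int(base_reference)
            if sequences.get(base.entry) == base.sequence:
                merged[base.entry].extend(attributes)
    return merged
```

Entries whose base record file reference no longer matches an existing base entry are silently dropped here; a real tool would likely want to report them separately.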

Testing all $MFT parser tools is beyond the scope of this article, but if you want to assess how your go-to $MFT parser handles this, you can generate the test file “ntfs-scenario3.1.vhd” using the generate-specimens-behavior.bat script of the ntfs-specimens project.

Full path reconstruction is an approximation

A high-level overview of a technique commonly used to reconstruct full paths based on the data within an $MFT file is:
  1. Based on the $FILE_NAME attribute, determine the parent MFT entry.
  2. If the parent MFT entry is in use (allocated):
    1. If the sequence number matches, determine its name and continue with its parent, up to the root.
    2. If the sequence number does not match, consider the file orphaned; the original full path can no longer be determined.
  3. If the parent MFT entry is no longer in use (unallocated):
    1. If the sequence number - 1 matches (see footnote 1), determine its name and continue with its parent, up to the root.
    2. If the sequence number does not match, consider the file orphaned; the original full path can no longer be determined.

At a high level this sounds fine. Now let’s consider some aspects that make full path reconstruction challenging.
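
The technique outlined above can be sketched in Python as follows. The `Entry` tuple and the `entries` dictionary are hypothetical structures for this example; a real parser would fill them from the $FILE_NAME attributes and MFT entry headers. The sketch assumes that MFT entry 5 is the root directory and that every entry has a single name.

```python
from typing import NamedTuple, Optional

ROOT_ENTRY = 5  # MFT entry number of the NTFS root directory


class Entry(NamedTuple):
    allocated: bool
    sequence: int
    name: str
    parent: tuple  # (parent MFT entry number, parent sequence number)


def reconstruct_path(entries: dict, number: int) -> Optional[str]:
    """Returns the approximated full path, or None if the file is orphaned."""
    parts = []
    seen = set()
    while number != ROOT_ENTRY:
        if number in seen:
            return None  # guard against reference loops in damaged data
        seen.add(number)
        entry = entries[number]
        parts.append(entry.name)
        parent_number, parent_sequence = entry.parent
        parent = entries.get(parent_number)
        if parent is None:
            return None
        # An unallocated parent had its sequence number increased by 1 on
        # deletion, so compare against the value before that increment.
        expected = parent.sequence if parent.allocated else parent.sequence - 1
        if expected != parent_sequence:
            return None
        number = parent_number
    return "\\" + "\\".join(reversed(parts))
```

Returning None for an orphaned file keeps the sketch short; tools often report a partial path or a placeholder prefix instead.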

Attributes can reference multiple file entries

If a hard link to a file has been created, its MFT entry contains an additional $FILE_NAME attribute for the hard link, e.g.
      MFT entry: 42 information:
              Is allocated                    : true
              File reference                  : 42-1
              Base record file reference      : Not set (0)
              Journal sequence number         : 0
              Number of attributes            : 4

      Attribute: 1
              Type                            : $STANDARD_INFORMATION (0x00000010)
              Creation time                   : Dec 01, 2019 08:37:58.333261200 UTC
              Modification time               : Dec 01, 2019 08:37:58.333261200 UTC
              Access time                     : Dec 01, 2019 08:37:58.333261200 UTC
              Entry modification time         : Dec 01, 2019 08:37:58.337681600 UTC
              Owner identifier                : 0
              Security descriptor identifier  : 263
              Update sequence number          : 2080
              File attribute flags            : 0x00000020
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)

      Attribute: 2
              Type                            : $FILE_NAME (0x00000030)
              Parent file reference           : 41-1
              Creation time                   : Dec 01, 2019 08:37:58.333261200 UTC
              Modification time               : Dec 01, 2019 08:37:58.333261200 UTC
              Access time                     : Dec 01, 2019 08:37:58.333261200 UTC
              Entry modification time         : Dec 01, 2019 08:37:58.333261200 UTC
              File attribute flags            : 0x00000020
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)
              Name                            : testfile3

      Attribute: 3
              Type                            : $FILE_NAME (0x00000030)
              Parent file reference           : 46-1
              Creation time                   : Dec 01, 2019 08:37:58.333261200 UTC
              Modification time               : Dec 01, 2019 08:37:58.333261200 UTC
              Access time                     : Dec 01, 2019 08:37:58.333261200 UTC
              Entry modification time         : Dec 01, 2019 08:37:58.337681600 UTC
              File attribute flags            : 0x00000020
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)
              Name                            : hardlink1

The $STANDARD_INFORMATION attribute (see footnote 2) applies to both “testfile3” and “hardlink1”. These are 2 separate file entries on the file system in different parent directories, one with parent file reference 41-1 and the other with 46-1. Technically there are 2 different full paths that are linked to most (but not all) of the attributes in the MFT entry.

Depending on the information that needs to be presented, it can be relevant to report every file path that applies to a specific detail.
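
One way to report every path is to produce one full path per $FILE_NAME attribute instead of one per MFT entry. The sketch below assumes the directory paths have already been reconstructed; the function name and both input structures are hypothetical.

```python
def all_paths(file_names, directory_paths):
    """Returns one full path per $FILE_NAME attribute of an MFT entry.

    `file_names` is a list of (parent MFT entry number, name) tuples and
    `directory_paths` maps MFT entry numbers to already reconstructed
    directory paths. Both are hypothetical inputs for this sketch.
    """
    paths = []
    for parent_number, name in file_names:
        parent_path = directory_paths.get(parent_number)
        if parent_path is None:
            continue  # parent unknown, treat this name as orphaned
        paths.append(parent_path.rstrip("\\") + "\\" + name)
    return paths
```

For the MFT entry shown above this would yield two paths, one ending in “testfile3” and one ending in “hardlink1”, both of which share the same $STANDARD_INFORMATION timestamps.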

Directories can have multiple names

Windows does not support hard links to directories, but a directory (in NTFS) can have both a DOS (8.3) and a Windows name, e.g.
      MFT entry: 38 information:
              Is allocated                    : true
              File reference                  : 38-1
              Base record file reference      : Not set (0)
              Journal sequence number         : 0
              Number of attributes            : 4

      Attribute: 1
              Type                            : $STANDARD_INFORMATION (0x00000010)
              Creation time                   : Dec 03, 2013 06:35:09.502378300 UTC
              Modification time               : Dec 03, 2013 06:35:09.517978300 UTC
              Access time                     : Dec 03, 2013 06:35:09.502378300 UTC
              Entry modification time         : Dec 03, 2013 06:35:09.517978300 UTC
              Owner identifier                : 0
              Security descriptor identifier  : 264
              Update sequence number          : 0
              File attribute flags            : 0x00000026
                      Is hidden (FILE_ATTRIBUTE_HIDDEN)
                      Is system (FILE_ATTRIBUTE_SYSTEM)
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)

      Attribute: 2
              Type                            : $FILE_NAME (0x00000030)
              Parent file reference           : 36-1
              Creation time                   : Dec 03, 2013 06:35:09.502378300 UTC
              Modification time               : Dec 03, 2013 06:35:09.502378300 UTC
              Access time                     : Dec 03, 2013 06:35:09.502378300 UTC
              Entry modification time         : Dec 03, 2013 06:35:09.502378300 UTC
              File attribute flags            : 0x00000026
                      Is hidden (FILE_ATTRIBUTE_HIDDEN)
                      Is system (FILE_ATTRIBUTE_SYSTEM)
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)
              Name                            : {38088~1

      Attribute: 3
              Type                            : $FILE_NAME (0x00000030)
              Parent file reference           : 36-1
              Creation time                   : Dec 03, 2013 06:35:09.502378300 UTC
              Modification time               : Dec 03, 2013 06:35:09.502378300 UTC
              Access time                     : Dec 03, 2013 06:35:09.502378300 UTC
              Entry modification time         : Dec 03, 2013 06:35:09.502378300 UTC
              File attribute flags            : 0x00000026
                      Is hidden (FILE_ATTRIBUTE_HIDDEN)
                      Is system (FILE_ATTRIBUTE_SYSTEM)
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)
              Name                            : {3808876b-c176-4e48-b7ae-04046e6cc752}

This MFT entry defines both “{38088~1” and “{3808876b-c176-4e48-b7ae-04046e6cc752}” as names for the same file entry. Technically this file entry has (at least) 2 different full paths, a DOS one and a Windows one.

Tools typically omit the DOS path since it is synonymous with the Windows one. Note that data is lost in the process of converting the long name to a short one [8], and that information about the short names is needed to e.g. determine whether “C:\PROGRA~1” corresponds to “C:\Program Files” or “C:\ProgramData”.
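
A parser that keeps the DOS names can build a lookup from short name to long name, which is what is needed to resolve a path such as “C:\PROGRA~1”. The sketch below uses the $FILE_NAME namespace values (0 for POSIX, 1 for Windows, 2 for DOS, 3 for combined DOS and Windows). The input tuples and the function name are assumptions for this example, and it simplifies by assuming one Windows name per MFT entry.

```python
NAMESPACE_POSIX = 0
NAMESPACE_WINDOWS = 1
NAMESPACE_DOS = 2
NAMESPACE_DOS_WINDOWS = 3  # one name that is valid in both namespaces


def build_short_name_map(file_name_attributes):
    """Maps (parent entry number, upper-cased short name) to the long name.

    `file_name_attributes` is a hypothetical iterable of
    (MFT entry number, parent entry number, namespace, name) tuples.
    """
    short_names = {}
    long_names = {}
    for number, parent, namespace, name in file_name_attributes:
        if namespace == NAMESPACE_DOS:
            short_names[number] = (parent, name.upper())
        elif namespace == NAMESPACE_WINDOWS:
            long_names[number] = name
    # Only entries that have both a DOS and a Windows name need a mapping.
    return {
        key: long_names[number]
        for number, key in short_names.items()
        if number in long_names
    }
```

Keying the lookup on the parent entry number matters, since the same short name can occur in different directories.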

Directory renames

Consider the following scenario:
  1. The file “\somedir\somefile.txt” is created.
  2. 5 hours later this file is deleted.
  3. Another 3 hours later the directory “\somedir” is renamed to “\newdir”.

Techniques that solely look at the data in the $MFT to reconstruct the path of the deleted file will conclude that “\newdir\somefile.txt” was its path. Technically this is incorrect: “\newdir” was not the name of the directory at the time “\somedir\somefile.txt” existed.

If available, the data in $UsnJrnl:$J can provide this information, e.g.
      USN record:
              Update time                     : Dec 01, 2019 08:37:58.343342400 UTC
              Update sequence number          : 2160
              Update reason flags             : 0x80000200
                      (USN_REASON_FILE_DELETE)
                      (USN_REASON_CLOSE)

              Update source flags             : 0x00000000

              Name                            : somefile.txt
              File reference                  : 43-1
              Parent file reference           : 41-1
              File attribute flags            : 0x00000020
                      Should be archived (FILE_ATTRIBUTE_ARCHIVE)

      USN record:
              Update time                     : Dec 01, 2019 08:37:58.343342400 UTC
              Update sequence number          : 2320
              Update reason flags             : 0x00001000
                      (USN_REASON_RENAME_OLD_NAME)

              Update source flags             : 0x00000000

              Name                            : somedir
              File reference                  : 41-1
              Parent file reference           : 39-1
              File attribute flags            : 0x00000010
                      Is directory (FILE_ATTRIBUTE_DIRECTORY)

      USN record:
              Update time                     : Dec 01, 2019 08:37:58.343342400 UTC
              Update sequence number          : 2400
              Update reason flags             : 0x00002000
                      (USN_REASON_RENAME_NEW_NAME)

              Update source flags             : 0x00000000

              Name                            : newdir
              File reference                  : 41-1
              Parent file reference           : 39-1
              File attribute flags            : 0x00000010
                      Is directory (FILE_ATTRIBUTE_DIRECTORY)

Having the information about the rename can be relevant when trying to locate file paths found in other system sources such as the Event Logs.
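
To illustrate how USN records like the ones above could be used, the sketch below determines the name a directory had at the moment another record, such as the deletion of “somefile.txt”, was written. The record tuples and the function name are assumptions for this example; the reason flag values are those shown in the records above.

```python
USN_REASON_RENAME_OLD_NAME = 0x00001000
USN_REASON_RENAME_NEW_NAME = 0x00002000


def name_at(records, file_reference, usn):
    """Returns the name `file_reference` had when record `usn` was written.

    `records` is a hypothetical list of (update sequence number, reason
    flags, file reference, name) tuples sorted by update sequence number.
    Returns None if the journal does not cover that moment.
    """
    name = None
    for record_usn, reasons, reference, record_name in records:
        if reference != file_reference:
            continue
        if record_usn > usn:
            # Without an earlier record, the first later record still
            # carries the earlier name, unless it is the "new name" half
            # of a rename whose "old name" half is missing.
            if name is None and not (reasons & USN_REASON_RENAME_NEW_NAME):
                name = record_name
            break
        name = record_name
    return name
```

With the three records shown above, looking up file reference 41-1 at update sequence number 2160 (the deletion of “somefile.txt”) gives “somedir”, while looking it up at 2400 gives “newdir”.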

If you want to assess how your go-to $MFT parser handles this, you can generate the “ntfs_path_hint.vhd” test file using the generate-specimens-behavior.bat script of the ntfs-specimens project.

Reparse points and junctions

Mounted reparse points and junctions can hide low-level file system details that might be relevant [9, 10]. Parsing the $MFT NTFS metadata file or an offline volume image will only provide you with the metadata of reparse points and junctions. In other circumstances, however, it might be relevant to understand the live state of a reparse point or junction. In such a case, using operating system API calls to determine file content and metadata might be necessary.

Conclusion

I hope this article has provided you with more insight into the pros and cons of parsing the $MFT NTFS metadata file. As a digital forensic analyst you need to apply the most suitable technique for the task at hand. To do so, you need to understand the technique, its potential edge cases and how your tooling deals with these.

NTFS is a complex file system [11] with many edge cases, most of which have not been broadly discussed within the digital forensics field. So if you feel you have an interesting one, please reach out on the Open Source DFIR Slack community.

Footnotes

  1. On deletion (de-allocation) the sequence number of an MFT entry is increased by 1.
  2. This also applies to all other attributes, such as $DATA and $OBJECT_ID, except for the $FILE_NAME attributes.
