Parsing the $MFT NTFS metadata file
Background
Numerous articles have already been written about parsing the $MFT NTFS metadata file. Some of these explain the technique itself [1], while others mostly focus on how to use tools. Rarely do articles discuss the pros and cons of the technique from a digital forensics perspective [2].
What is $MFT parsing?
For readers not familiar with the NTFS file system, MFT stands for Master File Table. This table is stored in a file system metadata file, meaning that as far as the file system is concerned it is a file, however the file contains file system metadata rather than “regular” content. The name of this file is “$MFT”.
The MFT consists of a sequence of fixed-size entries (or MFT entries) with a predetermined size. These entries are typically 1024 bytes in size, however the actual size is defined in the NTFS volume header and in a “regular” MFT entry itself.
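As a minimal sketch of where the entry size can be read from, the following Python assumes the commonly documented NTFS layout: a signed “clusters per MFT entry” byte at offset 0x40 of the volume header (a negative value n meaning 2^-n bytes), and the allocated entry size as a 32-bit value at offset 0x1C of a “FILE” entry. The function names are hypothetical.

```python
import struct


def mft_entry_size_from_boot_sector(boot_sector: bytes) -> int:
    """Return the MFT entry size in bytes, derived from the NTFS volume header."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS volume header")
    # Bytes per sector (uint16 at 0x0B) and sectors per cluster (uint8 at 0x0D).
    # Assumption: sectors per cluster is at most 128; larger cluster sizes use a
    # different encoding that this sketch does not handle.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", boot_sector, 0x0B)
    cluster_size = bytes_per_sector * sectors_per_cluster
    # "Clusters per MFT entry" is a signed byte at offset 0x40.
    (clusters_per_entry,) = struct.unpack_from("<b", boot_sector, 0x40)
    if clusters_per_entry < 0:
        # A negative value n means the entry size is 2**(-n) bytes.
        return 1 << -clusters_per_entry
    return clusters_per_entry * cluster_size


def mft_entry_size_from_entry(entry: bytes) -> int:
    # Allocated size of the entry: uint32 at offset 0x1C of a "FILE" entry header.
    (allocated_size,) = struct.unpack_from("<I", entry, 0x1C)
    return allocated_size
```

A value of -10 (0xF6) at offset 0x40 yields the common 1024-byte entry size; comparing both sources is a cheap sanity check before parsing.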
A “regular” MFT entry (or record) starts with the 4-byte signature “FILE”. An MFT entry can also be filled with 0-byte values, indicating it is empty (or unused), or have a special-purpose signature such as “BAAD”, which presumably indicates a bad MFT entry [3].
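Checking an entry before parsing it can be sketched as below. The returned labels are hypothetical; the sketch only inspects the first 4 bytes and whether the whole entry consists of 0-byte values.

```python
def classify_mft_entry(entry: bytes) -> str:
    """Classify a raw MFT entry by its signature."""
    signature = bytes(entry[:4])
    if signature == b"FILE":
        return "file"    # a "regular" entry that can be parsed further
    if signature == b"BAAD":
        return "bad"     # presumably a bad MFT entry
    if not any(entry):
        return "empty"   # filled with 0-byte values, unused
    return "unknown"
```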
The $MFT contains a significant part of the file system metadata and also contains the content of very small files (those that fit within the MFT entry). Hence it is considered a useful resource for determining the contents of the file system and changes made to it, without having to obtain a full copy of the file system.
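To show how the content of a very small file can be read straight from its MFT entry, here is a minimal sketch that walks the attributes and returns the content of a resident, unnamed $DATA attribute (type 0x80). It assumes fix-up values have already been applied and the commonly documented layout: first attribute offset at 0x14 of the entry header, attribute type and length at 0x00 and 0x04, non-resident flag at 0x08, name length at 0x09, and for resident attributes the content size and offset at 0x10 and 0x14. The function names are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

ATTR_END = 0xFFFFFFFF  # end-of-attributes marker
ATTR_DATA = 0x80       # $DATA


def iter_attributes(entry: bytes) -> Iterator[Tuple[int, bool, bytes]]:
    """Yield (type, non_resident, raw attribute) for each attribute in a FILE entry."""
    (offset,) = struct.unpack_from("<H", entry, 0x14)  # first attribute offset
    while offset + 8 <= len(entry):
        attr_type, attr_length = struct.unpack_from("<II", entry, offset)
        if attr_type == ATTR_END or attr_length == 0:
            break
        non_resident = entry[offset + 8] != 0
        yield attr_type, non_resident, entry[offset:offset + attr_length]
        offset += attr_length


def resident_content(attribute: bytes) -> bytes:
    # Resident attribute: content size (uint32) at 0x10, content offset (uint16) at 0x14.
    size, data_offset = struct.unpack_from("<IH", attribute, 0x10)
    return attribute[data_offset:data_offset + size]


def small_file_data(entry: bytes) -> Optional[bytes]:
    """Return the unnamed resident $DATA content, or None if non-resident or absent."""
    for attr_type, non_resident, attribute in iter_attributes(entry):
        if attr_type == ATTR_DATA and attribute[9] == 0:  # name length 0: unnamed stream
            return None if non_resident else resident_content(attribute)
    return None
```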
To reconstruct the file system, a $MFT parser typically has to:
- Determine the size of an MFT entry.
- Check if the MFT entry contains information it can extract, typically by checking for the “FILE” signature.
- Apply the fix-up values [4] if needed (also see note below).
- Extract relevant information from the NTFS attributes, such as the name of the file and the MACB or MACE timestamps [5].
- Reconstruct “full paths”.
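The fix-up step can be sketched as follows. This assumes the commonly documented layout: the fix-up values offset and count are uint16 values at offsets 0x04 and 0x06 of the entry header, the count includes the update sequence number itself, and that number must appear in the last 2 bytes of every 512-byte block, where it is replaced by the corresponding stored value. The function name is hypothetical.

```python
import struct

BLOCK_SIZE = 512  # fix-up values protect the last 2 bytes of each 512-byte block


def apply_fixups(entry: bytes) -> bytes:
    """Return a copy of an MFT entry with its fix-up values applied."""
    data = bytearray(entry)
    fixup_offset, fixup_count = struct.unpack_from("<HH", data, 0x04)
    # The first value is the update sequence number; a slice makes a copy.
    update_sequence = data[fixup_offset:fixup_offset + 2]
    for index in range(1, fixup_count):
        end = index * BLOCK_SIZE
        if end > len(data):
            break
        if data[end - 2:end] != update_sequence:
            raise ValueError(f"fix-up mismatch in block {index - 1}")
        stored = fixup_offset + 2 * index
        data[end - 2:end] = data[stored:stored + 2]
    return bytes(data)
```

A mismatch typically means the entry was only partially written or is corrupt, so raising instead of silently continuing keeps such entries visible.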
Pros and cons of $MFT parsing
Let’s assume a $MFT parser is able to properly determine the MFT entry size and handle the fix-up values. What are some of the pros and cons of only parsing the $MFT instead of using all the file system information?
Not all file system data is stored in the $MFT
One of the main advantages of only parsing the $MFT is that there is no need to obtain a full copy of the volume. The $MFT NTFS metadata file of a modern system is typically up to several gigabytes in size, whereas the full volume can be tens or hundreds of gigabytes.
The information in the $MFT can be used to quickly assess file system contents and changes. This can be effective for triage in intrusion cases, especially when dealing with many systems, but can be insufficient for a criminal case since various parts of NTFS are stored outside the $MFT:
- the contents of directories are stored in $I30 indexes;
- contents of attribute lists;
- several other file system metadata files such as the USN change journal ($UsnJrnl:$J).
Attribute lists
Consider an MFT entry that contains an $ATTRIBUTE_LIST attribute. The $ATTRIBUTE_LIST attribute data contains information about which file references are used to store the attributes, in this example 1242-1 and 4903-1. Each corresponding “list data” MFT entry has its “base record file reference” set to the base MFT entry and can contain $FILE_NAME attributes.
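Decoding the $ATTRIBUTE_LIST data can be sketched as below, assuming the commonly documented entry layout: attribute type (uint32) at 0x00, entry size (uint16) at 0x04, name length in characters (uint8) at 0x06, name offset (uint8) at 0x07 and file reference (uint64) at 0x10. A file reference combines a 48-bit MFT entry number with a 16-bit sequence number, which is where notation such as 1242-1 comes from. The names in the sketch are hypothetical.

```python
import struct
from typing import List, NamedTuple


class AttributeListEntry(NamedTuple):
    attribute_type: int
    name: str
    file_reference: str  # "entry-sequence", e.g. "1242-1"


def format_file_reference(value: int) -> str:
    # Lower 48 bits: MFT entry number; upper 16 bits: sequence number.
    return f"{value & 0xFFFFFFFFFFFF}-{value >> 48}"


def parse_attribute_list(data: bytes) -> List[AttributeListEntry]:
    entries = []
    offset = 0
    while offset + 26 <= len(data):  # 26 bytes: fixed part of an entry
        attr_type, entry_size, name_length, name_offset = struct.unpack_from(
            "<IHBB", data, offset)
        if entry_size == 0:
            break
        (file_reference,) = struct.unpack_from("<Q", data, offset + 16)
        name_start = offset + name_offset
        name = data[name_start:name_start + 2 * name_length].decode("utf-16-le")
        entries.append(AttributeListEntry(
            attr_type, name, format_file_reference(file_reference)))
        offset += entry_size
    return entries
```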
The attribute list of an MFT entry can be reconstructed based on the base record file reference. However, not every $MFT parser supports this, and some $MFT parsers are known to skip $FILE_NAME attributes stored in MFT entries referenced by an $ATTRIBUTE_LIST [7].
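Reconstructing attribute lists from the base record file reference could look like the following sketch. It assumes the entries have already been read into a dictionary keyed by MFT entry number, with fix-ups applied, and uses the header fields at offset 0x10 (sequence number, uint16) and 0x20 (base record file reference, uint64, 0 for a base entry). The function name is hypothetical.

```python
import struct
from collections import defaultdict
from typing import Dict, List


def group_by_base_record(entries: Dict[int, bytes]) -> Dict[int, List[int]]:
    """Map base MFT entry numbers to the extension entries that refer to them."""
    extensions: Dict[int, List[int]] = defaultdict(list)
    for entry_number, entry in sorted(entries.items()):
        (reference,) = struct.unpack_from("<Q", entry, 0x20)
        if reference == 0:
            continue  # a base entry itself
        base_number = reference & 0xFFFFFFFFFFFF
        base_sequence = reference >> 48
        base_entry = entries.get(base_number)
        # Only attach the extension if the base entry's sequence number matches;
        # otherwise the base entry is missing or has since been reused.
        if base_entry is None:
            continue
        if struct.unpack_from("<H", base_entry, 0x10)[0] != base_sequence:
            continue
        extensions[base_number].append(entry_number)
    return dict(extensions)
```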
Testing all $MFT parser tools is beyond the scope of this article, but if you want to assess how your go-to $MFT parser handles this, you can generate the test file “ntfs-scenario3.1.vhd” using the generate-specimens-behavior.bat script of the ntfs-specimens project.
Full path reconstruction is an approximation
A high-level overview of a technique commonly used to reconstruct full paths based on the data within an $MFT file is:
- Based on the $FILE_NAME attribute, determine the parent MFT entry
  - If the parent MFT entry is in use (allocated)
    - If the sequence number matches, determine its name and continue with its parent up to the root
    - If the sequence number does not match, consider the file orphaned; the original full path can no longer be determined
  - If the parent MFT entry is no longer in use (unallocated)
    - If the sequence number - 1 matches¹, determine its name and continue with its parent up to the root
    - If the sequence number - 1 does not match, consider the file orphaned; the original full path can no longer be determined
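The steps above can be sketched in Python as follows. The sketch assumes a hypothetical, already parsed model of each MFT entry (allocation state, sequence number, name from $FILE_NAME and parent file reference); MFT entry 5 is the root directory. Entry numbers and names in the usage are made up.

```python
from typing import Dict, NamedTuple, Optional, Tuple

ROOT_ENTRY = 5  # MFT entry number of the root directory


class Entry(NamedTuple):
    in_use: bool
    sequence: int
    name: str                # from the $FILE_NAME attribute
    parent: Tuple[int, int]  # parent file reference (entry number, sequence)


def parent_matches(parent_entry: Entry, expected_sequence: int) -> bool:
    if parent_entry.in_use:
        return parent_entry.sequence == expected_sequence
    # On deallocation the sequence number is increased by 1.
    return parent_entry.sequence - 1 == expected_sequence


def full_path(entries: Dict[int, Entry], entry_number: int) -> Optional[str]:
    """Return the reconstructed full path, or None if the file is orphaned."""
    names = [entries[entry_number].name]
    parent_number, parent_sequence = entries[entry_number].parent
    seen = {entry_number}
    while parent_number != ROOT_ENTRY:
        parent_entry = entries.get(parent_number)
        if parent_entry is None or parent_number in seen:
            return None  # missing parent or a reference cycle
        if not parent_matches(parent_entry, parent_sequence):
            return None  # orphaned: the parent entry was reused
        seen.add(parent_number)
        names.append(parent_entry.name)
        parent_number, parent_sequence = parent_entry.parent
    return "\\" + "\\".join(reversed(names))
```

Note that the result is only as good as the current names of the parent entries: nothing in this model records what a directory was called earlier.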
Attributes can reference multiple file entries
Consider an MFT entry of a file that has a hard link: its $STANDARD_INFORMATION attribute² applies to both “testfile3” and “hardlink1”. These are 2 separate file entries on the file system in different parent directories, one with parent file reference 41-1 and the other with 46-1. Technically there are 2 different full paths that are linked to most (but not all) of the attributes in the MFT entry.
Depending on the type of information that needs to be represented, it can be relevant to provide all full paths associated with specific details.
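Producing one full path per $FILE_NAME attribute can be sketched as below. This uses a hypothetical, simplified model in which directories map an entry number to (name, parent entry number), and it leaves out the sequence-number checks for brevity. The directory names “dir1” and “dir2” in the usage are made up; only the parent entry numbers 41 and 46 come from the text.

```python
from typing import Dict, List, Tuple

ROOT_ENTRY = 5  # MFT entry number of the root directory

# Hypothetical model: directory entry number -> (name, parent entry number).
Directories = Dict[int, Tuple[str, int]]


def directory_path(directories: Directories, entry_number: int) -> str:
    """Return the path of a directory, or "" for the root."""
    parts: List[str] = []
    seen = set()
    while entry_number != ROOT_ENTRY:
        if entry_number in seen:
            raise ValueError("cycle in parent references")
        seen.add(entry_number)
        name, entry_number = directories[entry_number]
        parts.append(name)
    return "\\" + "\\".join(reversed(parts)) if parts else ""


def all_full_paths(directories: Directories,
                   file_names: List[Tuple[str, int]]) -> List[str]:
    """Return one full path per $FILE_NAME attribute (name, parent entry number)."""
    return [directory_path(directories, parent) + "\\" + name
            for name, parent in file_names]
```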
Directories can have multiple names
Windows does not support hard links to directories, but in NTFS a directory can have both a DOS (8.3) name and a Windows name. For example, an MFT entry can define both “{38088~1” and “{3808876b-c176-4e48-b7ae-04046e6cc752}” as names for the same file entry. Technically this directory has (at least) 2 different full paths, a DOS one and a Windows one.
Tools typically omit the DOS path since it is synonymous with the Windows one. Note, however, that data is lost when converting the long name to a short one [8], and that information about the short names is needed, for example, to determine whether “C:\PROGRA~1” corresponds to “C:\Program Files” or “C:\ProgramData”.
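Keeping short names around instead of discarding them can be sketched as follows. It assumes each file entry has been parsed into a list of (name, parent entry number, namespace) tuples, where the namespace is the $FILE_NAME namespace value (0 POSIX, 1 Windows, 2 DOS, 3 DOS and Windows). The entry numbers and the short-to-long name pairs in the usage are hypothetical; on a real system the numeric suffixes vary.

```python
from typing import Dict, List, Tuple

# $FILE_NAME namespace values.
POSIX, WINDOWS, DOS, DOS_WINDOWS = 0, 1, 2, 3

FileName = Tuple[str, int, int]  # (name, parent entry number, namespace)


def preferred_names(file_names: List[FileName]) -> List[Tuple[str, int]]:
    """Drop DOS-only names, as tools typically do, keeping the other names."""
    return [(name, parent) for name, parent, namespace in file_names
            if namespace != DOS]


def short_name_map(entries: Dict[int, List[FileName]]) -> Dict[Tuple[int, str], str]:
    """Map (parent entry number, upper-cased short name) to the long name."""
    mapping: Dict[Tuple[int, str], str] = {}
    for file_names in entries.values():
        for name, parent, namespace in file_names:
            if namespace == DOS_WINDOWS:
                # The name is valid in both namespaces: short and long are the same.
                mapping[(parent, name.upper())] = name
            elif namespace == DOS:
                for long_name, long_parent, long_namespace in file_names:
                    if long_namespace == WINDOWS and long_parent == parent:
                        mapping[(parent, name.upper())] = long_name
    return mapping
```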
Directory renames
Consider the following scenario:
- The file “\somedir\somefile.txt” is created
- 5 hours later this file is deleted
- Another 3 hours later the directory “\somedir” is renamed to “\newdir”
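A $MFT parser will typically reconstruct the full path of the deleted file as “\newdir\somefile.txt”, since the parent directory entry keeps its sequence number when it is renamed, even though the file never existed under that path. If available, the data in $UsnJrnl:$J can provide the historical names, such as the rename of “somedir” to “newdir”.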
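To illustrate how $UsnJrnl:$J can recover the old name, here is a minimal sketch that reads USN_RECORD_V2 records and pairs “rename old name” and “rename new name” records per file reference. It assumes the documented USN_RECORD_V2 layout (record length at 0x00, major version at 0x04, file and parent file references at 0x08 and 0x10, reason at 0x28, file name length and offset at 0x38 and 0x3A) and does not handle version 3 records, which use 128-bit file references. The function names are hypothetical.

```python
import struct
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

USN_REASON_RENAME_OLD_NAME = 0x00001000
USN_REASON_RENAME_NEW_NAME = 0x00002000


class UsnRecord(NamedTuple):
    file_reference: str    # "entry-sequence"
    parent_reference: str  # parent directory, "entry-sequence"
    reason: int
    name: str


def _format_reference(value: int) -> str:
    # 48-bit MFT entry number, 16-bit sequence number.
    return f"{value & 0xFFFFFFFFFFFF}-{value >> 48}"


def parse_usn_record_v2(data: bytes, offset: int) -> UsnRecord:
    _length, major, minor = struct.unpack_from("<IHH", data, offset)
    if major != 2:
        raise ValueError(f"unsupported USN record version {major}.{minor}")
    file_ref, parent_ref = struct.unpack_from("<QQ", data, offset + 8)
    (reason,) = struct.unpack_from("<I", data, offset + 40)
    name_size, name_offset = struct.unpack_from("<HH", data, offset + 56)  # bytes
    start = offset + name_offset
    name = data[start:start + name_size].decode("utf-16-le")
    return UsnRecord(_format_reference(file_ref), _format_reference(parent_ref),
                     reason, name)


def iter_usn_records(data: bytes) -> Iterator[UsnRecord]:
    offset = 0
    while offset + 60 <= len(data):
        (record_length,) = struct.unpack_from("<I", data, offset)
        if record_length == 0:
            offset += 8  # skip zero padding; records are 8-byte aligned
            continue
        yield parse_usn_record_v2(data, offset)
        offset += record_length


def renames(records: Iterable[UsnRecord]) -> Iterator[Tuple[str, str, str]]:
    """Yield (file reference, old name, new name) for each rename pair."""
    old_names: Dict[str, str] = {}
    for record in records:
        if record.reason & USN_REASON_RENAME_OLD_NAME:
            old_names[record.file_reference] = record.name
        elif record.reason & USN_REASON_RENAME_NEW_NAME and record.file_reference in old_names:
            yield record.file_reference, old_names.pop(record.file_reference), record.name
```

Combined with the parent file references of the create and delete records of “somefile.txt”, such rename pairs make it possible to place the file under its original directory name.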
If you want to assess how your go-to $MFT parser handles this, you can generate the “ntfs_path_hint.vhd” test file using the generate-specimens-behavior.bat script of the ntfs-specimens project.
Reparse points and junctions
Mounted reparse points and junctions can hide low-level file system details that might be relevant [9, 10]. Parsing the $MFT NTFS metadata file or an offline volume image will provide you with metadata about reparse points and junctions only.
Conclusion
1. On deletion (de-allocation) the sequence number of an MFT entry is increased by 1.
2. As do all other attributes, such as $DATA and $OBJECT_ID, except for the $FILE_NAME attributes.