Testing digital forensic data processing tools
Background
In Parsing the $MFT NTFS metadata file [1] I concluded that as a digital forensic analyst you need to apply the most suitable technique for the task at hand. To do so, you as the analyst need to understand the technique, its potential edge-cases and how your tooling deals with these. If your analysis work ends up in court, you, as the analyst, are the one who needs to be able to defend it.
Unfortunately, many current publications, such as blog posts, online discussions, books and training courses, are mainly centered around using tools. A lot of the time, the selection of such tools is based on preferred platform, preferred output formats, preferred price range or preferred user experience. Although these can be important evaluation factors, they do not help to defend our findings in a legal context, such as a court.
This article looks at several principles and resources that can be beneficial when evaluating your tools. You might argue: if my analysis work never ends up in court, why bother? Another way of looking at it is that when you want automation to reason about data, you want it to do so reliably.
Digital forensics tooling principles
In the course of writing this post, several resources were found that look at digital forensics tooling at a more fundamental level for testing:
Computer Forensics Tool Testing Program (CFTT) by NIST [2] which defines testing criteria and results for various types of digital forensics tooling.
The Scientific Working Group on Digital Evidence (SWGDE) [3], which seems to be more focused on the procedural aspects of handling evidence.
Evaluating Digital Forensic Tools (DFTs) [4], which defines several “Digital Forensics Tools Evaluation Metrics”; unfortunately the proposed metrics are not very specific to the processing of digital data.
Several publications in Digital Investigation [5], which were not publicly accessible and are therefore considered out of scope.
The CFTT program seems to define the most applicable specifications for digital forensics tooling. Most of the CFTT specifications seem to focus on the acquisition of digital data, with just a couple covering the processing of digital data:
Active File Identification & Deleted File Recovery Tool Specification (Draft 1 of Version 1.1 March 24, 2009) [6]
Windows Registry Forensic Tool Specification (Draft 2 of Version 1.0 for Public Comment - June 2018) [7]
Although both are still in draft, they outline some applicable principles for testing a digital forensic data processing tool (hereafter forensic tool). Note that some principles have been left out since they were considered less relevant for this article.
Out of the principles reviewed, probably the most applicable are:
A forensic tool shall have the ability to perform an interpretation of supported objects.
A forensic tool shall preserve its input data in its original state. If modifications are required, these are stored separately from the input data.
A forensic tool is transparent about objects it has inferred and the method of inference that was used.
A forensic tool is transparent about representation modifications it makes.
A forensic tool shall have the ability to notify the analyst (user) of abnormal information detected during data processing.
Optional principles:
A forensic tool shall have the ability to identify recoverable objects and provide means to recover them.
Some of these principles might appear self-explanatory, but let’s take a closer look at each of them.
1. Ability to perform an interpretation of supported objects
To test a forensic tool on the correctness [8] of its ability to perform an interpretation of supported objects, we need to understand which data format objects (or structures) are supported. If possible, we should also understand which data format objects are not supported.
Unfortunately, many data formats are proprietary and lack official public documentation. If documentation exists, it is typically based on individual observations, analysis of binary data or other means of reverse engineering (in the broadest sense). It is also pretty common for such documentation to be out of date or to deviate from the actual implementation [9].
So, if we do not necessarily have a definition of which objects are supported, we have to assume that the forensic tool should at minimum match the reference implementation of the data format, such as the application from which the data originates. This can be a moving target if the data format changes over time.
To validate a forensic tool we need test data that has been created with the reference implementation; preferably test data that is reproducible and can be manually verified if needed. In practice, however, data formats have many edge-cases, and depending on the reference tooling available it can be complex to reproduce these edge-cases in test data.
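As an illustration, a minimal sketch of what such a validation could look like as a golden-file regression test run from a shell; the specimen image, the baseline output file and the use of fls here are assumptions for the sake of the example:

  # Run the tool under test against test data created with the reference
  # implementation and compare against previously, manually verified output.
  fls -r -p specimens/ntfs.raw > results/ntfs-fls.txt
  diff -u expected/ntfs-fls.txt results/ntfs-fls.txt || echo "WARNING: output deviates from the verified baseline"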
You might be asking, what about using existing forensics corpora? Existing forensics corpora can be useful to uncover edge-cases and study them. However, such corpora typically have bias as well; for example, a corpus of US English-only installations of operating systems will not help uncover edge-cases due to internationalization and localization [10].
So there is a catch-22 here: we cannot create representative test data without understanding the edge-cases, and we cannot know about all edge-cases without representative test data. One could make the case that our ability to test such data formats follows a maturity model [11], with levels ranging from (1) Initial, through (2) Repeatable and (3) Defined, to (4) Managed and (5) Optimizing.
A note of caution here: an observation reported by multiple sources does not necessarily imply that those sources were independent. Examples of research notes copying errors from other sources without validation are not uncommon [12].
The same caution applies to using multiple tools to cross-validate findings. You need to make sure the implementations of the tools are independent. Examples of tool authors copying errors from other authors are known [13].
By now you might think: nice theory, but I’m already happy when I get my tools running. More on a practical application of this model later in this article.
2. Preserve input data in its original state
Paraphrasing the CFTT specification: “a forensic tool shall have the ability to perform an interpretation of supported objects without modification to the objects”. Unfortunately the specification lacks an explanation of what is actually meant by this. Does the specification refer to the fact that a forensic tool should not alter the original data (read-only)? Is formatting of a date and time in a different time zone also considered a modification? And what about the construction of inferred objects, like a full path?
This article has split this up into multiple core principles, the first of which is that “a forensic tool shall preserve its input data in its original state. If modifications are required, these are stored separately from the input data”. At first glance this principle does not require much discussion. It is good practice for a forensic tool, or a shared module used by a forensic tool, not to alter the input data during processing. If it needs to make alterations, for example to apply changes from a transaction journal, these changes can be tracked in a data source separate from the input data.
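A simple way to check this principle as a user, sketched below under the assumption of a hypothetical tool invocation, is to record a cryptographic hash of the input data before processing and verify it afterwards:

  sha256sum image.raw > image.raw.sha256   # record the state of the input data
  some-forensic-tool image.raw -o output/  # hypothetical tool invocation
  sha256sum -c image.raw.sha256            # verify the input data was not altered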
One caveat here is that the original state of input data can be dynamic, for example a forensic tool processing a file system that is mounted writable and being altered during processing. For the sake of brevity this is considered outside the scope of this article, but note that a lot of data processing forensic tools are not written with changing input data in mind. More on the state of input data in the next core principle.
3. Be transparent about inferred objects
This is probably one of the most overlooked and least understood core principles. Let’s state the principle again: “A forensic tool is transparent about objects it has inferred and the method of inference that was used.”
Let’s take the renamed directory example from “Full path reconstruction is an approximation” in “Parsing the $MFT NTFS metadata file”:
The file “\somedir\somefile.txt” is created
5 hours later this file is deleted
Another 3 hours later the directory “\somedir” is renamed to “\newdir”
Techniques that solely look at the data in the $MFT to reconstruct the path of the deleted file will infer that “\newdir\somefile.txt” was its path. However, “\somedir\somefile.txt” was the path that actually existed on the input data at some point, and the inferred path never existed. So as an analyst it is important to know whether the output value you are looking at is inferred or not.
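To make this concrete, a purely illustrative output record, not taken from any existing tool, could flag the inference and the method used, for example:

  path: \newdir\somefile.txt
  path_inferred: true
  inference_method: parent file reference lookup in the $FILE_NAME attribute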
Typically such an error in inference [14] will not necessarily break your case, but it can affect your credibility as an expert in court if your report did not explain it, or weaken the chain of your evidence [15].
4. Be transparent about representation modifications it makes
“A forensic tool is transparent about representation modifications it makes”. This principle sounds straightforward, but let’s demonstrate the importance of transparency about representation modifications, and the subtlety of the errors involved, with an example.
Consider the following scenario on an NTFS file system:
A file “testfile1” is created
100 milliseconds later the file is accessed and its access time is updated
100 milliseconds later the content of the file is modified and its modification time is updated
A script to reproduce the test data can be found in the ntfs-specimens project.
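For context, a condensed sketch of the kind of steps such a script performs; the actual ntfs-specimens script differs, mkntfs is part of ntfs-3g, and the mount point and mounting require root privileges:

  dd if=/dev/zero of=ntfs.raw bs=1024 count=8192    # 8 MiB image file
  mkntfs -F -q ntfs.raw                             # format as NTFS
  sudo mkdir -p /mnt/specimen
  sudo mount -o loop,uid=$(id -u) ntfs.raw /mnt/specimen
  touch /mnt/specimen/testfile1                     # file creation
  sleep 0.1
  touch -a /mnt/specimen/testfile1                  # access time update
  sleep 0.1
  echo "test data" >> /mnt/specimen/testfile1       # content modification
  sudo umount /mnt/specimen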
In this example we have 3 unique events occurring with the file. Let’s use the SleuthKit tool fls to create a bodyfile.
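For example, assuming the ntfs.raw specimen image from the sketch above:

  fls -r -m / ntfs.raw > ntfs.bodyfile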
Due to limitations of the bodyfile format [16] used by fls we lose relevant accuracy [17] of the timestamps, for example “1598723379” seconds instead of “1598723379.539569855” seconds.
Let’s use a bodyfile that preserves the accuracy of the original timestamps.
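Purely as an illustration, with made-up values matching the scenario and the nanosecond fractions preserved, an entry in the bodyfile format (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime) could look like:

  0|/testfile1|64-128-2|r/rrwxrwxrwx|0|0|10|1598723379.639569855|1598723379.739569855|1598723379.739569855|1598723379.539569855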
Now let’s run the SleuthKit tool mactime, which according to its man page [18] creates an ASCII timeline of file activity.
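One way to invoke it; the exact options used may differ, -b points at the bodyfile, -d produces comma-delimited output and -y uses ISO 8601 style dates:

  mactime -b ntfs.bodyfile -d -y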
What just happened? Our input contained 3 distinct file activities, yet in the output of mactime they appear as 1 activity (represented as 2 lines) that hints at file creation. This hypothesis is further strengthened by the date and time values in the $FILE_NAME attribute.
A more accurate representation of the actual file activity would show 3 distinct activities: the file creation, the access 100 milliseconds later, and the content modification another 100 milliseconds after that.
Note that one could argue that the line that contains the $FILE_NAME information should be part of a single file creation activity. More concerning is that neither fls nor mactime warned about the altered accuracy, nor can we tell from the mactime output that the tool has merged 3 distinct file activities into 1 [19].
Using bodyfiles and mactime is a widely used methodology in the field. It is taught by training institutes and recommended by professionals in the field. However, the method does not appear to have been broadly debated with an emphasis on the digital forensics principles mentioned previously.
Errors in inference, and errors due to silent representation modifications, are the hardest to test for and catch, mainly because their impact is often marginal and they are easy to overlook. If you are working as a counter-expert in a US court case, look closely for such inference errors.
5. Notify about abnormal information detected during data processing
This is an interesting principle since, as we concluded before, we do not necessarily know all edge-cases of a data format. Again, the CFTT specification is not very specific about what is meant, so let’s consider a couple of diverse examples of “abnormal information”:
1. a bug in the tool or a shared module that causes it to error;
2. obviously corrupted data, such as data with validation markers like signatures or checksums;
3. non-obviously corrupted data or specially crafted data [20];
4. an edge-case introduced by a new version of the original application, or by a secondary implementation that altered the data format [21];
5. trailing data at the end of a file, not part of the data format, that is used to store a malicious payload;
6. a malicious payload stored as a regular data stream supported by the data format, such as a PE/COFF resource.
Independent post-processing methodologies to validate the output of the forensics tool can be good additional safeguards here.
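As a sketch of such a safeguard, under the assumption that the ntfs.raw specimen image and mount point from earlier are reused, the number of allocated entries reported by the tool can be cross-checked against an independent implementation, here the operating system’s own NTFS driver. The counts need normalization, for example for NTFS metadata files and deleted entries, so treat this only as an illustration of the idea:

  fls -r -u ntfs.raw | wc -l     # allocated entries according to the SleuthKit
  sudo mount -o ro,loop ntfs.raw /mnt/specimen
  find /mnt/specimen | wc -l     # entries according to the operating system driver
  sudo umount /mnt/specimen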
Since “abnormal information” is broad, let’s consider only types 1 and 2 for now, and look at an example where the lack of a notification about abnormal information can affect our analysis.
Consider an ext4 file system with the inline data feature enabled. On this file system we create a directory “testdir1” with 2 entries, “testfile1” and “TestFile2”. Since the directory entry data is small, it is stored inline. A script to reproduce the test data can be found in the ext-specimens project.
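A condensed sketch of the kind of steps involved; the actual ext-specimens script differs, and the inline_data feature requires a reasonably recent e2fsprogs:

  dd if=/dev/zero of=ext4.raw bs=1024 count=8192
  mke2fs -t ext4 -O inline_data -F -q ext4.raw
  sudo mount -o loop ext4.raw /mnt/specimen
  sudo mkdir /mnt/specimen/testdir1
  sudo touch /mnt/specimen/testdir1/testfile1 /mnt/specimen/testdir1/TestFile2
  sudo umount /mnt/specimen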
For many analysts the open source go-to tool to create a file system timeline of an ext4 file system is the SleuthKit, so let’s use fls:
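For example, recursively listing the file system with full paths; the output is omitted here:

  fls -r -p ext4.raw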
Where are “testfile1” and “TestFile2”? Let’s check if fls returned an error code:
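For example, by inspecting the exit code of the previous command:

  echo $?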
It is quite critical for an analyst to know that relevant data is missing from the output. Since we controlled the input we know it is missing, but with most case-related input data we do not.
Even though this issue has been reported to the SleuthKit project [22], a word of caution here: the SleuthKit is used by numerous other open source and proprietary digital forensics tools, as well as in-house scripts used for automation.
Ironically, the SleuthKit does provide us with a notification when the individual directory is listed with fls:
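For example, by listing the directory by its inode address; the actual address depends on the specimen and is shown here as a placeholder:

  fls ext4.raw $TESTDIR1_INODE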
One takeaway for a testing methodology is to make sure different data processing options are validated as part of testing.
6. Ability to identify and recover objects
First of all, the CFTT defines this principle as “a forensic tool shall have the ability to identify and recover deleted objects”. Let’s first highlight why “deleted” was dropped in the context of this article.
Deletion is typically a deliberate action initiated by a user or a process to remove data [23]. Depending on the implementation, data is made inaccessible instead of being completely removed.
Recovery is the process of salvaging (retrieving) inaccessible, lost, corrupted, damaged or formatted data when it cannot be accessed in the usual way [24].
So the process of recovery can salvage more than just deleted objects. A more detailed discussion about deletion versus recovery can be found in Standardization of File Recovery Classification and Authentication [25].
Recovery is a process of inference, therefore the principle that “a forensic tool is transparent about objects it has inferred and the method of inference that was used” should apply. Unfortunately this is rarely the case. Subtle issues in how tooling handles corruption can lead to incorrect data being presented in results [26].
The need for more transparency and validation
There seems to be a growing concern about the validation of forensic tooling, such as imposing ISO 17025 and creating a standardized corpus for SQLite database forensics [27]. Note that these have a strong focus on forensics applied to criminal cases, which makes sense given the impact of a tooling error in that context.
The CFTT seems to be one of the few attempts in the field to actually test and validate forensic tools; as outlined in this article, it is unfortunately ambiguous and leaves room for interpretation. There are several academic publications about forensic tool testing; however, these do not seem to have resulted in much change in applied practices either.
There seems to be little appetite for such a discussion among practitioners. Most practitioner-focused articles on tooling continue to focus mainly on preferred platforms, preferred output formats, preferred price range or preferred user experience. Rarely are methodologies and tooling taught in training courses put under scrutiny, or recommendations made by professionals in the field questioned in public fora. Often “pro tips” are given that lack the nuance and context needed for them to hold up in a legal setting, and that can be harmful when applied incorrectly.
Several actions you can start with today:
Build up an in-house corpus of data to test new versions of tools against. Validation testing should be part of your software evaluation process. In the end it is your legal case that might be at risk.
Ask your tooling vendor for transparency about how they validate their tooling and releases. Ask them for proof that can be demonstrated in a court of law.
Ask your training institute for transparency about how they validated the methodologies and tooling they are teaching. Ask about potential conflicts of interest, such as supposedly independent training institutes sponsoring or selling tooling in which they have additional commercial interests. If so, consider applying the previous point as well.
When you publish observations of new data format edge-cases, share test data. Try to ensure test cases are reproducible. If they are not, please mention that more work is needed to ensure reproducible test data can be created.
Work with peers to document and reproduce edge-cases found in data formats, so that they can be shared in the public domain.
Ask people publishing “pro tips” to provide more context and nuance. If you are one of the people giving “pro tips”, write a more detailed article first before providing advice without the necessary context [28].
The need for (more) scalable test solutions
In the course of writing this post, no publications were found that emphasize the scalability of a test solution.
From a forensic tool maintainer perspective, validation testing ideally is part of the CI/CD process [29]. In practice this can mean long-running end-to-end tests to ensure all edge-cases are covered. Modularization can help isolate data formats and their corresponding tests into reusable modules.
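As a sketch of what such an end-to-end step could look like in a CI/CD pipeline, assuming a hypothetical layout with specimen images and previously verified baseline output:

  for image in test_data/*.raw; do
    name=$(basename "$image" .raw)
    fls -r -p "$image" > "results/${name}.txt"
    diff -u "expected/${name}.txt" "results/${name}.txt" || exit 1   # fail the build on deviations
  done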
To improve the test coverage of data processing tools to a “(4) Managed” maturity level the following is needed:
A shared understanding of the data format and its edge-cases, preferably captured in one or more independent specifications.
Controlled input test data that represents all known reproducible edge-cases.
Complementary, “manually” altered test data to cover known, but complex to reproduce, edge or corruption cases.
Expanding on technologies such as virtualization or containerization could drive a “(5) Optimizing” maturity level, for example:
Automated means to generate test data from different versions of the original application.
Automated means to test the original application with fault injection to uncover unknown edge-cases.
From a forensic tool user / tester perspective virtualization and/or containerization can be used:
to create more self-contained (hermetic) test environments that improve reproducibility and reduce factors that influence testing in unforeseen ways. Note that you want to run acceptance tests in an environment as close to the production environment as possible; virtualization and/or containerization allow your testing environment to be basically the same as your production environment;
to scale up testing, so new versions can be evaluated more quickly when needed;
to more easily test integration with other tools.
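As a sketch of what such a hermetic setup could look like, test data is mounted read-only into a container that pins a specific tool version; the container image name is a placeholder, not an existing published image:

  docker run --rm -v "$PWD/test_data:/data:ro" my-sleuthkit:4.10.1 fls -r -p /data/ntfs.raw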
This is a topic to explore in more detail in a future post.
Conclusion
This article is not intended to provide an all-encompassing list of principles to be adopted by digital forensic data processing tooling. Nor does it claim to have all the answers regarding building adequate test coverage.
One thing that is evident is that change is needed if we want to use digital forensic data processing tools for automated analysis and reasoning. The testing requirements that were covered:
Have unambiguous and shared testing principles. Hopefully this article provides a first iteration of these for digital forensic data processing tools.
Have publicly accessible reference data format specifications and test data.
Take action: ask for more transparency and perform validation.
If you have additional ideas, don’t hesitate to reach out via email or on the Open Source DFIR Slack community.