Timesketch, Header Mapping for CSV imports

Introduction

Timesketch is an open-source tool for collaborative forensic timeline analysis. The Timesketch user interface allows a user to upload CSV, JSONL and Plaso storage files. This blogpost will focus on Timesketch's CSV import capability.
CSV is a common denominator format for data change between analytic tools. Timesketch's current CSV import feature requires the file having a set of columns representing a message, a timestamp and its description. These fields need to be named in a certain way otherwise the Timesketch server will discard them. For this reason, usually, users have to pre-process their CSVs between export from a tool and import into Timesketch to guarantee this constraint.
This blogpost describes a new solution to overcome this limitation, i.e., uploading a CSV without one or more required headers. This new functionality is called header mapping because it allows the user to map one or more missing required headers to any fields found in the CSV.
The target audience for this blogpost are analysts who are familiar with basic concepts of Timesketch and have a running Timesketch instance. This blogpost guides a user through uploading a generic CSV file into Timesketch with the header mapping functionality.

Disclaimer

The data source for this blogpost is a PCAP file processed into a CSV by Wireshark.
Although a CSV extracted from a PCAP is a helpful example for this blogpost, Timesketch is not the perfect solution to analyze network traffic because it is not a packet analyzer tool. For this purpose, Wireshark is a better alternative than Timesketch if the user wants to conduct an in-depth network analysis. Timesketch can still be helpful in network analysis as a way  to aggregate information. For example, it can generate graphs that describe the number of events with respect to the source, destination host, or date and time. This information allows the analyst to have a quick overview of the network connectivities found in the PCAP.
An example PCAP can be found at the Malware Traffic Analysis blogpost. We converted it into a CSV using Wireshark. To reproduce the next steps, be sure to also export the datetime into the format UTC - Date and Time of the Day.
Figure 1 shows the number of events in the PCAP with respect to the datetime.
Figure 1. Number of events extracted from network traffic per hour 


Problem statement

In versions of Timesketch before 2022-08-18, when users uploaded a CSV, they did not receive an indication that the file was missing headers, and the import just failed. The server processed a CSV only if it had a specific set of headers, i.e., (i) message, (ii) datetime, and (iii) timestamp_desc. In particular it managed the upload request in the following way:
  • If the mandatory fields are present, then the CSV is read and uploaded into Timesketch's data store.
  • If one or more required fields are missing, the server discards the file and sends an error message to the user.
Users could send the CSV to the server even if some required headers were missing. This implementation had the following drawbacks:
  • The user will generate a large amount of useless network traffic if the server is not on the same machine as the client.
  • The server’s memory is temporarily used to store the CSV file.
To be correctly processed, a CSV file must have a specific and correct syntax for the mandatory headers. For example, if a CSV contains a header named “msg”, the server cannot interpret it as the required field “message”.

Design idea

With the solution proposed in this blogpost, the user can:
  • Associate missing required headers with existing headers in the CSV.
    • For example, map a column called 'msg' with the required field 'message'.
  • Preview the CSV before uploading it to Timesketch to understand how header mapping will change the structure of the timeline, for example, renaming or adding new columns.
  • Parse the CSV with different delimiters, i.e., comma, semicolon and pipe.
The user cannot upload on the server a file without the mandatory CSV fields or the headers mapping.

Headers mapping via UI

The presence of the mandatory CSV headers is checked on the client-side. They are automatically extracted from the CSV according to its delimiter. The user cannot directly upload the file on the server if one of the mandatory headers is missing.
Figure 2 shows the uploaded CSV that does not have the mandatory headers. Wireshark's CSV export does not include the columns datetimemessage and timestamp_desc.

Figure 2. Headers mapping functionality for a file without the required CSV fields.

A new mapping can be selected if one of the mandatory headers is missing. The server will use it to rename or add columns representing the mandatory fields. If the datetime field is missing for example, the user can rename the column time into datetime.
There are three ways to map the missing headers: 
  1. Create a new column with a default value, which is most useful for the timestamp_desc field.
  2. Rename a column, which is most useful for the datetime field.
  3. Concatenating several fields to make a new column, most useful for the message field. 

Create a new column with a default value

When choosing the mapping from the dropdown menu, the user can choose to create a new column with a default value specified in an alert message. This is helpful when there is not a field in the CSV that can be directly mapped to the required header. For example, for a CSV that contains the creation timestamp of files - and other data -, the timestamp description could be file_creation.
Figure 3 illustrates the mapping for the mandatory header timestamp_desc. The server will create the timestamp_desc column with a default value “PCAP_Network_Connection”.

Figure 3. 1:1 mapping with timestamp_desc field.

Rename a column

When choosing the mapping from the dropdown menu, the user can rename the existing field found in the CSV with the missing mandatory header. This is helpful for the datetime field because in many CSV it may be called in different ways, such as time, creation_time, timestamp…
Figure 4 illustrates the mapping for the mandatory header datetime. The server will use this mapping to rename the CSV header time into datetime.

Figure 4. 1:1 mapping for datetime field.

Create a new column from the concatenation of different fields

The user can select multiple CSV headers that the server will combine into a single column that is named as the required header by choosing the mapping from the checkboxes. The columns will be preserved in the indexed data alongside the newly created column.
In Figure 5, the message field is mapped with 4 different headers. The server will create a new column that will be named message and it will contain the concatenation of the selected fields’ values.

Figure 5. 1:N mapping for message field.

Preview of the CSV

The preview of the CSV file is a table that shows the status of the mapping. The colors of the headers in the preview show the status of a field: this may be present in the CSV (blue), or its mapping may be empty (red) or completed (green). Figure 6 shows an example of a preview of a CSV.

Figure 6. Preview of the uploaded CSV with the current headers mapping.

 

How the feature works behind the scenes

Once the header mapping is complete, the file can be submitted to the server together with two extra parameters: the CSV delimiter that will be used on the server to parse the file and the headers mapping.
The headers mapping is an array of dictionaries with the following fields:
  • Target: the header that will be inserted.
  • Source: the list of the headers that will be combined, or, if there is only one value, to rename. This field can be empty if the user had previously selected Create New Header. 
  • Default value: the value that the server will use to fill the column if a new header is created.
Consider the following CSV and headers mapping as an example:

Creation Time

File Name

Privileges

Owner

2022-08-12

br8.txt

777

Marco

...

...

...

...



This CSV is missing the fields datetime, timestamp_desc and message. With the following mapping:

headers_mapping = [ {target: "datetime", source: ["Time"], default_value: None}, {target: "timestamp_desc", source: None, default_value: "PCAP_Network_Connection"}, {target: "message", source: ["Source", "Destination", "Length", "Protocol"], default_value: None} ]

Doing the mapping and renaming the headers on the client side would require additional logic and permissions for JavaScript to write files to the client disk which is one of the main reasons this is done on the server side. This also enables the Timesketch API and the CLI client to introduce a similar feature.
When the upload and indexing is completed the data can be accessed via the Explore tab in Timesketch. Figure 7 illustrates the first 3 events parsed by Timesketch. The fields datetime and timestamp_desc fields were renamed as expected and a new column message with the combination of values that we selected in the previous list of checkboxes was created.

Figure 7. Timeline imported from the CSV with headers mapping.

Conclusion


The headers mapping feature brings the following improvements: 
  • An error message in the upload form notifies the user if one or more mandatory headers are missing. This action avoids unsuccessful uploads that are caused by missing required fields.
  • The user can upload CSVs without one or more required headers by providing a semantic mapping between the fields found in the CSV and the mandatory ones.
  • The preview of the CSV shows how the timeline will look like.
  • The server can parse CSVs with different delimiters.
The next challenge to cover is about extending this capability for JSONL files. Unlike JSON files, imports in JSONL format can be streamed from disk, making them far less memory intensive than regular JSON files. Stay tuned for more work in the near future to make importing JSONL more convenient.

Comments

Popular posts from this blog

Parsing the $MFT NTFS metadata file

Incident Response in the Cloud

Container Forensics with Docker Explorer