Find the needle faster with hashR data
A challenge in compromise investigations is the volume of data to be analysed. In a previous article we showed how hashR can be used to generate custom hash sets. In this article we demonstrate how such a custom hash set can greatly speed up your investigation by letting you find the files (new binaries, modified configs) that are not part of a base (operating system) image.
We will walk through investigating a compromised GCE VM running CentOS. Let's assume we get an alert from our detection systems that this VM connected to an IP address associated with a nation-state (APT) actor.
Processing and preparing the data
First, we run dfTimewolf's gcp_forensics recipe to acquire the disk from the compromised VM and prepare our analysis environment:
dftimewolf gcp_forensics --instances <compromised_vm_name> --analysis_project_name <analysis_project_name> <compromised_vm_project_name>
This command will perform the following steps:
- Copy the disk from the compromised VM to our analysis GCP project
- Create an analysis VM in our analysis GCP project
- Attach the disk from the compromised VM to the analysis VM
- Install Plaso and other open-source forensic tooling
Once this process is complete, we can SSH into the analysis VM and run the following command to process the source disk with Plaso:
sudo log2timeline.py --yara_rules rules.yar --storage-file timeline.plaso /dev/<compromised_disk_name>
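The rules.yar file passed via --yara_rules is not shown in this article. As a sketch only, a minimal rule that tags ELF executables (consistent with the yara_match:"executables_ELF" label we will search for later) could look like this:

```shell
# Hypothetical minimal rules.yar: tag files that start with the ELF
# magic bytes. The actual rule set used in the article is not shown.
cat > rules.yar <<'EOF'
rule executables_ELF {
  meta:
    description = "File starts with the ELF magic bytes"
  strings:
    $elf_magic = { 7f 45 4c 46 }
  condition:
    $elf_magic at 0
}
EOF
```

Plaso evaluates these rules against file content during processing and stores any matches in the yara_match event attribute.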
Timesketch has a hashR lookup analyzer that checks SHA256 hash values extracted from events against a hashR database and tags events in Timesketch accordingly. For the analyzer to work properly with our hashR database, it needs to be configured by setting the database connection information in the timesketch.conf configuration file of our Timesketch installation.
hashR config in timesketch.conf:
#-- hashR integration --#
# Uncomment and fill this section if you want to use the hashR lookup analyzer.
# Provide hashR postgres database connection information below:
HASHR_DB_USER = 'hashRuser'
HASHR_DB_PW = 'hashRpass'
HASHR_DB_ADDR = '127.0.0.1'
HASHR_DB_PORT = '5432'
HASHR_DB_NAME = 'hashRdb'
# The total number of unique hashes that are checked against the database is
# split into multiple batches. This number defines how many unique hashes are
# checked per query. 50000 is the default value. #
HASHR_QUERY_BATCH_SIZE = '50000'
# Set as True if you want to add the source of the hash ([repo:imagename]) as
# an attribute to the event. WARNING: This will increase the processing time
# of the analyzer!
HASHR_ADD_SOURCE_ATTRIBUTE = False
The configuration options above need to be uncommented and updated to point to the hashR PostgreSQL database in use.
HASHR_QUERY_BATCH_SIZE can be left at its default and only needs to be tweaked if you notice performance issues such as a slow connection to the database or limited memory availability. The value defines how many unique hashes are checked against the database in a single SELECT statement.
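To illustrate what this batching means, the sketch below splits placeholder values into batches, with a batch size of 50 standing in for the 50000 default:

```shell
# Illustrative only: the analyzer splits the set of unique hashes
# into batches before querying. 120 placeholder values with a batch
# size of 50 yield three batches (50 + 50 + 20 lines).
seq 120 > unique_hashes.txt
split -l 50 unique_hashes.txt batch_
ls batch_*   # three batch files: batch_aa batch_ab batch_ac
```

Each batch file corresponds to one SELECT statement against the hashR database.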
hashR also records in which base images it has seen each hash value. Setting HASHR_ADD_SOURCE_ATTRIBUTE = True will add an attribute containing this source information, in the form [repo:imagename], to each matching event.
This will increase the processing time of the analyzer, so activate this feature only if you definitely need the information. We can always query the hashR database directly if we want to get this information afterwards.
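Such a direct lookup could look like the sketch below. The table and column names are hypothetical and depend on your hashR version, so check the schema of your deployment before running it:

```shell
# Hypothetical query: find the source images ([repo:imagename]) that
# contain a given SHA-256. Table/column names are illustrative only.
SHA256="<sha256_of_interest>"
SQL="SELECT src.repo_name, src.image_name
     FROM samples s
     JOIN samples_sources ss ON ss.sample_sha256 = s.sha256
     JOIN sources src ON src.sha256 = ss.source_sha256
     WHERE s.sha256 = '${SHA256}';"
echo "${SQL}"
# Run it with the connection settings from timesketch.conf:
# PGPASSWORD=hashRpass psql -h 127.0.0.1 -p 5432 -U hashRuser -d hashRdb -c "${SQL}"
```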
Important: For the changes in the timesketch.conf file to take effect, the Timesketch Docker container needs to be restarted (docker-compose restart)!

Since our hashR instance is configured to ingest all public GCP base operating system (OS) images, it contains the hashes of all known files in the compromised CentOS instance in this scenario. To run the analyzer on the imported Plaso data, we wait for the Plaso file to be fully indexed and then start the "hashR lookup" analyzer via the Analyze tab in Timesketch: select the timeline we just uploaded, select the "hashR lookup" analyzer from the list, and hit the green "Run 1 analyzers on 1 timelines" button.
Analyzing the data
With the analyzer finished, known files are tagged and can be filtered out. The following Timesketch search returns newly created ELF executables that were not tagged as known by the hashR lookup analyzer:

yara_match:"executables_ELF" AND timestamp_desc:"Creation Time" AND NOT tag:known-hash
Below is a hex dump excerpt from one of the binaries the query surfaced. Note the embedded strings, including HIDE_THIS_SHELL and a sed command that rewrites a file under /tmp/:

00003e80: c7b8 0000 0000 e875 ecff ff48 8d85 e0ef .......u...H....
00003e90: ffff 488d 3539 6b04 0048 89c7 e86f ecff ..H.59k..H...o..
00003ea0: ff48 8945 e848 8b45 e848 89c7 e86f ecff .H.E.H.E.H...o..
00003eb0: ff48 c745 e800 0000 00e9 8d02 0000 488d .H.E..........H.
00003ec0: 85e0 efff ff48 bb48 4944 455f 5448 4948 .....H.HIDE_THIH
00003ed0: 8918 48b9 535f 5348 454c 4c3d 4889 4808 ..H.S_SHELL=H.H.
00003ee0: 48bb 5820 7365 6420 272f 4889 5810 48b9 H.X sed '/H.X.H.
00003ef0: 2373 7368 6466 6c61 4889 4818 48bb 672f #sshdflaH.H.H.g/
00003f00: 2c24 2164 2720 4889 5820 48b9 2f73 6269 ,$!d' H.X H./sbi
00003f10: 6e2f 6966 4889 4828 48bb 7570 2d6c 6f63 n/ifH.H(H.up-loc
00003f20: 616c 4889 5830 48b9 203e 202f 746d 702f alH.X0H. > /tmp/
00003f30: 4889 4838 48bb 7379 7374 656d 642d 4889 H.H8H.systemd-H.
00003f40: 5840 48b9 7072 6976 6174 652d 4889 4848 X@H.private-H.HH
00003f50: 66c7 4050 7565 c640 5200 488d 85e0 efff f.@Pue.@R.H.....
00003f60: ff48 8d35 6a6a 0400 4889 c7e8 a0eb ffff .H.5jj..H.......
00003f70: 4889 45e8 488b 45e8 4889 c7e8 a0eb ffff H.E.H.E.H.......
00003f80: 48c7 45e8 0000 0000 488d 85e0 efff ff48 H.E.....H......H
00003f90: bb48 4944 455f 5448 4948 8918 48b9 535f .HIDE_THIH..H.S_
00003fa0: 5348 454c 4c3d 4889 4808 48bb 7820 7365 SHELL=H.H.H.x se
00003fb0: 6420 2d69 4889 5810 48b9 202d 6520 2773 d -iH.X.H. -e 's
00003fc0: 2f73 4889 4818 48bb 7368 6420 2031 2f73 /sH.H.H.shd 1/s
00003fd0: 4889 5820 48b9 7368 6420 2030 2f67 4889 H.X H.shd 0/gH.
00003fe0: 4828 48bb 2720 2f74 6d70 2f73 4889 5830 H(H.' /tmp/sH.X0
00003ff0: 48b9 7973 7465 6d64 2d70 4889 4838 48bb H.ystemd-pH.H8H.
00004000: 7269 7661 7465 2d75 4889 5840 66c7 4048 rivate-uH.X@f.@H
00004010: 6500 488d 85e0 efff ff48 8d35 b269 0400 e.H......H.5.i..
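The dump above is in the offset/hex/ASCII layout produced by xxd, and you can inspect a recovered binary the same way. The file created below is just a stand-in for the suspicious binary:

```shell
# Create a stand-in file containing one of the strings visible in
# the dump, then hex-dump it in the same layout as above:
printf 'HIDE_THIS_SHELL=x sed' > demo.bin
xxd demo.bin
```

Piping the output through grep (or running strings on the binary) is a quick way to spot embedded shell fragments like these.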