Generate your own hash sets with HashR

The HashR team is pleased to announce the first public release of HashR. HashR extracts files from a source and uploads their hashes, metadata and, optionally, the actual file content to a given data sink. This allows you to generate your own hash sets from complex data sources, such as physical or cloud disk images, and use them during Blue Team operations. There are many hash set providers out there (e.g. NSRL), but shared hash sets have some limitations:

  • They come in different formats, contain different metadata and use different hashing algorithms, which makes them difficult to use consistently.

  • They are updated infrequently, e.g. every couple of months.

  • By definition they don’t provide the actual content of the file, complicating further investigation once a matching hash is identified.

  • The hashed files come only from public sources; if your organisation uses custom base OS images, you won't find their hashes in those sets.

HashR takes a different approach: it allows you to generate your own hash sets from complex data sources and keeps them up to date without any manual attention. It removes the burden of dealing with various data formats (e.g. disk images) and of maintaining private hash sets. Beyond that, using HashR has the following benefits:

  • It allows you to create private hash sets from an internal source without exposing any of the data to the public.

  • It’s open source, so you can see exactly how files are extracted and how hashes and metadata are calculated; it’s not a black-box solution.

  • It offers lower latency than public hash sets, allowing you to keep your hash sets up to date within a day.

  • It can run on a schedule, automatically importing data that has not yet been processed and uploading it to the data sink.

  • It allows you to upload the content of the actual files in case you want to feed them into another system for further processing or analysis.

An input source can be as simple as a collection of .tar.gz archives or as complex as a Google Compute Engine (GCE) or raw disk image with multiple volumes and file systems. Under the hood, HashR utilises Plaso to do the heavy lifting of parsing disk images, volumes, file systems and other complex data formats.
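To give a feel for what the simplest importer does conceptually, here is a minimal sketch (not HashR's actual code, which is written in Go) of walking a .tar.gz archive and computing a SHA-256 digest for each regular file inside it:

```python
import hashlib
import tarfile


def hash_tar_gz(archive_path):
    """Return {member_name: sha256_hex} for every regular file in a .tar.gz."""
    hashes = {}
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, device nodes, etc.
            f = tar.extractfile(member)
            digest = hashlib.sha256()
            # Hash in 1 MiB chunks so large members are never fully in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
            hashes[member.name] = digest.hexdigest()
    return hashes
```

For richer sources such as disk images, this is where HashR delegates to Plaso instead of parsing volumes and file systems itself.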

The modular design of HashR allows the implementation of additional importers and exporters without worrying about the core functionality. At present, HashR has importers for the following:

  • GCE Cloud disk images

  • Windows OS installers in ISO format 

  • Windows OS update files 

  • .tar.gz archives 

Once files are extracted and their hashes calculated, they are passed to exporters, which upload them to data sinks. Currently a PostgreSQL exporter is included, which stores hashes, metadata and content in a PostgreSQL database.
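Conceptually, an exporter boils down to upserting (hash, metadata, content) rows into a table keyed by digest. The sketch below is only a guess at the shape of such a sink — the table and column names are assumptions, not HashR's actual schema — and it uses Python's built-in sqlite3 rather than PostgreSQL so it is self-contained:

```python
import hashlib
import sqlite3

# Hypothetical, simplified sink; HashR's real PostgreSQL schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE samples (
           sha256    TEXT PRIMARY KEY,  -- hex digest of the file
           file_path TEXT,              -- where the file lived in the source
           source    TEXT,              -- e.g. a GCE image or ISO name
           content   BLOB               -- optional raw file content
       )"""
)


def export(file_path, source, content):
    """Upsert one extracted file into the sink; returns its digest."""
    sha256 = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO samples VALUES (?, ?, ?, ?)",
        (sha256, file_path, source, content),
    )
    return sha256
```

A real deployment would use a PostgreSQL driver and HashR's own schema; the point is only that a hash set sink reduces to content-addressed rows keyed by digest, which also makes re-processing the same file idempotent.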

Although this tool was primarily designed to help digital forensic analysts during incident response, it can help with various use cases:

  • Digital forensics teams can use this data to filter and reduce the number of files and events they have to look at during forensic analysis. For example, it can show all the binaries and files that are not part of the base OS image or known updates.

  • Detection teams can, depending on the context, suppress certain types of alerts and reduce the number of false-positive detections.

  • Threat intelligence and detection teams can use hash sets generated by HashR to test new detection signals and signatures before pushing them to production. 

  • Any other teams that need to extract files from complex and multi-layered data formats and upload them to a given data sink might benefit from using HashR.
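The filtering use case above is ultimately a set lookup: hash every file you encounter and drop the ones whose digest is already in your hash set. A minimal sketch, with file names invented and hashes shortened for readability (real entries would be full SHA-256 digests):

```python
def filter_unknown(observed, known_hashes):
    """Return only the (path, digest) pairs whose digest is not in the hash set."""
    return [(path, digest) for path, digest in observed if digest not in known_hashes]


# Digests generated by HashR from the base OS image (hypothetical values).
known = {"3f2a", "9c81"}

observed = [
    ("C:/Windows/notepad.exe", "3f2a"),  # part of the base image: filtered out
    ("C:/Users/bob/evil.exe", "d00d"),   # unknown: kept for analysis
]

suspicious = filter_unknown(observed, known)
```

In practice the known-hash set would come from the HashR data sink, and the lookup could equally be pushed into the database as a join.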

Instructions on how to set up your own HashR instance can be found on GitHub. If you have ideas for new importers, exporters or other functionality, please open a feature request in the HashR repo.

In the next article we are going to go through how to make use of hash sets generated with HashR to speed up forensic analysis while using Timesketch.

