Unlocking Fleetspeak Large Scale and Reliability with Cloud Spanner
Authored by Frank Tobia, Ike Okoro, Matt Pfeiffer and Dan Aschwanden
Spanner isn't just another database; it's Google Cloud's globally distributed, strongly consistent database service. It's built from the ground up for massive scale and high availability, making it an ideal fit for the rigorous demands of a large Fleetspeak deployment. For Fleetspeak, this integration means saying goodbye to some of the most significant scaling bottlenecks.
Here’s the high-level view of why Spanner is a game-changer for Fleetspeak:
- Exceptional Performance: Spanner is engineered to handle immense throughput. This means that even with the hundreds of thousands of messages common for a large fleet, Fleetspeak can efficiently ingest and process data, leading to quicker client interrogations and faster command execution. It is a step change in how responsive your GRR client management becomes.
- Bulletproof Stability & Reliability: In incident response, system stability is non-negotiable. Spanner's core architecture provides inherent high availability and robust disaster recovery capabilities. This gives Fleetspeak a dramatically more reliable foundation, letting you worry less about database bottlenecks or even outages impacting your critical operations.
- Scale to 50,000+ Clients: This integration lets GRR/Fleetspeak deployments scale to 50,000 clients and beyond. This is an important improvement for large organizations that previously hit scaling limits with traditional databases such as the MySQL-based datastore that Fleetspeak has supported for a long time. Note that Fleetspeak’s MySQL datastore option (either on CloudSQL or self-hosted) is not going away and remains available if you operate GRR on a smaller scale.
- Flexible Configuration: The config.proto file was updated to include Spanner as a first-class, configurable datastore option, so Fleetspeak deployments can select Spanner via their components_config. The core components.go file was refactored to handle selecting either MySQL (the old default) or the new Spanner-based datastore configuration.
- Optimized Schemas: New database schemas were designed specifically to leverage Spanner's unique capabilities, ensuring efficient data storage and retrieval at scale.
- Core Datastore Refactoring: Critical components responsible for data persistence (the stores) were substantially updated. This involved new implementations of clientstore.go, filestore.go, messagestore.go, and broadcaststore.go that reliably interact with the Spanner datastore to manage client data, file metadata, messages, and broadcasts.
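To make the datastore selection described above more concrete, here is a minimal Go sketch of how a component loader might choose between the two backends. It is not the actual components.go code: the `Config`, `SpannerConfig`, and `Datastore` types and the `selectDatastore` function are hypothetical names invented for illustration, and the real field names are defined in config.proto.

```go
package main

import (
	"errors"
	"fmt"
)

// SpannerConfig is a hypothetical stand-in for the Spanner settings;
// the real field names are defined in Fleetspeak's config.proto.
type SpannerConfig struct {
	ProjectID string
	Instance  string
	Database  string
}

// Config is a hypothetical, simplified view of the datastore section
// of a components config.
type Config struct {
	MySQLDataSourceName string         // legacy MySQL DSN, empty if unused
	Spanner             *SpannerConfig // nil if Spanner is not configured
}

// Datastore is a hypothetical minimal interface shared by both backends.
type Datastore interface {
	Name() string
}

type mysqlStore struct{ dsn string }

func (mysqlStore) Name() string { return "mysql" }

type spannerStore struct{ cfg SpannerConfig }

func (spannerStore) Name() string { return "spanner" }

// selectDatastore picks exactly one backend and rejects ambiguous or
// empty configurations.
func selectDatastore(c Config) (Datastore, error) {
	hasMySQL := c.MySQLDataSourceName != ""
	hasSpanner := c.Spanner != nil
	switch {
	case hasMySQL && hasSpanner:
		return nil, errors.New("configure either MySQL or Spanner, not both")
	case hasSpanner:
		return spannerStore{cfg: *c.Spanner}, nil
	case hasMySQL:
		return mysqlStore{dsn: c.MySQLDataSourceName}, nil
	default:
		return nil, errors.New("no datastore configured")
	}
}

func main() {
	cfg := Config{Spanner: &SpannerConfig{
		ProjectID: "my-project",
		Instance:  "fleetspeak-instance",
		Database:  "fleetspeak",
	}}
	ds, err := selectDatastore(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected datastore:", ds.Name())
}
```

Rejecting a configuration that sets both backends is a design choice of this sketch; it mirrors the requirement that the MySQL data source name be removed when Spanner is configured.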
Ready to deploy Fleetspeak with Spanner? The process is well-defined and starts with preparing your Google Cloud environment:
- Create Your Spanner Instance: You'll need to create a Spanner Instance in your Google Cloud project before setting up Fleetspeak. The setup.sh script within the Fleetspeak repository will then handle creating the default fleetspeak Database and its necessary Tables within that instance.
- Set up Pub/Sub: Fleetspeak's Spanner integration also relies on Google Cloud Pub/Sub. For this reason you'll need to create a dedicated Pub/Sub Topic and Subscription. This system ensures that backlogged messages are processed efficiently by triggering the ProcessMessages() method.
- Configure Fleetspeak: Update your Fleetspeak components_config file. You'll need to provide your Google Cloud Project ID, Spanner Instance and Database names, and the names of your Pub/Sub Topic and Subscription. This configuration points Fleetspeak to your new Spanner datastore implementation. Note that you must remove the MySQL data source name (mysql_data_source_name) from the configuration and supply the Spanner settings in its place.
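The configuration step above can be sanity-checked before deployment. The following Go sketch validates the settings the step lists and derives the fully qualified resource names that Google Cloud uses for Spanner databases, Pub/Sub topics, and Pub/Sub subscriptions. The `SpannerSettings` type and its field names are hypothetical and do not reflect the actual config.proto schema; only the resource path formats are standard Google Cloud conventions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SpannerSettings is a hypothetical container for the values the
// configuration step asks for; it is not Fleetspeak's real config type.
type SpannerSettings struct {
	ProjectID           string
	Instance            string
	Database            string
	PubSubTopic         string
	PubSubSubscription  string
	MySQLDataSourceName string // must be empty when Spanner is used
}

// Validate reports a leftover MySQL DSN or any missing required value.
func (s SpannerSettings) Validate() error {
	if s.MySQLDataSourceName != "" {
		return errors.New("mysql_data_source_name must be removed when Spanner is configured")
	}
	required := []struct{ name, value string }{
		{"project ID", s.ProjectID},
		{"Spanner instance", s.Instance},
		{"Spanner database", s.Database},
		{"Pub/Sub topic", s.PubSubTopic},
		{"Pub/Sub subscription", s.PubSubSubscription},
	}
	var missing []string
	for _, r := range required {
		if strings.TrimSpace(r.value) == "" {
			missing = append(missing, r.name)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing settings: %s", strings.Join(missing, ", "))
	}
	return nil
}

// DatabasePath returns the fully qualified Spanner database name.
func (s SpannerSettings) DatabasePath() string {
	return fmt.Sprintf("projects/%s/instances/%s/databases/%s", s.ProjectID, s.Instance, s.Database)
}

// TopicPath returns the fully qualified Pub/Sub topic name.
func (s SpannerSettings) TopicPath() string {
	return fmt.Sprintf("projects/%s/topics/%s", s.ProjectID, s.PubSubTopic)
}

// SubscriptionPath returns the fully qualified Pub/Sub subscription name.
func (s SpannerSettings) SubscriptionPath() string {
	return fmt.Sprintf("projects/%s/subscriptions/%s", s.ProjectID, s.PubSubSubscription)
}

func main() {
	// All values below are placeholders.
	s := SpannerSettings{
		ProjectID:          "my-project",
		Instance:           "fleetspeak-instance",
		Database:           "fleetspeak",
		PubSubTopic:        "fleetspeak-messages",
		PubSubSubscription: "fleetspeak-messages-sub",
	}
	if err := s.Validate(); err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println(s.DatabasePath())
	fmt.Println(s.TopicPath())
	fmt.Println(s.SubscriptionPath())
}
```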
You will have to provision the Spanner and Pub/Sub Google Cloud resources for each GRR/Fleetspeak deployment.
These resources require that the Spanner API and the Pub/Sub API are enabled in the Google Cloud Project that runs the Google Kubernetes Engine (GKE) cluster hosting the Fleetspeak workloads.
To access both the Spanner and the Pub/Sub resources the Fleetspeak workloads on GKE will require the following roles:
- Spanner: Cloud Spanner Database User (roles/spanner.databaseUser)
- Pub/Sub: Pub/Sub Publisher (roles/pubsub.publisher) on the Topic
- Pub/Sub: Pub/Sub Subscriber (roles/pubsub.subscriber) on the Subscription
We recommend granting these roles through Workload Identity Federation for GKE (WIF). WIF lets you narrow the scope of the assigned roles to individual Kubernetes Service Accounts and Kubernetes Namespaces. This gives you fine-grained control over permissions and supports a least-privilege approach to assigning them.
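As an illustration of the least-privilege setup described above, the Go sketch below builds the IAM principal identifier for a single Kubernetes Service Account under Workload Identity Federation for GKE and pairs it with the three roles listed earlier. The `Binding` type, the helper names, and all example values (project number, namespace, service account, resource names) are assumptions made for this sketch; it only prints the intended bindings and does not call any Google Cloud API.

```go
package main

import "fmt"

// Binding is a hypothetical record of "grant Role on Resource to Member".
type Binding struct {
	Role     string
	Resource string
	Member   string
}

// ksaPrincipal formats the IAM principal for one Kubernetes Service
// Account, following the identifier format used by Workload Identity
// Federation for GKE.
func ksaPrincipal(projectNumber, projectID, namespace, ksa string) string {
	return fmt.Sprintf(
		"principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/%s/sa/%s",
		projectNumber, projectID, namespace, ksa)
}

// fleetspeakBindings lists the roles the post names, each scoped to the
// narrowest resource it mentions (database, topic, subscription).
func fleetspeakBindings(member, database, topic, subscription string) []Binding {
	return []Binding{
		{Role: "roles/spanner.databaseUser", Resource: database, Member: member},
		{Role: "roles/pubsub.publisher", Resource: topic, Member: member},
		{Role: "roles/pubsub.subscriber", Resource: subscription, Member: member},
	}
}

func main() {
	// Placeholder values; substitute your own project and workload names.
	member := ksaPrincipal("123456789012", "my-project", "fleetspeak", "fleetspeak-sa")
	bindings := fleetspeakBindings(member,
		"projects/my-project/instances/fleetspeak-instance/databases/fleetspeak",
		"projects/my-project/topics/fleetspeak-messages",
		"projects/my-project/subscriptions/fleetspeak-messages-sub")
	for _, b := range bindings {
		fmt.Printf("%s on %s -> %s\n", b.Role, b.Resource, b.Member)
	}
}
```

Scoping the publisher role to the Topic and the subscriber role to the Subscription, rather than granting either at the project level, keeps the workload's permissions limited to the resources Fleetspeak actually uses.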
This major enhancement, finalized with the merge of PR #561, marks a new chapter for large-scale Fleetspeak deployments. It promises a significantly smoother, more performant, and fundamentally more scalable and stable experience.
For more detailed technical information on setting up and configuring Spanner for Fleetspeak, refer to the SPANNER.md documentation file in the Fleetspeak GitHub repository.
And last but not least, for a podcast version of this blog post you can listen in here.
Stay tuned as we continue to explore these powerful tools! We are also working on porting the GRR datastore to Spanner to allow your GRR instance to scale to even larger client fleets.
As always, the GRR user group is the place for questions and collaboration.
You can also revisit our previous posts for related GRR/Fleetspeak content:
The links
- The Fleetspeak repository: https://github.com/google/fleetspeak
- The Fleetspeak Spanner ReadMe: https://github.com/google/fleetspeak/blob/master/spanner-setup/SPANNER.md
- The GRR repository: https://github.com/google/grr
- The GRR website: https://www.grr-response.com