Operational Professionalizing vs Proceduralizing

 

As a Security Operations team grows and matures, repeatable outcomes and standards become increasingly important. It’s natural that many kinds of work which were once ad-hoc begin to need defined procedures. Perhaps the company becomes covered by a regulation that requires these specific procedures to exist, or perhaps management has aligned itself to a framework that recommends them. Whatever the reason, the team begins writing playbooks and processes, and holding itself accountable for following them. But what happens when this effort goes past professionalizing the team and becomes proceduralizing?


As a long-time volunteer Fire & Rescue responder I have been in close contact with professional responders for decades. Years ago I heard from peers about a pilot experiment to improve patient outcomes for Emergency Medical responses. Each medic was given an iPad with patient-tracking software on it. The idea was that, on scene and en route to the hospital, digital medical records would be created and updated. At the hospital, the records would be handed off to the doctors, and information about en-route treatments and discoveries would be readily available to staff. Later, researchers would comb through outcome data to determine which pre-hospital interventions were most effective, improving medical care for everyone.


On the ground, the medics hated the “stupid iPads”. While everyone recognized the importance of studying outcomes to improve patient care, the actual software was cumbersome. Touchscreens were difficult to use with gloves on (especially once bodily fluids were involved), and messing with the checkboxes and data inputs was seen as actively interfering with patient care. Many paramedics complained that they “spent more time with the iPad than with the patient”. The upset was compounded by the medics feeling caught in a no-win choice between letting the patient down (by spending too much time on the iPad) and getting yelled at later by management for not tracking the right metrics at the right time.


While the intention of this project was to help professionalize the industry by collecting important data and using that data to improve patient care, the inadvertent outcome was to frustrate the responders and reduce the quality of care.


What is professionalizing?


Professionalizing your team is a natural outcome of growth and can help address many of the root causes of burnout in Response teams. A professional team that is held to documented standards has less uncertainty and knows what it must do to meet them. A professional team has a mission and can consistently deliver on it for their organization. It partners well with stakeholders and upholds the kind of work and reputation that earn respect from executives, in court, and in the field at large.


When pushing to professionalize an operations team, you might write documentation to help guide investigations - for example, the collection of investigative questions curated by the DFIQ project. These formal question-and-answer style guides are designed to provide outcome-focused hints and procedures that help answer common investigative questions. DFIQ is a good step toward professionalizing because it doesn’t force an analyst to perform specific tasks, but instead lists investigative techniques that can help deliver a desired outcome.
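
To make that concrete, here is a minimal sketch of how such an outcome-focused question might be modeled. The structure and field names below are my own illustrative assumptions, not DFIQ’s actual schema - the point is that the question is the goal and the approaches are hints, not mandatory steps.

```python
from dataclasses import dataclass, field

@dataclass
class Approach:
    """One way an analyst could answer the question - a hint, not a mandatory step."""
    name: str
    data_sources: list[str]
    notes: str = ""

@dataclass
class InvestigativeQuestion:
    """An outcome-focused question in the spirit of DFIQ (illustrative, not DFIQ's schema)."""
    question: str
    approaches: list[Approach] = field(default_factory=list)

persistence = InvestigativeQuestion(
    question="Were any persistence mechanisms added to the host?",
    approaches=[
        Approach(
            name="Review autorun locations",
            data_sources=["registry run keys", "scheduled tasks", "systemd units"],
            notes="Compare against a known-good baseline where one exists.",
        ),
        Approach(
            name="Look for recently created services",
            data_sources=["service control manager logs", "EDR telemetry"],
        ),
    ],
)

# The analyst chooses which approaches apply; the goal is the answer, not the steps.
print(f"{persistence.question} ({len(persistence.approaches)} suggested approaches)")
```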


Metrics can help with professionalizing as well. A professional team typically commits to an SLA - for example, triaging all cases within 30 minutes of escalation. But how do you know you’re delivering that if you can’t measure your team’s performance? A professional team tracks important signals about the quality of its work in order to put resources where they’re needed most and make the best use of the resources given.
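
One low-friction way to measure an SLA like this is to derive it from timestamps your case tracker already records, rather than asking analysts to log anything extra. Here is a minimal sketch, assuming a hypothetical export of escalation and first-triage times:

```python
from datetime import datetime, timedelta

# Hypothetical export from a case tracker: (case_id, escalated_at, first_triaged_at).
cases = [
    ("CASE-101", datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 12)),
    ("CASE-102", datetime(2024, 5, 1, 13, 30), datetime(2024, 5, 1, 14, 45)),
]

TRIAGE_SLA = timedelta(minutes=30)

def triage_sla_compliance(cases):
    """Fraction of cases triaged within the SLA, derived from timestamps the
    tracker already records - no extra box for the analyst to tick."""
    met = sum(1 for _, escalated, triaged in cases if triaged - escalated <= TRIAGE_SLA)
    return met / len(cases)

print(f"Triage SLA compliance: {triage_sla_compliance(cases):.0%}")  # 50%
```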


Professionalizing your operational practices also includes professionalizing your communications as covered previously in professional communications.

What is proceduralizing?


In leadership’s drive to measure outcomes and provide consistency, it can become very easy to treat every inconsistency in ops, or every ordinary human mistake, as one more thing that needs to go into a playbook to be followed. This often starts with good intentions. Blaming the human for mistakes is something we try to avoid in a good postmortem culture, so if something was a simple mistake, why not put that step into a playbook so no one misses it next time? Checklists work great for aviation, so why wouldn’t they work just as well for DFIR investigators?


Proceduralizing is my term for the outcome of an uncontrolled effort, over time, to address operational quality with an ever-increasing number of procedures to follow. Without careful attention, you may one day look around and find that every aspect of the job has a defined task in a playbook that must be done a specific way - and that your senior people especially are suffering from BoreOut! Remember that you hired your DFIR professionals to deal with situations where organizational processes failed in the first place.


With the best intentions, leadership and even the team itself can strip away autonomy from the responders doing the work. Over time this can leave the responders feeling demotivated by the apparent lack of trust in their expertise. They’ll likely ask themselves - why would someone insist on putting huge playbooks in place and requiring that every step be followed, unless that someone simply doesn’t trust the team to make good decisions?


Responders chafe under too-specific guidance, or at receiving critical feedback for seemingly inconsequential data-management oversights during an otherwise interesting case. How would it feel to spend hours cracking the mystery of a serious-seeming intrusion signal, only to have the sole feedback be a question about why the ‘Root cause?’ metadata field wasn’t checked?


In addition to the morale cost, I’ve also noticed that the more detailed playbooks become, the harder it is for staff to step back and think critically about the case in front of them. The expectation to ‘always consult the playbook’ slides frustratingly easily into ‘only follow the playbook’ as daily repetition builds muscle memory and common case types become a matter of routine rather than each being its own unique investigation.


Normalization of deviance is extremely difficult to avoid completely, but leaders should look for the right balance between professionalism and standards-setting while steering away from being too prescriptive at all times for all cases. A set of motivated and engaged responders with a human error budget is, generally, preferable to the world’s most complete playbook and a checked-out analyst following it.


One exciting near-future possibility for minimizing proceduralization while improving consistency is moving routine analyst tasks into AI assistant tools, if you can ensure reliability in the results. The AI tool never becomes bored-out or loses focus!


Maintaining the balance


Inside my response team, we have used two separate types of documents: a response plan and a playbook.


Inside a response plan you’ll find big, broad guidance about how to handle a type of situation. For example, a “Suspected intrusion into a production environment” response plan might list the teams to contact and fold into the Incident Command structure, along with their oncall aliases. It might remind the responders which legal and comms teams to pre-brief, and so on.


In contrast, a playbook (or runbook) is typically written to provide specific instructions for a component of the work that you might not do often, that isn’t situationally dependent, and that has important steps which are easy to forget. For example, “Acquiring data from the reiserfs filesystem” is not a common task, but a written playbook ensures that whichever oncall analyst ends up doing it has correct guidance to remind them of the sharp edges.


The middle ground between a vague but overarching response plan and a prescriptive but very limited playbook is where you maximize the agency and expertise your responders bring to the table. Playbooks and guidance can be written to take into account common situations that might occur in your environment and provide helpful reminders of tools that exist, techniques that may come into play, and so on. You can even include checklists as helpful reminders, or to provide more explicit guidance for junior, up-and-coming responders.


Your first priority for playbooks should be reinforcing for everyone that the playbook steps are, like the Pirate’s Code, guidelines rather than rules. While junior analysts might adhere closely to them while learning, your experienced analysts should feel free to bypass or alter any step at any time, documenting in the case log why they did so. The explanation doesn’t have to be extensive - a few key points will do. This documentation helps ensure case reviews still focus on quality outcomes and can also help improve the playbooks in the future.


For metrics, similar principles apply. The goal of metrics, and of tracking things like ‘root cause’, should be to help the organization apply what it learns from incidents to improving its systems. To avoid getting lost in collecting metrics for metrics’ sake, have a written triage guide for deciding when and how to track a particular metric. A good triage doc asks a few questions, and each must have a concrete answer about how the metric will meaningfully improve something before it’s worth asking responders to spend time tracking it. (I’ll have more to say about tracking metrics in a future post - stay tuned!)

A wise manager of mine once said that “The best metrics are derived, not manually tracked.” In other words, to the greatest extent possible you should derive your measurements from what has already happened, rather than forcing staff to stop what they’re doing and collect data instead.
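
As a rough illustration, the triage questions themselves can be made concrete enough to check mechanically. The specific questions and field names below are my own illustrative assumptions, not a standard:

```python
# The specific questions and field names are illustrative assumptions, not a standard.
TRIAGE_QUESTIONS = (
    "what_improves",   # What concrete decision or system does this metric improve?
    "who_consumes",    # Who will actually look at it, and how often?
    "how_derived",     # Can it be derived from data we already generate?
)

def worth_tracking(proposal: dict) -> bool:
    """Accept a proposed metric only if every triage question has a concrete answer."""
    return all(str(proposal.get(q, "")).strip() for q in TRIAGE_QUESTIONS)

proposal = {
    "name": "time_to_triage",
    "what_improves": "Staffing decisions for the triage rotation",
    "who_consumes": "Ops lead, reviewed monthly",
    "how_derived": "Computed from escalation and first-touch timestamps the tracker already records",
}

print(worth_tracking(proposal))  # True - concrete answers for every question
```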

Conclusion


Keeping a team of DFIR professionals engaged at high levels of performance, while remaining accountable to your organization (and customers, and regulators!), is a difficult and complex task that requires some amount of metrics-gathering and standards-setting. To get the best out of your team, ensure that you’re doing this thoughtfully and with a plan for how each metric or standard will empower individuals to do their best work, rather than drag them down into a morass of box-checking or toil. To maximize the benefit relative to cost, try to select metrics that can be derived from information generated in the course of the work or observed automatically, rather than collected from humans or tracked subjectively. Finally, ensure that your program for operational consistency focuses on the ways tooling and automation can support analysts in doing the job and applying their expertise, rather than adding more steps for them to remember or growing the size of a checklist. Your business and your responders will thank you!

