How do you know you are "Ready to Respond"?
The Continuous Improvement Framework - A framework designed to help improve a team’s response readiness through data driven actions
Authors: Angelika Rohrer, Jon Brown
Contributors: Joachim Metz
January 2024
___
About this paper
What is the CI Framework?
Introduction
What does “Ready to Respond” mean?
Measuring Response Readiness
Continuous Improvement (CI) Framework
Benefits
How do you implement the CI Framework?
So, where do you start?
1. Response Strategy
2. Critical Phases
3. Measurements and Metric Selection (KPIs)
4. Procedural Health Assessment
5. Gap Analysis Report & Planning input
Conclusion
Appendix
Appendix A: CI Framework - Response Strategy Categorisation Template
Appendix B: CI Framework - Sample Evaluation Phishing
Appendix C: CI Framework - Sample Response Category Catalog
Appendix D: CI Framework - Sample Gap Analysis Report
___
About this paper
In this paper we introduce the idea of a “Continuous Improvement (CI) Framework”, which enables an organization to self-assess the health of the operational infrastructure that an incident response team needs in order to be effective. The central focus of the CI Framework is maintaining and maturing the level at which an organization is "Ready to Respond" to any given type of incident.
-----------------------------
Choice of words and phrases
- CI - Continuous Improvement
- CMMI - Capability Maturity Model Integration
- Frameworks are seen as tools to solve big-picture problems. Frameworks should be used as a tool to create a common language that other organizations, including customers and regulators, can understand when wanting to learn more about an organization’s security posture (see Trapped in a frame).
- Incident Response (IR) is a structured, well documented, and formalized strategic approach to respond to an incident with the goal to limit or prevent damage to an organization and remediate the cause to reduce the risk of future incidents. IR is part of the broader Incident Management (IM) process and focuses on handling technical tasks and considerations. In our case, the term “incident” refers to security incidents such as cybersecurity threats, data breaches or system failures.
- KPI - Key performance indicator
- Operations or operational work describes ongoing, often repetitive, activities that need to be completed to keep the Incident Response Team’s lights on. Activities include administration, training, process documentation, system, tooling and lab maintenance.
- SLA - Service Level Agreement
- SLO - Service Level Objective
-----------------------------
NOTE: In the context of existing Cyber Security Incident Response Team (CSIRT) Security Maturity Models, the CI Framework can be compared with SIM3 v2 interim Self Assessment Tool, section P-8: Audit and Feedback Process which "describes how the CSIRT assesses their set-up and operations by self-assessment, external or internal assessment and a subsequent feedback mechanism. Those elements considered not up-to-standard by the CSIRT and their management are considered for future improvement." [see SIM3-mkXVIIIc]
----------------------------- ⏺ ⏺ ⏺ -----------------------------
What is the CI Framework?
Introduction
Within the hectic, reactive world of Incident Response (IR), the key to being effective is meticulous preparation and planning, as outlined in Data Incident Response Process and Building Secure and Reliable Systems. Effective incident response teams are dedicated to learning from every incident. IR Teams use findings to improve their incident handling and are always on the lookout for ways to implement additional preventive measures. Striving for continuous improvement in this area can feel like a full-time job. IR Teams recognize that the available tools, capabilities, and processes are often good enough, but they could always be better. However, improving existing operational infrastructure and procedures, response capabilities, and partnerships is often an afterthought, done on the spur of the moment. Maturing underlying operational processes often seems to have lower priority than responding to an incident, doing research, or taking part in exercises, tabletops, and training - and yet, not maturing the operational infrastructure can lead to a lag in general team preparedness.
The CI Framework's strength lies in its ability to assess the current level of a team’s IR readiness and evaluate their preparedness to effectively respond to potential major incidents. It considers various factors related to the health and maturity of essential operational infrastructure, including playbooks, partnerships, and tools. By doing so, the framework provides insights into the team's ability to efficiently handle future incidents.
What does “Ready to Respond” mean?
The term "Ready to Respond" can hold various meanings for different teams. In the context of this paper, it means the IR team has not only the ability but also the necessary, well-maintained operational infrastructure and healthy resources available to successfully engage in managing an incident.
Some examples of what this looks like are:
- Incidents have a well defined escalation path;
- Playbooks exist and are up to date, reviewed regularly and gaps are known;
- Critical tools are always available and capable of handling any incident type;
- Partnerships with essential stakeholders for an incident are clearly identified (e.g. legal counsel, communications department, etc.);
- Teams operate at a high level and meet their SLA/SLOs and/or KPIs;
- After an incident, follow-ups verify, where needed, that root cause analysis and action items are complete.
Measuring Response Readiness
Anyone who has attempted to measure the success of incident response within the fast-paced, reactive security landscape has likely concluded that this is a difficult problem with no readily available "one-size-fits-all" solution. Numerous industry security maturity models exist that can assess an organization's overall security posture to determine whether the organization has an adequate security management program in place. Some great examples are NIST 800-61, ISO 27035, SIM3 and COBIT. All of these examples have one thing in common: they focus on how well a team performs, the effectiveness of individual responders, or the time to recovery. Measuring performance provides insights into what went well and what went wrong while working an incident. However, it does not give detailed insights into the state of the underlying infrastructure. The team will most likely not be able to easily answer questions such as:
- Are we ready to respond to the next big incident?
- Are we ready for issues that happen infrequently?
- Are our processes sufficiently up to date for new regulations?
Continuous Improvement (CI) Framework
One way of getting efficient answers to these questions is to implement the Continuous Improvement Framework. This framework enables creating an accurate, comprehensive, and holistic picture of capability and process health. That picture makes clear which areas to invest in, which projects to prioritize, and where to allocate bandwidth and budget.
The CI Framework is designed to track and measure capability and process health to ensure the team is ‘Ready to Respond’.
The framework categorizes incident response efforts into clearly defined response strategies and critical phases common to all response strategies. Specifically selected points of measurement (KPIs) highlight gaps and areas of improvements within the IR team’s operational infrastructure which will help to mature overall response readiness.
In the diagram above, “Phishing Response”, “Malware Response” and “Ransomware Response” are response strategies that consist of multiple critical phases such as “intake”, “playbooks” and “tools”.
Benefits
Once strategies and phases are identified, the CI Framework is easily set up and it does not require much maintenance bandwidth.
Benefits include:
- It helps the organization understand and improve response readiness and capabilities, regardless of the type or severity of the security incident being managed.
- It measures and tracks the health of processes, tooling, and response strategies over time.
- It creates a scalable and flexible way to onboard new response categories. Categories and phases can be freely chosen based on specific needs or data at hand.
- Over time the measurements can be refined and will expose additional gaps which in turn can be prioritized and fixed at an appropriate pace.
- Additional points of measurement can be added at any time.
- It simplifies planning and bandwidth prioritization by providing a big picture view and highlighting gaps and shortcomings.
- It helps to improve process resilience by identifying single points of failure.
- It allows Incident Commanders, Responders and Security Engineers to have a clear vision of the impact of their actions and roadmaps.
----------------------------- ⏺ ⏺ ⏺ -----------------------------
How do you implement the CI Framework?
So, where do you start?
The Continuous Improvement Framework is designed as a long-term goal that can be achieved by making regular incremental changes. The Plan-Do-Check-Act (PDCA) project management approach is a great supporting tool to highlight the individual steps necessary for implementation.
The full setup cycle, as shown in Figure 1, only needs to be performed once in its entirety. Upon defining response strategies, identifying critical phases, and selecting appropriate measurements, the primary focus of the ongoing time investment should shift to consistently completing the procedural health assessments, conducting gap analysis, and planning and prioritizing projects. The average time investment for the setup cycle is 30-60 minutes per strategy per quarter.
IR Teams can onboard new response categories, critical phases, and measurements, or offboard existing ones, at any time to adjust the depth of the analysis.
Figure 1 - The CI Framework setup cycle
1. Response Strategy
Start by creating a Response Strategy Catalog (overview) tracking all defined response strategies. Bucket individual response strategies by impact or common threat, ensuring all cases that fall into the same category are of similar nature. Start with high level categories like malware, phishing, fraud, ransomware and then refine in later iterations.
Example: Creating a Response Strategy Catalog
Result (Malware response strategy with 3 specific sub-categories):
Once all individual response strategies have been categorized, formally document the process. For each strategy, provide a definition, clearly define its scope, establish exit criteria, and assign a primary owner or subject matter expert. The owner will be able to assist initially, potentially owning the full assessment and planning of their strategy once they are comfortable with the process.
With this structured approach, subject matter experts can easily add new response strategies or refine existing ones.
Note: The chosen strategies will differ from organization to organization.
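A Response Strategy Catalog can live in any tracking tool. As a minimal sketch, assuming the catalog is kept as structured data, one entry could look like the following. The class name, field names, sub-category labels, and owner alias are hypothetical placeholders; only the documented attributes (definition, scope, exit criteria, primary owner) come from the process described above.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseStrategy:
    """One entry in a Response Strategy Catalog (illustrative structure only)."""
    name: str
    definition: str
    scope: str
    exit_criteria: str
    owner: str  # primary owner or subject matter expert
    sub_categories: list[str] = field(default_factory=list)


# Hypothetical catalog content; real strategies differ per organization.
catalog = {
    "malware": ResponseStrategy(
        name="Malware Response",
        definition="Response to malicious software found on company assets.",
        scope="Corporate workstations and servers.",
        exit_criteria="Affected hosts are remediated and the root cause is documented.",
        owner="malware-sme@example.com",
        sub_categories=["sub-category A", "sub-category B", "sub-category C"],
    ),
}


def add_strategy(catalog: dict[str, ResponseStrategy], key: str,
                 strategy: ResponseStrategy) -> None:
    """Add a new strategy, refusing to silently overwrite an existing entry."""
    if key in catalog:
        raise KeyError(f"strategy {key!r} already exists; refine it instead")
    catalog[key] = strategy
```

Keeping the owner and exit criteria on each entry means a subject matter expert can add or refine a strategy without touching the rest of the catalog.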
2. Critical Phases
The next step, after defining response strategies, is to identify critical phases that need to be completed during each response, regardless of the response strategy.
For the purpose of measuring response readiness, consider the following critical phases:
3. Measurements and Metric Selection (KPIs)
Now select appropriate data points with the goal of measuring the health of each critical phase. For a first iteration, hereafter also referred to as “V1”, focus on existing and easily available data points like "Does a playbook exist?" and "Is it up to date?". Depending on which data points are already recorded, the measurements used may be very detailed and elaborate, or may start off as a simple checklist.
Note: Adjust these measurements to fit your specific needs or metrics available.
Over time, the chosen measurements can be refined and tweaked to expose more granular shortcomings (gaps). These gaps can then be fixed at an appropriate pace.
After the selection of appropriate data points, the next step is to measure the health of each identified data point. For measuring each data point, we chose the Capability Maturity Model Integration (CMMI) framework. However, any other predefined, industry standard maturity model is equally suitable.
Using the CMMI as reference, the given answers from the “Measurements and Metric” section can then be easily converted to a simple score of 1-5 as shown in the Procedural Health Assessment section below.
Note: The measurement scale itself will not change over time. However, the KPI per "critical phase" may move upwards or downwards once you start adding more data points to measure and/or fix identified gaps.
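To illustrate how checklist answers might be turned into a score of 1-5, here is a minimal sketch. The linear mapping from the share of "yes" answers to a score is an assumption made for this example, not a rule prescribed by the framework or by CMMI; the level names are the standard CMMI maturity level names, and the function name is hypothetical.

```python
# Standard CMMI maturity level names, used here only as labels for the score.
CMMI_LEVELS = {
    1: "Initial",
    2: "Managed",
    3: "Defined",
    4: "Quantitatively Managed",
    5: "Optimizing",
}


def score_phase(answers: dict[str, bool]) -> int:
    """Convert yes/no checklist answers for one critical phase to a 1-5 score.

    Assumption: the score scales linearly with the share of "yes" answers.
    """
    if not answers:
        raise ValueError("at least one data point is required")
    yes = sum(answers.values())
    # Integer arithmetic: no "yes" answers -> 1, all "yes" answers -> 5.
    return 1 + (4 * yes) // len(answers)


playbook_answers = {
    "Does a playbook exist?": True,
    "Is it up to date?": False,
}
score = score_phase(playbook_answers)
print(score, CMMI_LEVELS[score])
```

With this mapping the two answers above score 3. Adding a third data point answered "no" would lower the same phase to 2 while the scale stays unchanged, which is the effect the note above describes.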
4. Procedural Health Assessment
Now that the initial setup of the CI Framework is done, the evaluation and assessment part of the CI Framework starts. The simplest approach is to ask the owner of each response strategy category to help with the evaluation, assessment and scoring. For V1, shown below in Figure 2, keep track of capabilities that are working well, but also highlight areas that are broken or missing.
Figure 2: Results of the V1 evaluation and assessment
The selected measurements and data points can be refined any time. Figure 3 below shows a refined next iteration (V2) of the phishing example. Certain scores were adversely affected by recently added measurements, elevating the priority for resolving them promptly.
Figure 3: Results of the V2 evaluation and assessment
New Response Strategies can quickly be evaluated and documented.
Additional points of measurement can be added at any time.
Note: If additional measurements are needed, run the new evaluation alongside the old version for at least one cycle, and compare only results that use the same measurement version so that progress is reflected accurately.
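The advice to compare only like with like can be sketched in code. The example below assumes each assessment result is stored with its measurement version and a numeric cycle counter; the record layout, function name, and scores are hypothetical and serve only to show that progress is computed within a single measurement version.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    strategy: str
    phase: str
    version: str  # measurement set used, e.g. "V1" or "V2"
    cycle: int    # assessment cycle counter, e.g. 1 for the first quarter
    score: int    # 1-5 health score


def progress_by_version(records):
    """Return the score change per (strategy, phase, version).

    Scores are only compared within the same measurement version, so a
    newly introduced version needs at least two cycles before it reports
    any progress.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[(record.strategy, record.phase, record.version)].append(record)

    deltas = {}
    for key, group in grouped.items():
        if len(group) < 2:
            continue  # nothing to compare against yet
        ordered = sorted(group, key=lambda r: r.cycle)
        deltas[key] = ordered[-1].score - ordered[0].score
    return deltas


# Hypothetical scores: V2 is run alongside V1 in cycle 2.
records = [
    Assessment("Phishing", "playbooks", "V1", 1, 4),
    Assessment("Phishing", "playbooks", "V1", 2, 5),
    Assessment("Phishing", "playbooks", "V2", 2, 3),
]
progress = progress_by_version(records)
```

Here the V1 results show an improvement of one point, while the lower V2 score is not counted as a regression because it has no earlier V2 result to compare against.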
5. Gap Analysis Report & Planning input
Prioritize the identified gaps based on the risk they pose to responding successfully to an incident. Use this prioritized list to present the results to your major stakeholders. The results provide a big picture overview of the IR Team’s overall incident response health and serve as input for management planning. The IR team and/or management can explore the list of identified gaps and determine whether they are already covered by ongoing efforts or need to be evaluated for future work. The results of the gap analysis can also be integrated alongside other ongoing work.
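As a rough sketch of this prioritization step, the gaps could be recorded with a risk rating and an optional link to ongoing work, then sorted and split for the report. The 1-5 risk scale, the field names, and the sample gaps are assumptions made for illustration; the framework does not prescribe a particular risk scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gap:
    strategy: str
    phase: str
    description: str
    risk: int  # hypothetical scale: 1 (low) to 5 (high)
    covered_by: Optional[str] = None  # name of an ongoing effort, if any


def build_report(gaps):
    """Sort gaps by risk (highest first) and split them by coverage."""
    ordered = sorted(gaps, key=lambda g: g.risk, reverse=True)
    return {
        "covered": [g for g in ordered if g.covered_by],
        "needs_evaluation": [g for g in ordered if not g.covered_by],
    }


# Hypothetical gaps found during an assessment.
gaps = [
    Gap("Phishing", "playbooks", "Playbook not reviewed this year", risk=3),
    Gap("Malware", "tools", "Single point of failure in analysis lab", risk=5),
    Gap("Ransomware", "intake", "Escalation path undocumented", risk=4,
        covered_by="Escalation process refresh"),
]
report = build_report(gaps)
for gap in report["needs_evaluation"]:
    print(gap.risk, gap.strategy, gap.phase, gap.description)
```

The "needs_evaluation" list is the part that feeds planning discussions, while the "covered" list shows stakeholders which gaps existing projects already address.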
The results of the CI assessments provide key insights into the following questions about the organization's response readiness:
Conclusion
In uncertain landscapes with ad-hoc decisions and reactive workflows, maintaining a stable foundation for a consistently ready response team is challenging.
The Continuous Improvement Framework offers a unified perspective on response health and operational maturity, enabling informed resource investments and risk assessments. It prevents surprises and minimizes mishandling risks that could harm reputation and finances.
The framework's simplicity, flexibility, and scalability make it applicable to various areas, including response/investigation strategies, partnerships, and tooling resilience.
In summary: The CI Framework outlines the high-level purpose of all ongoing work, categorizes it, and measures its impact. Through the defined KPIs, it allows for correct project prioritization, helping you to reach annual objectives, which ultimately helps you achieve your mission to “always remain ready to respond”.
Note: In case we piqued your interest and you want to try it out for your team, we have included several templates and examples to get you started.
----------------------------- ⏺ ⏺ ⏺ -----------------------------