How do you know you are "Ready to Respond"?


The Continuous Improvement Framework - A framework designed to help improve a team’s response readiness through data-driven actions

Authors: Angelika Rohrer, Jon Brown

Contributors: Joachim Metz

January 2024

___

About this paper

What is the CI Framework?

Introduction

What does “Ready to Respond” mean?

Measuring Response Readiness

Continuous Improvement (CI) Framework

Benefits

How do you implement the CI Framework?

So, where do you start?

1. Response Strategy

2. Critical Phases

3. Measurements and Metric Selection (KPIs)

4. Procedural Health Assessment

5. Gap Analysis Report & Planning input

Conclusion

Appendix

Appendix A: CI Framework - Response Strategy Categorisation Template

Appendix B: CI Framework - Sample Evaluation Phishing

Appendix C: CI Framework - Sample Response Category Catalog

Appendix D: CI Framework - Sample Gap Analysis Report

___


About this paper

In this paper we introduce the idea of a “Continuous Improvement (CI) Framework”, which enables an organization to self-assess the health of the underlying operational infrastructure that needs to be in place for an incident response team to be effective. The central focus of the CI Framework is maintaining and maturing the level at which an organization is "Ready to Respond" to any given type of incident.


----------------------------- 


Choice of words and phrases

- CI - Continuous Improvement

- CMMI - Capability Maturity Model Integration

- Frameworks are seen as tools to solve big-picture problems. They should be used to create a common language that other organizations, including customers and regulators, can understand when wanting to learn more about an organization’s security posture (see Trapped in a frame).

- Incident Response (IR) is a structured, well-documented, and formalized strategic approach to responding to an incident, with the goal of limiting or preventing damage to an organization and remediating the cause to reduce the risk of future incidents. IR is part of the broader Incident Management (IM) process and focuses on handling technical tasks and considerations. In our case, the term “incident” refers to security incidents such as cybersecurity threats, data breaches or system failures.

- KPI - Key performance indicator

- Operations or operational work describes ongoing, often repetitive, activities that need to be completed to keep the Incident Response Team’s lights on. Activities include administration, training, process documentation, system, tooling and lab maintenance.

- SLA - Service Level Agreement

- SLO - Service Level Objective



----------------------------- 


NOTE: In the context of existing Cyber Security Incident Response Team (CSIRT) Security Maturity Models, the CI Framework can be compared with SIM3 v2 interim Self Assessment Tool, section P-8: Audit and Feedback Process which "describes how the CSIRT assesses their set-up and operations by self-assessment, external or internal assessment and a subsequent feedback mechanism. Those elements considered not up-to-standard by the CSIRT and their management are considered for future improvement." [see SIM3-mkXVIIIc]



-----------------------------   ⏺ ⏺ ⏺   -----------------------------


What is the CI Framework?


Introduction

Within the hectic, reactive world of Incident Response (IR), the key to being effective is meticulous preparation and planning, as outlined in Data Incident Response Process and Building Secure and Reliable Systems. Effective incident response teams are dedicated to learning from every incident. IR teams use findings to improve their incident handling and are always on the lookout for ways to implement additional preventive measures. Striving for continuous improvement in this area can feel like a full-time job. IR teams recognize that the available tools, capabilities, and processes are often good enough, but they could always be better. However, improving existing operational infrastructure and procedures, response capabilities, and partnerships is often an afterthought, done on the spur of the moment. It often feels like maturing underlying operational processes has a lower priority than responding to an incident, doing research, or taking part in exercises, tabletops, and training - and yet, not maturing the operational infrastructure can lead to a lag in general team preparedness.

The CI Framework's strength lies in its ability to assess the current level of a team’s IR readiness and evaluate their preparedness to effectively respond to potential major incidents. It considers various factors related to the health and maturity of essential operational infrastructure, including playbooks, partnerships, and tools. By doing so, the framework provides insights into the team's ability to efficiently handle future incidents.



What does “Ready to Respond” mean?

The term "Ready to Respond" can hold various meanings for different teams. In the context of this paper, it means the IR team has not only the ability but also the necessary, well maintained operational infrastructure and healthy resources available, to successfully engage in managing an incident.

Some examples of what this looks like are:

- Incidents have a well defined escalation path;

- Playbooks exist and are up to date, reviewed regularly and gaps are known;

- Critical tools are always available and capable of handling any incident type;

- Partnerships with essential stakeholders for an incident are clearly identified (i.e. legal counsel, communications department, etc.);

- Teams operate at a high level and meet their SLA/SLOs and/or KPIs;

- After an incident, follow-up occurs where needed to verify root cause analysis and action item completion.



Measuring Response Readiness

Anyone who has attempted to measure the success of incident response within the fast-paced, reactive security landscape has likely come to the conclusion that this is a difficult problem to solve, with no readily available "one-size-fits-all" solution. Numerous industry security maturity models exist that can assess an organization's overall security posture to determine whether the organization has an adequate security management program in place. Some great examples are NIST 800-61, ISO 27035, SIM3 and COBIT. All of these examples have one thing in common: they focus on how well a team performs, the effectiveness of individual responders, or the time to recovery. Measuring performance provides insights on what went well and what went wrong while working an incident. However, it does not give detailed insights into the state of the underlying infrastructure. The team will most likely not be able to easily answer questions such as:

  • Are we ready to respond to the next big incident?

  • Are we ready for issues that happen infrequently?

  • Are our processes sufficiently up to date for new regulations?



Continuous Improvement (CI) Framework

One way of getting efficient answers to these questions is to implement the Continuous Improvement Framework. This framework enables creating an accurate, comprehensive, and holistic picture of capability and process health - a clear picture of which areas to invest in, which projects to prioritize, and where to allocate bandwidth and budget.


The CI Framework is designed to track and measure capability and process health to ensure the team is ‘Ready to Respond’.


The framework categorizes incident response efforts into clearly defined response strategies and critical phases common to all response strategies. Specifically selected points of measurement (KPIs) highlight gaps and areas of improvement within the IR team’s operational infrastructure, which helps to mature overall response readiness.


In the diagram above, “Phishing Response”, “Malware Response” and “Ransomware Response” are response strategies that consist of multiple critical phases such as “intake”, “playbooks” and “tools”.




Benefits

Once strategies and phases are identified, the CI Framework is easily set up and it does not require much maintenance bandwidth. 

Benefits include:

- It helps the organization understand and improve response readiness and capabilities, regardless of the type or severity of the security incident being managed.

- It measures and tracks the health of processes, tooling, and response strategies over time.

- It creates a scalable and flexible way to onboard new response categories. Categories and phases can be freely chosen based on specific needs or data at hand.

- Over time the measurements can be refined and will expose additional gaps which in turn can be prioritized and fixed at an appropriate pace. 

- Additional points of measurement can be added at any time. 

- It simplifies planning and bandwidth prioritization by providing a big picture view and highlighting gaps and shortcomings.

- It helps to improve process resilience by identifying single points of failure.

- It allows Incident Commanders, Responders and Security Engineers to have a clear vision of the impact of their actions and roadmaps.



-----------------------------   ⏺ ⏺ ⏺   -----------------------------


How do you implement the CI Framework?


So, where do you start?

The Continuous Improvement Framework is designed as a long-term goal that can be achieved by making regular incremental changes. The Plan-Do-Check-Act (PDCA) project management approach is a great supporting tool to highlight the individual steps necessary for implementation.

The full setup cycle, as shown in Figure 1, only needs to be performed once in its entirety. Upon defining response strategies, identifying critical phases, and selecting appropriate measurements, the primary focus of the ongoing time investment should shift to consistently completing the procedural health assessments, conducting gap analysis, and planning and prioritizing projects. The average time investment for the setup cycle is 30-60 minutes per strategy per quarter.

IR teams can onboard or offboard response categories, critical phases, and measurements at any time to increase the depth of the analysis.


Figure 1 - The CI Framework setup cycle


  1. Response Strategy 

Start by creating a Response Strategy Catalog (overview) tracking all defined response strategies. Bucket individual response strategies by impact or common threat, ensuring all cases that fall into the same category are of a similar nature. Start with high-level categories like malware, phishing, fraud, and ransomware, and then refine them in later iterations.



Example: Creating a Response Strategy Catalog

Iteration 1 (What type of response?): Compromise

  Impact:      Security breach
  Definition:  Attacks targeting infrastructure, services, products, users or devices
  In Scope:    Malware, Social Engineering, Credential Theft, …

Iteration 2 (What type of compromise?): Malware

  Impact:      Security breach / compromise
  Definition:  Malicious software targeting infrastructure, services, products, users or devices
  In Scope:    Malicious Apps, Man-in-the-Middle, third party software issues, Mobile phone malware, …


Result (Malware response strategy with 3 specific sub-categories):

  Impact          | Response Strategy (Main) | Response Strategy (Sub)    | PoC
  Security Breach | Malware Response         | Android Malware            | Rob
                  |                          | Third party software issue | Derek
                  |                          | Malicious Apps             | Tarik
  Abuse           | Cloud Response           | Coinmining                 | Jason
                  |                          | Ransomware                 | Ollie




Once all individual response strategies have been categorized, formally document the process. For each strategy, provide a definition, clearly define its scope, establish exit criteria, and assign a primary owner or subject matter expert. The owner will be able to assist initially, potentially owning the full assessment and planning of their strategy once they are comfortable with the process.


Example: Phishing

  Definition:     phishing attacks within your organization
  In Scope:       email phishing, spear phishing, whaling
  Out of Scope:   bad marketing / recruiting practices
  Edge Cases:     smishing and vishing
  Exit Strategy:  mitigation and remediation completed



With this structured approach, subject matter experts can easily add new response strategies or refine existing ones. 

Note: The chosen strategies will differ based on the organization.
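Capturing each catalog entry in a small structured record can make the catalog easy to extend and query. The following Python sketch is a minimal illustration under assumed names (the ResponseStrategy class and its fields are hypothetical, not part of the framework); it simply mirrors the definition, scope, exit criteria, and owner described above.

```python
from dataclasses import dataclass, field

@dataclass
class ResponseStrategy:
    """One entry in a Response Strategy Catalog (illustrative fields only)."""
    impact: str                              # e.g. "Security Breach"
    main: str                                # e.g. "Malware Response"
    sub: str                                 # e.g. "Android Malware"
    owner: str                               # primary owner / subject matter expert (PoC)
    definition: str = ""
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    exit_criteria: str = ""

# Example entry based on the phishing strategy documented above
# (the sub-category and owner are placeholders).
phishing = ResponseStrategy(
    impact="Security Breach",
    main="Phishing Response",
    sub="Email Phishing",
    owner="TBD",
    definition="Phishing attacks within your organization",
    in_scope=["email phishing", "spear phishing", "whaling"],
    out_of_scope=["bad marketing / recruiting practices"],
    edge_cases=["smishing", "vishing"],
    exit_criteria="mitigation and remediation completed",
)
```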



  2. Critical Phases

The next step, after defining response strategies, is to identify critical phases that need to be completed during each response, regardless of the response strategy. 

For the purpose of measuring response readiness, consider the following critical phases: 




Intake

Quality / accuracy and completeness of information reported. Is this information routed to the most appropriate team with the fewest handoffs possible? 


Playbooks

Playbooks (or response plans or runbooks) are written, reviewed and have all necessary features to enable responders to consistently achieve their goals.

Tools and Capabilities

Tools support responders to efficiently and consistently respond to the incident type.

Partnerships / Stakeholders

Responders know the appropriate internal partner teams or points of contact to engage throughout an incident. Partner teams are trained in participating in an incident response.

Operational Excellence

Operational SLAs and SLOs (or other organizational measures like KPIs) are met.


Aftercare

If a post mortem was needed, action items are clearly identified and brought to conclusion.


Trends and common (root) causes identified and actioned.




  3. Measurements and Metric Selection (KPIs)

Now select appropriate data points with the goal of measuring the health of each critical phase. For a first iteration, hereafter also referred to as “V1”, focus on existing and easily available data points like "Does a playbook exist?" and "Is it up to date?". Depending on what data points are already recorded, the measurements used may either be very detailed and elaborate or start off as a simple checklist.

Note: Adjust these measurements to fit your specific needs or the metrics available.



Phase                   | Example V1 Measurement
Intake                  | • Does a defined escalation path exist?
                        | • Does an automated email to ticket system exist?
Playbooks               | • Does a playbook exist?
Tools and Capabilities  | • Are critical tools available?
                        | • Are critical capabilities available?
Partnerships            | • Are support partners identified?
                        | • Are escalation path(s) in place?
Operational Excellence  | • Are processes completed within SLA?
Aftercare               | • Was a post mortem written?



Over time, the chosen measurements can be refined and tweaked to expose more granular shortcomings (gaps). These gaps can then be fixed at an appropriate pace. 



Phase                   | Example V2 Measurement
Intake                  | • Does a defined escalation path exist?
                        | • Does an automated email to ticket system exist?
                        | • How often are tickets misrouted?
Playbooks               | • Does a playbook exist?
                        | • Is it up to date?
Tools and Capabilities  | • Are critical tools available?
                        | • Are critical capabilities available?
Partnerships            | • Are support partners identified?
                        | • Are escalation path(s) in place?
                        | • Does a hand-off process exist?
Operational Excellence  | • Are processes completed within SLA?
Aftercare               | • Was a post mortem written?
                        | • Was a trend analysis completed?
                        | • Are essential action items (AIs) identified?



After the selection of appropriate data points, the next step is to measure the health of each identified data point. For measuring each data point, we chose the Capability Maturity Model Integration (CMMI) framework. However, any other predefined, industry-standard maturity model is equally suitable.



Using the CMMI as a reference, the answers given in the “Measurements and Metric Selection” section can then easily be converted to a simple score of 1-5, as shown in the Procedural Health Assessment section below.


Note: The measurement scale itself will not change over time; however, the KPI per "critical phase" will move upwards or downwards once you start adding more data points to measure and/or fix identified gaps.
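To make the conversion concrete, here is a minimal Python sketch of turning checklist answers into a 1-5 phase KPI. The answer values (Yes = 1.0, Partially = 0.5, No = 0.0), the weights, and the rounding onto the 1-5 scale are illustrative assumptions; the framework does not prescribe a specific conversion formula, so adjust the mapping to your own scoring rules.

```python
# Minimal sketch: convert weighted checklist answers into a 1-5 phase KPI.
# The Yes/Partially/No values, weights, and rounding rule are illustrative
# assumptions, not a prescribed part of the CI Framework.

ANSWER_VALUE = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}

def phase_kpi(measurements: list[tuple[str, float]]) -> int:
    """measurements: (answer, weight) pairs; weights should sum to 1.0."""
    fraction = sum(ANSWER_VALUE[answer] * weight for answer, weight in measurements)
    # Map the weighted fraction (0.0 - 1.0) onto the 1-5 maturity scale.
    return round(1 + 4 * fraction)

# Example: Intake phase with two equally weighted V1 measurements.
intake = [
    ("Yes", 0.5),  # Does a defined escalation path exist?
    ("No", 0.5),   # Does an automated email to ticket system exist?
]
print(phase_kpi(intake))  # -> 3
```

With this particular mapping the example happens to reproduce the Intake score of 3 shown in Figure 2, but any consistent mapping works as long as it is applied the same way across evaluations.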




  4. Procedural Health Assessment

Now that the initial setup of the CI Framework is done, the evaluation and assessment part starts. The simplest approach is to ask the owner of each response strategy category to help with the evaluation, assessment, and scoring. For V1, shown below in Figure 2, keep track of capabilities that are working well, but also highlight areas that are broken or missing.



Example Evaluation: Phishing (V1 Measurement)

Intake (KPI: 3)
  • Does a defined escalation path exist? Yes
  • Does an automated email to ticket system exist? No
  Known gaps: Automated ticketing system missing

Playbooks (KPI: 4)
  • Does a playbook exist? Yes

Tools and Capabilities (KPI: 4)
  • Are critical tools available? Partially
  • Are critical capabilities available? Yes
  Known gaps: Some minor tooling features to obtain full automation are missing

Partnerships (KPI: 4)
  • Are support partners identified? Partially
  • Are escalation path(s) in place? Yes

Operational Excellence (KPI: 2)
  • Are processes completed within SLA? Partially
  Known gaps: Process Z is broken and causes delays

Aftercare (KPI: 4)
  • Was a post mortem written? Yes


Figure 2: Results of the V1 evaluation and assessment



The selected measurements and data points can be refined at any time. Figure 3 below shows a refined next iteration (V2) of the phishing example. Certain scores were adversely affected by the newly added measurements, which elevates the priority of resolving those gaps promptly.



Example Evaluation: Phishing (V2 Measurement)

Intake (KPI: 2)
  • Does a defined escalation path exist? Yes
  • Does an automated email to ticket system exist? No
  • How often are tickets misrouted? 23%
  Known gaps: Automated ticketing system missing; misrouting due to missing ticketing system

Playbooks (KPI: 4)
  • Does a playbook exist? Yes
  • Is it up to date? Yes

Tools and Capabilities (KPI: 4)
  • Are critical tools available? Partially
  • Are critical capabilities available? Yes
  Known gaps: Some minor tooling features to obtain full automation are missing

Partnerships (KPI: 3)
  • Are support partners identified? Partially
  • Are escalation path(s) in place? Yes
  • Does a hand-off process exist? No
  Known gaps: Missing hand-off process / playbook

Operational Excellence (KPI: 2)
  • Are processes completed within SLA? Partially
  Known gaps: Process Z is broken and causes delays

Aftercare (KPI: 2)
  • Was a post mortem written? Yes
  • Was a trend analysis completed? No
  • Are essential action items identified? No
  Known gaps: Process to write trend analysis missing; action item and feature tracking process incomplete

Figure 3: Results of the V2 evaluation and assessment



  • New Response Strategies can quickly be evaluated and documented.

  • Additional points of measurement can be added at any time. 


Note: When additional measurements are added, run the new evaluation alongside the old version for at least one cycle, and compare only results that use the same measurement version so that progress is reflected accurately.




  5. Gap Analysis Report & Planning input

Prioritize the identified gaps based on the risk they pose to successfully responding to an incident. Use this prioritized list to present the results to your major stakeholders. The results provide a big-picture overview of the IR team’s overall incident response health and serve as input for management planning. The IR team and/or management can explore the list of identified gaps and determine whether they are already covered by ongoing efforts or need to be evaluated for future work. The results of the gap analysis can also be integrated alongside other ongoing work.
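The sample reports below could, for example, be generated from a small data structure that records per-strategy phase KPIs and known gaps. The following Python sketch is one possible illustration; the names are hypothetical, and using the phase KPI (lowest score first) as a proxy for risk is an assumption rather than a rule of the framework.

```python
# Minimal sketch of gap prioritization: collect known gaps per strategy and
# phase, then list them lowest-scoring phase first. Names and the
# KPI-as-risk-proxy heuristic are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class StrategyAssessment:
    strategy: str
    phase_kpis: dict[str, int]    # e.g. {"Intake": 3, "Playbooks": 4, ...}
    gaps: dict[str, list[str]]    # known gaps per phase

def prioritized_gaps(assessments: list[StrategyAssessment]) -> list[tuple[int, str, str, str]]:
    """Return (kpi, strategy, phase, gap) tuples, lowest KPI first."""
    rows = [
        (a.phase_kpis[phase], a.strategy, phase, gap)
        for a in assessments
        for phase, gap_list in a.gaps.items()
        for gap in gap_list
    ]
    return sorted(rows)

# Values taken from the March 2023 sample report below.
phishing = StrategyAssessment(
    strategy="Phishing",
    phase_kpis={"Intake": 3, "Playbooks": 4, "Tools & Capabilities": 3,
                "Partners": 3, "Ops Excellence": 2, "Post Mortems": 3},
    gaps={"Intake": ["Ticketing system missing"],
          "Ops Excellence": ["Process Z broken"],
          "Partners": ["Missing hand off process"],
          "Post Mortems": ["Trend analysis missing",
                           "Action Item tracking incomplete"]},
)

for kpi, strategy, phase, gap in prioritized_gaps([phishing]):
    print(f"{kpi}  {strategy:10} {phase:16} {gap}")
```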



1st Evaluation, March 2023, V1 Measurement

Response Strategy | Intake | Playbooks | Tools & Capabilities | Partners | Ops Excellence | Post Mortems
Phishing          |   3    |     4     |          3           |    3     |       2        |      3
Malware           |   2    |     3     |          3           |    4     |       2        |      3
Ransomware        |   2    |     4     |          3           |    2     |       3        |      5

Notes (Phishing) - GAPS:

- Intake: Ticketing system missing

- Operational Excellence: Process Z broken

- Partners: Missing hand off process

- Post-mortems: Trend analysis missing

- Post-mortems: Action Item tracking incomplete




2nd Evaluation, September 2023, V1 Measurement

Response Strategy | Intake | Playbooks | Tools & Capabilities | Partners | Ops Excellence | Post Mortems
Phishing          |   3    |     4     |          3           |    4     |       3        |      4
Malware           |   3    |     3     |          3           |    4     |       2        |      4
Ransomware        |   3    |     4     |          3           |    3     |       3        |      5

Notes (Phishing) - GAPS:

- Intake: Ticketing system missing

- Post-mortems: Action Item tracking incomplete

Notes (Phishing) - FIXED:

- Partners: Hand off process launched

- Operational Excellence: Process Z fixed

- Post-mortems: Trend analysis implemented




The results of the CI assessments provide key insights into the following questions about the organization’s response readiness:

  • How well prepared are we to respond?

  • What are our biggest gaps and how critical are those?

  • Are we ready to respond to the next big incident?

  • Are we ready for issues that happen infrequently?

  • Do we set the right priorities for ongoing project and tooling work?

  • Is bandwidth for project work allocated correctly?

  • Are our processes sufficiently up to date for new regulations?




Conclusion

In uncertain landscapes with ad-hoc decisions and reactive workflows, maintaining a stable foundation for a consistently ready response team is challenging.


The Continuous Improvement Framework offers a unified perspective on response health and operational maturity, enabling informed resource investments and risk assessments. It prevents surprises and minimizes mishandling risks that could harm reputation and finances.


The framework's simplicity, flexibility, and scalability make it applicable to various areas, including response/investigation strategies, partnerships, and tooling resilience.


In Summary: The CI Framework outlines the high-level purpose of all ongoing work, categorizes it, and measures its impact. Through the defined KPIs it allows for correct project prioritization, helping you reach annual objectives, which ultimately helps you achieve your mission to “always remain ready to respond”.

Note: In case we piqued your interest and you want to try it out for your team, we have included several templates and examples to get you started.  



-----------------------------   ⏺ ⏺ ⏺   -----------------------------


Appendix


Appendix A: CI Framework - Response Strategy Categorisation Template


Security Response Continuous Improvement Framework

Response Classification $Template


Status: Draft | Authors: {...} | Version: 1.1 | Last update: 



Response Strategy Definition: $ Name

Incident Response Evaluation

1st Evaluation, V1, {Date}


Response Strategy Definition: $ Name

This is what the response is about



Response Strategy Scope
  Description: Describe the bounds within which the incident is handled
  Notes / Reasoning:

Edge Cases
  Description: What are the one-off scenarios that might require additional support?
  Notes / Reasoning:

Exit / de-escalation / transition criteria
  Description: The point when the incident can be closed or transitioned to another team
  Notes / Reasoning:



Incident Response Evaluation 

How well are we responding?


1 | Initial / Ad hoc        | Completely missing, no instructions or predefined partner team to help
2 | Managed                 | Would work with extreme effort, no documentation, unclear partners, minimal critical artifacts/logs/processes available
3 | Defined                 | Slow and manual effort, some documentation, some partnerships, some critical artifacts/logs/processes available
4 | Quantitatively Managed  | Slow but automated, fairly well documented, strong partnerships, most critical artifacts/logs/processes available
5 | Optimizing              | Fully automated and documented, strong partnerships, has specialized tools / processes, all critical artifacts/logs/processes available, business as usual


1st Evaluation, V1, {Date}


Category: Intake
  Measure V1:
    - (100%) How often are tickets misrouted?
      * currently only 1 measurement, the answer makes up 100% of the KPI
  KPI: NA
  Notes / Reasoning:

Category: Playbooks
  Measure V1:
    - (50%) Does a playbook exist?
    - (50%) Is it up to date?
      * 2 measurements available, both answers are equally important, so each makes up 50% of the KPI
  KPI: NA
  Notes / Reasoning:
    • Link to Playbook:
    • Up to date: Yes / No

Category: Tools and Capabilities
  Measure V1:
    - (50%) Are critical tools available?
    - (50%) Are critical capabilities available?
  KPI: NA
  Notes / Reasoning:
    • Critical tools: Yes / No / Partially
      • Tool 1
      • Tool 2
    • Critical capabilities: Yes / No / Partially
      • Capability 1

Category: Support Partnerships (contributors to an incident)
  Measure V1:
    - (60%) Are support partners identified?
    - (40%) Are escalation paths in place?
      * 2 measurements available, measurement 1 is considered more important and receives a higher weight in % contributing to the overall KPI
  KPI: NA
  Notes / Reasoning:
    • RACI chart: Yes / No / Partially
    • Escalation path up to date: Yes / No / Partially

Category: Ops Excellence
  Measure V1:
    - (100%) Are tickets within our defined SLAs/SLOs?
  KPI: NA
  Notes / Reasoning:
    • SLA: Yes / No / Partially
    • SLO: Yes / No / Partially

Category: Post Mortems (PM)
  Measure V1:
    - (100%) Was a PM written?
  KPI: NA
  Notes / Reasoning:
    • PM written: Yes / No



Other Examples



Example: Training
  Measure:
    - (40%) Do we have incident specific training?
    - (20%) Do we have a test plan?
    - (20%) Was a "lessons learned" doc created after the end of the test?
    - (20%) Has training been completed within the last 6 months?
  KPI: NA
  Notes / Reasoning:
    • Training in place: Yes / No / Partially
    • Test plan: Yes / No / Partially
    • "Lessons learned": Yes / No / Partially
    • Training completed in time: Yes / No / Partially





Appendix B: CI Framework - Sample Evaluation Phishing



Security Response Continuous Improvement Framework

Response Classification Phishing


Status: Draft | Authors: {...} | Version: 1.1 | Last update: 


Response Category Definition: Phishing Response

Incident Response Evaluation

1st Evaluation, V1 - September 2023


Response Strategy Definition: Phishing Response

This is what the response is about



Response Strategy Scope
  Description: Describe the bounds within which the incident is handled
  Notes / Reasoning:
    • In scope: phishing attacks within your organization (email phishing, spear phishing, whaling)
    • Out of Scope: bad marketing / recruiting practices

Edge Cases
  Description: What are the one-off scenarios that might require additional support?
  Notes / Reasoning:
    • smishing and vishing

Exit / de-escalation / transition criteria
  Description: The point when the incident can be closed or transitioned to another team
  Notes / Reasoning:
    • mitigation and remediation completed


Incident Response Evaluation 

How well are we responding?


1 | Initial / Ad hoc        | Completely missing, no instructions or predefined partner team to help
2 | Managed                 | Would work with extreme effort, no documentation, unclear partners, minimal critical artifacts/logs/processes available
3 | Defined                 | Slow and manual effort, some documentation, some partnerships, some critical artifacts/logs/processes available
4 | Quantitatively Managed  | Slow but automated, fairly well documented, strong partnerships, most critical artifacts/logs/processes available
5 | Optimizing              | Fully automated and documented, strong partnerships, has specialized tools / processes, all critical artifacts/logs/processes available, business as usual


1st Evaluation, V1 - September 2023


Category: Intake
  Measure V1:
    - (40%) Does a defined escalation path exist?
    - (40%) Does an automated email to ticket system exist?
    - (20%) Are tickets misrouted?
  KPI: 3
  Notes / Reasoning:
    • Escalation Path: YES
    • Ticket System: YES
    • Misrouting: 23% of the time

Category: Playbooks
  Measure V1:
    - (50%) Does a playbook exist?
    - (50%) Is it up to date?
  KPI: 4
  Notes / Reasoning:
    • Link to Playbook: YES
    • Up to date: YES

Category: Tools and Capabilities
  Measure V1:
    - (50%) Are critical tools available?
    - (50%) Are critical capabilities available?
  KPI: 3
  Notes / Reasoning:
    • Critical tools: Partially
    • Critical capabilities: Partially

Category: Support Partnerships (contributors to an incident)
  Measure V1:
    - (60%) Are support partners identified?
    - (40%) Are escalation paths in place?
  KPI: 3
  Notes / Reasoning:
    • RACI chart: Yes
    • Escalation path up to date: No

Category: Ops Excellence
  Measure V1:
    - (100%) Are tickets within our defined SLAs/SLOs?
  KPI: 3
  Notes / Reasoning:
    • SLA: No
    • SLO: Partially

Category: Post Mortems (PM)
  Measure V1:
    - (100%) Was a PM written?
  KPI: 3
  Notes / Reasoning:
    • PM written: Not always





Appendix C: CI Framework - Sample Response Category Catalog


Impact          | Response Strategy (Main) | Response Strategy (Sub)    | Definition | PoC
Security Breach | Malware Response         | Android Malware            |            | Alice
                |                          | Third party software issue |            | Bob
                |                          | Malicious Apps             |            | Cedric
Abuse           | Cloud Response           | Coinmining                 |            | John
                |                          | Ransomware                 |            | Nicole




Appendix D: CI Framework - Sample Gap Analysis Report


Columns: Impact | Response Strategy (Main / Sub) | Point of Contact | Ticket # | Strategy Health Score (Ø all critical phases) | Intake (Ø score) | Playbooks (Ø score) | GAPS | Scoring changes

Rows (template, one per impact category, to be filled in per evaluation):

- Data

- Abuse

- Security Breach









