ITSM Workflow Intelligence Tool v25

P1 FAST TRACK — Critical Incident Mode

5 fields. Immediate bridge pack. Full detail captured in the background.

Fast Track generates your bridge pack immediately. Come back to complete the full record when the pressure eases.

Critical Incident — Rapid Capture

What is broken? (one sentence)

Affected service / application

Estimated users / customers affected

What has been tried / current workaround status

Major Incident Manager (MIM)

Service Review and Reporting

Structured data entry for your service review pack. Configure your teams, enter period data, and generate a RAG dashboard and pattern analysis — ready for senior leadership without a three-day data gathering exercise.

Reporting Mode and Period

📸

Snapshot

Current state right now — fast capture for ad-hoc requests

📅

Weekly

7-day operational review

📆

Monthly

Monthly service review pack

📋

Quarterly

Rollup from saved monthly data

Period start date

Period end date

Report prepared by

Organisation / platform name

Severity Classification

Define your organisation's severity levels. The tool defaults to ITIL standard P1-P4 but you can add, rename or adjust SLA targets to match your organisation's classification.

Code

Name / description

SLA (hrs)

RAG amber %

RAG red %
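
The amber and red percentages above can be read as cut-offs on a compliance figure. The sketch below is one plausible reading, not the tool's confirmed logic: it assumes the metric is SLA compliance as a percentage, that a value below the red threshold is Red, and that a value below the amber threshold is Amber. The function name `ragStatus` and the example thresholds are hypothetical.

```javascript
// Hypothetical helper: classify an SLA compliance percentage against
// amber/red thresholds. Assumes higher is better and redPct <= amberPct.
function ragStatus(compliancePct, { amberPct, redPct }) {
  // Missing data is reported as N/A rather than guessed.
  if (compliancePct === null || compliancePct === undefined || Number.isNaN(compliancePct)) {
    return 'N/A';
  }
  if (compliancePct < redPct) return 'Red';
  if (compliancePct < amberPct) return 'Amber';
  return 'Green';
}

// Example thresholds are illustrative only.
const p1Thresholds = { amberPct: 95, redPct: 90 };
console.log(ragStatus(97, p1Thresholds));
console.log(ragStatus(92, p1Thresholds));
console.log(ragStatus(undefined, p1Thresholds));
```

If your organisation tracks a metric where lower is better (for example, breach count), the comparisons would need to be inverted.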

Teams and Service Areas

Add teams individually or organise into groups (Lab → Platform hierarchy). Each team's RAG thresholds can be configured independently — click the settings icon on any team card.

No teams configured yet. Click + Add team to begin.

Enter data for each team. For each metric, enter the value for the reporting period. You do not need to complete every field — the dashboard will show N/A for missing data. Gather from ServiceNow reports, Jira dashboards, or your team leads before starting.

Configure teams in the Configuration tab first, then return here to enter data.

Enter data and click Generate dashboard to populate this view.

Generate the dashboard first — pattern analysis runs from your data.

Save the current period before switching to a new one. Saved periods enable quarterly rollup and trend comparison. Each saved period is a complete snapshot of all team data at that point.

No saved periods yet. Save the current period to begin building history.

Incidents by Service / Component

Save periods or sessions to see incident clustering by service.

Generate the dashboard first, then come here to produce the executive narrative. The narrative is written to be used verbatim or with light editing — it is framed for a director or head of service audience.

Generate the RAG dashboard first, then click Generate executive narrative.

Integration Hub — API and Webhook Configuration

Connect external systems to push live data into this tool, or pull structured outputs into your automation pipelines. The tool exposes a JavaScript API surface and accepts webhook payloads — no server required, everything runs client-side.

JavaScript API Surface

The tool registers a global window.ITSMTool API. Any script running in the same browser context can call these methods to populate fields, trigger generation, or extract outputs.

// ── PUSH DATA INTO TOOL ─────────────────────────────
window.ITSMTool.setField('i_shortdesc', 'Payment gateway 503 errors');
window.ITSMTool.setField('t1_service', 'Mobile Banking API');
window.ITSMTool.setSeverity('P1');
window.ITSMTool.setObservation('All card payments failing since 14:32');

// ── LOAD A FULL INCIDENT PAYLOAD ─────────────────────
window.ITSMTool.loadIncident({
  service: 'Payment Gateway',
  severity: 'P1',
  shortDescription: 'Payment gateway 503 — all transactions failing',
  observation: 'Monitoring alert: HTTP 503 on PAY-GW-PROD-01',
  detection: 'Automated monitoring',
  detected: '2025-05-15T14:32:00',
  assignGroup: 'Platform Operations',
  category: 'Application',
  ci: 'PAY-GW-PROD-01'
});

// ── LOAD OBSERVABILITY DATA ──────────────────────────
window.ITSMTool.loadObsContext({
  platform: 'dynatrace',
  alertId: 'P-12345',
  alertSeverity: 'CRITICAL',
  aiFinding: 'Davis AI: Connection pool exhaustion on PAY-GW-PROD-01',
  metrics: [
    { name: 'Error Rate', value: '23%', baseline: '0.2%', threshold: '1%', status: 'crit' },
    { name: 'CPU', value: '94%', baseline: '45%', threshold: '80%', status: 'crit' }
  ],
  topology: 'PAY-GW-PROD-01 → Mobile-App-Gateway (degraded) → Checkout-Service (errors)'
});

// ── TRIGGER OUTPUT GENERATION ────────────────────────
window.ITSMTool.generateOutputs();

// ── READ GENERATED OUTPUTS ──────────────────────────
const outputs = window.ITSMTool.getOutputs();
console.log(outputs.servicenow);     // SN incident record text
console.log(outputs.jiraBug);        // Jira bug ticket text
console.log(outputs.confluence);     // Confluence PIR text
console.log(outputs.changeRequest);  // Change request text
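
Because the API is a global that only exists once the tool page has loaded, a calling script may want to check for it before use. The following is a minimal sketch of such a guard. `pushIncident` is a hypothetical wrapper, not part of the tool; it uses only the `loadIncident`, `generateOutputs` and `getOutputs` methods shown above and assumes they run synchronously, as the sample implies.

```javascript
// Hypothetical wrapper around the documented ITSMTool methods.
// `tool` is expected to be window.ITSMTool in the browser.
function pushIncident(tool, incident) {
  const required = ['loadIncident', 'generateOutputs', 'getOutputs'];
  const missing = required.filter(
    (name) => !tool || typeof tool[name] !== 'function'
  );
  if (missing.length > 0) {
    throw new Error(`ITSMTool API not available: missing ${missing.join(', ')}`);
  }
  tool.loadIncident(incident);
  tool.generateOutputs();
  return tool.getOutputs();
}

// Browser usage, guarded so the snippet is harmless outside the tool page.
if (typeof window !== 'undefined' && window.ITSMTool) {
  const outputs = pushIncident(window.ITSMTool, {
    service: 'Payment Gateway',
    severity: 'P1',
    shortDescription: 'Payment gateway 503 errors',
  });
  console.log(outputs.servicenow);
}
```

Passing the API object in as a parameter keeps the wrapper easy to exercise with a stub during testing.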

Test the API — paste a payload and execute

URL Parameter Pre-fill — Zero Infrastructure

The simplest integration path. Any system that can construct a URL (a Slack bot, a PagerDuty runbook link, a ChatOps command) can open this tool with fields pre-filled by appending parameters to the URL. No server, no API key, no configuration required.

// ── FULL EXAMPLE URL ─────────────────────────────────
https://zeus-17.github.io/itsm-workflow-intelligence/itsm-workflow-tool-v25.html
  ?service=Payment+Gateway
  &sev=P1
  &obs=Card+payments+failing+%E2%80%94+503+errors+since+14%3A32
  &shortdesc=Payment+Gateway+P1+%E2%80%94+all+card+transactions+failing
  &assigngroup=Platform+Operations
  &workspace=incident

Supported parameters

Parameter	Maps to	Example value
service	Affected service name	Payment+Gateway
sev	Severity (P1–P4)	P1
obs	Initial observation	503+errors+since+14%3A32
shortdesc	Short description	Payment+Gateway+P1
ci	Configuration item	PAY-GW-PROD-01
assigngroup	Assignment group	Platform+Operations
diagnosis	Initial diagnosis	Connection+pool+exhausted
reporter	Reported by	Monitoring+Alert
workspace	Open workspace directly	incident, fasttrack, pickup
detect	Detection method	Monitoring+alert

How to use in practice: Configure your alerting platform (PagerDuty, Datadog, Slack bot) to generate a link using this pattern. When an alert fires, the engineer clicks the link and the tool opens with the incident data pre-filled. The engineer reviews it and clicks Continue. A human always confirms before any output is generated.
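
A link of this kind can be assembled with the standard `URL` and `URLSearchParams` APIs, which handle the encoding (spaces become `+`, `:` becomes `%3A`). This is a minimal sketch: `buildPrefillUrl` is a hypothetical helper, the base URL is taken from the example above, and the parameter names come from the supported-parameters table.

```javascript
// Base URL from the full example; adjust if you host the tool elsewhere.
const TOOL_URL =
  'https://zeus-17.github.io/itsm-workflow-intelligence/itsm-workflow-tool-v25.html';

// Parameter names listed in the supported-parameters table.
const SUPPORTED_PARAMS = [
  'service', 'sev', 'obs', 'shortdesc', 'ci',
  'assigngroup', 'diagnosis', 'reporter', 'workspace', 'detect',
];

// Hypothetical helper: build a pre-fill link from a plain object.
function buildPrefillUrl(fields, baseUrl = TOOL_URL) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(fields)) {
    if (!SUPPORTED_PARAMS.includes(key)) {
      throw new Error(`Unsupported pre-fill parameter: ${key}`);
    }
    // Skip empty values so the link stays short.
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

const link = buildPrefillUrl({
  service: 'Payment Gateway',
  sev: 'P1',
  obs: 'Card payments failing, 503 errors since 14:32',
  assigngroup: 'Platform Operations',
  workspace: 'incident',
});
console.log(link);
```

Rejecting unknown parameter names early makes typos in an alert template visible when the link is built rather than when an engineer opens it mid-incident.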

Webhook Receiver

The tool cannot receive HTTP webhooks directly (local-first, no server). Instead, configure your automation to write the payload to a local endpoint or paste it below. For browser-based use, the clipboard integration is the fastest path.

Source platform

Auto-action on receipt

Paste webhook payload

Processing log

No payloads processed yet.

Custom Field Mapping

Map incoming field names (from your webhook or API) to this tool's internal field IDs. JSONPath notation supported for nested fields (e.g. alert.metadata.service).

Source field (incoming)

Tool field ID (target)
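
To illustrate how a mapping from nested source fields to tool field IDs might be applied, here is a minimal sketch using simple dot-separated paths such as `alert.metadata.service`. It is not the tool's own resolver: `getByPath` and `applyFieldMapping` are hypothetical names, the incoming payload shape is invented for the example, and only the target IDs `i_shortdesc` and `t1_service` come from the API sample earlier in this page.

```javascript
// Resolve a dot-separated path against a nested object.
// Returns undefined if any segment is missing.
function getByPath(obj, path) {
  return path.split('.').reduce(
    (node, key) => (node !== null && node !== undefined ? node[key] : undefined),
    obj
  );
}

// Apply a { sourcePath: targetFieldId } mapping to an incoming payload.
function applyFieldMapping(payload, mapping) {
  const result = {};
  for (const [sourcePath, targetId] of Object.entries(mapping)) {
    const value = getByPath(payload, sourcePath);
    if (value !== undefined) {
      result[targetId] = value;
    }
  }
  return result;
}

// Illustrative incoming payload (shape is an assumption).
const incoming = {
  alert: {
    title: 'Payment gateway 503 errors',
    metadata: { service: 'Mobile Banking API' },
  },
};

const mapped = applyFieldMapping(incoming, {
  'alert.title': 'i_shortdesc',
  'alert.metadata.service': 't1_service',
  'alert.metadata.owner': 'i_owner', // absent in the payload, so skipped
});
console.log(mapped);
```

Each entry in the result could then be passed to `window.ITSMTool.setField(targetId, value)`.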

Integration Schema — full field reference

Complete schema for the ITSMTool.loadIncident() payload. All fields optional — populate what your source system provides.

{
  // ── INCIDENT CORE ─────────────────────────────────────
  "service": "string — affected service/application name",
  "severity": "string — P1 | P2 | P3 | P4 (or custom code)",
  "shortDescription": "string — brief incident description",
  "observation": "string — what was observed / alert text",
  "detection": "string — how detected (monitoring/user report etc)",
  "detected": "string — ISO 8601 datetime of detection",
  "reporter": "string — who/what raised this",
  "category": "string — Application | Infrastructure | Network etc",
  "subcategory": "string — subcategory",
  "ci": "string — configuration item (CMDB name)",
  "assignGroup": "string — assignment group",
  "assignee": "string — individual assignee",
  "usersAffected": "string — description of users affected",
  "workaround": "string — available workaround if any",
  "causeCode": "string — known cause code",

  // ── OBSERVABILITY ─────────────────────────────────────
  "obsContext": {
    "platform": "string — dynatrace | datadog | newrelic | grafana | splunk | cloudwatch | azure | elastic | custom",
    "alertId": "string — alert/problem ID from platform",
    "alertSeverity": "string — platform severity/confidence",
    "detectMethod": "string — threshold | anomaly | ai | synthetic | log | trace",
    "aiFinding": "string — AI root cause suggestion from platform",
    "topology": "string — affected services / blast radius description",
    "metrics": [{
      "name": "string — metric name",
      "value": "string — observed value",
      "baseline": "string — normal baseline",
      "threshold": "string — alert threshold",
      "status": "string — ok | warn | crit"
    }]
  },

  // ── SOFTWARE CURRENCY ────────────────────────────────
  "currencyContext": {
    "lastPatched": "string — YYYY-MM-DD",
    "version": "string — framework/runtime version",
    "supportStatus": "string — active | lts | maintenance | eol_soon | eol",
    "vulnerabilities": [{
      "severity": "string — critical | high | medium",
      "ref": "string — CVE reference",
      "desc": "string — description"
    }]
  },

  // ── CHANGE CONTEXT ──────────────────────────────────
  "changeContext": {
    "recentChange": "string — recent change ID/description",
    "changeTime": "string — ISO 8601 datetime of change",
    "changeType": "string — normal | standard | emergency"
  }
}
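
Since every field is optional, a source system may find it useful to check the few constrained values before sending a payload. The sketch below is a hypothetical pre-flight check, not something the tool provides: `validateIncidentPayload` is an invented name, and the allowed values are copied from the schema above. Note that `Date.parse` is more lenient than strict ISO 8601, so this only catches clearly unparseable dates.

```javascript
// Allowed values copied from the schema reference.
const SEVERITIES = ['P1', 'P2', 'P3', 'P4'];
const METRIC_STATUSES = ['ok', 'warn', 'crit'];
const CHANGE_TYPES = ['normal', 'standard', 'emergency'];

// Hypothetical pre-flight check. Returns a list of problems (empty if none).
function validateIncidentPayload(payload, customSeverities = []) {
  const errors = [];
  const allowedSeverities = [...SEVERITIES, ...customSeverities];

  if (payload.severity !== undefined && !allowedSeverities.includes(payload.severity)) {
    errors.push(`severity "${payload.severity}" is not a known code`);
  }
  // Lenient date check: flags values that cannot be parsed at all.
  if (payload.detected !== undefined && Number.isNaN(Date.parse(payload.detected))) {
    errors.push('detected is not a parseable datetime');
  }
  const metrics = payload.obsContext?.metrics ?? [];
  metrics.forEach((metric, index) => {
    if (metric.status !== undefined && !METRIC_STATUSES.includes(metric.status)) {
      errors.push(`obsContext.metrics[${index}].status must be ok, warn or crit`);
    }
  });
  const changeType = payload.changeContext?.changeType;
  if (changeType !== undefined && !CHANGE_TYPES.includes(changeType)) {
    errors.push('changeContext.changeType must be normal, standard or emergency');
  }
  return errors;
}

console.log(validateIncidentPayload({
  severity: 'P1',
  detected: '2025-05-15T14:32:00',
  obsContext: { metrics: [{ name: 'CPU', value: '94%', status: 'crit' }] },
}));
```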

Integration Examples

Dynatrace Davis AI problem notification → auto-populate incident

// Dynatrace webhook → browser extension or local proxy → ITSMTool
const dtPayload = {
  "ProblemID": "P-12345",
  "ProblemTitle": "Response time degradation on payment-service",
  "State": "OPEN",
  "ProblemSeverity": "PERFORMANCE",
  "ImpactedEntities": "payment-service, checkout-api",
  "ProblemDetailsText": "Response time P90 exceeded threshold (4200ms vs 2000ms baseline)"
};

// The tool maps Dynatrace format automatically when source = 'dynatrace'
window.ITSMTool.loadWebhookPayload(dtPayload, 'dynatrace');
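
The tool's internal Dynatrace mapping is not documented here. For readers who prefer to map explicitly and call `loadIncident()` instead, the sketch below shows one plausible translation from the Dynatrace fields above into the schema fields. The function name `dynatraceToIncident` and the field pairing are assumptions, not the tool's actual behaviour, and severity is deliberately left unset so that a human assigns it.

```javascript
// Hypothetical mapping from the Dynatrace notification fields shown above
// to the loadIncident() schema. The pairing is an assumption.
function dynatraceToIncident(dt) {
  return {
    shortDescription: dt.ProblemTitle,
    observation: dt.ProblemDetailsText,
    detection: 'Automated monitoring',
    obsContext: {
      platform: 'dynatrace',
      alertId: dt.ProblemID,
      alertSeverity: dt.ProblemSeverity,
      topology: dt.ImpactedEntities,
    },
  };
}

const incident = dynatraceToIncident({
  ProblemID: 'P-12345',
  ProblemTitle: 'Response time degradation on payment-service',
  State: 'OPEN',
  ProblemSeverity: 'PERFORMANCE',
  ImpactedEntities: 'payment-service, checkout-api',
  ProblemDetailsText: 'Response time P90 exceeded threshold (4200ms vs 2000ms baseline)',
});
console.log(incident);

// In the browser, the result could be passed to the documented API.
if (typeof window !== 'undefined' && window.ITSMTool) {
  window.ITSMTool.loadIncident(incident);
}
```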

ServiceNow outbound REST → pre-populate for documentation

// ServiceNow Business Rule fires on incident creation
// Outbound REST call to a local proxy that writes to clipboard
// User pastes into Integration Hub → webhook tab
const snPayload = {
  "number": "INC0012345",
  "short_description": "Payment gateway 503 errors",
  "cmdb_ci": "PAY-GW-PROD-01",
  "assignment_group": "Platform Operations",
  "urgency": "1",
  "impact": "1",
  "category": "Application"
};

window.ITSMTool.loadWebhookPayload(snPayload, 'servicenow_push');

GitHub Actions → populate change request on deployment failure

// .github/workflows/deploy.yml
// On deployment failure, trigger browser automation to populate tool
// GitHub Actions payload format:
const ghPayload = {
  "workflow": "deploy-payment-service",
  "run_id": "8834521",
  "conclusion": "failure",
  "repository": "org/payment-service",
  "ref": "refs/heads/main",
  "head_commit": { "message": "feat: increase connection pool size" },
  "actor": "deploy-bot"
};

window.ITSMTool.loadWebhookPayload(ghPayload, 'github_actions');

Post-Incident Retrospective

The incident is resolved. Now the important work starts — understanding what happened, capturing lessons that stick, and making sure the same thing does not happen again. This workspace supports three things: reconstructing the timeline, raising a formal problem record, and building a lessons learned register that actually gets acted on.

Incident Reference

Incident Reference Number

Service / Application

Date of Incident

Severity

Lead Responder / MIM

PIR Due Date

ITIL recommendation: within 5 business days for P1/P2

Timeline Reconstruction

Add events in any order — sort them by time to build the narrative. Include: first failure, first detection, escalations, investigations, attempted fixes, resolution, and any communication milestones.
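
Sorting events entered out of order is a small operation. The sketch below shows one way to do it, assuming each event carries an ISO 8601 `time` string and a `label`. Both field names and the function `sortTimeline` are hypothetical, not the tool's internal model.

```javascript
// Hypothetical helper: return a new array of events ordered by time.
// The input array is left unmodified.
function sortTimeline(events) {
  return [...events].sort((a, b) => Date.parse(a.time) - Date.parse(b.time));
}

const timeline = sortTimeline([
  { time: '2025-05-15T16:02:00Z', label: 'Service restored' },
  { time: '2025-05-15T14:32:00Z', label: 'First detection (monitoring alert)' },
  { time: '2025-05-15T14:45:00Z', label: 'Escalated to Platform Operations' },
]);

for (const event of timeline) {
  console.log(`${event.time}  ${event.label}`);
}
```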

Guided Retrospective Questions

Select a severity above to see targeted retrospective prompts.

A problem record investigates why an incident happened and tracks the permanent fix. It stays open until the root cause is resolved, not merely until the incident is closed. If this is a standalone proactive problem (no incident), leave the incident reference blank.

Problem Record Details

Problem Statement *

Problem Type

Priority

Problem Owner

Target Resolution Date

Related Incidents

Affected Services

Root Cause and Fix

Confirmed Root Cause

Contributing Factors

Permanent Fix Required

Known Error — interim workaround (if fix not yet available)

Lessons Learned Register

Add one lesson per row. Be specific — "Add alert for connection pool utilisation above 80%" not "Improve monitoring." Assign an owner and a deadline or it will not happen.

Suggested lessons based on incident type

Review all outputs before using. These are structured drafts — amend as needed.

Confluence: Post-Incident Review Page

Click Generate to populate.

ServiceNow: Problem Record

Click Generate to populate.

Jira: Remediation Story

Click Generate to populate.

Lessons: Lessons Learned Register

Click Generate to populate.

Pick Up Existing Incident

The incident is already in the system. Your team has been assigned it — either from a queue, a handoff, or a routing decision. Capture where it stands now so the tool can generate your documentation from this point forward.

Incident Reference and Source

Incident Reference

How it reached your team

Current severity / priority

Time incident was originally detected

Time assigned to your team

Assigned from (previous team)

What We Know So Far

Incident description / current understanding

Service / application affected

Configuration item (CI)

What Has Already Been Done

Investigation steps already taken (by previous team)

Current status of the incident

Is a workaround currently in place?

Known workaround details (if applicable)

Glossary and Acronym Reference

Standard ITSM and ITIL terms, platform-specific terminology for ServiceNow, Grafana, Jira and more, plus a customisable section for your organisation's own acronyms. Where terms have multiple meanings across disciplines, all definitions are shown.

ITSM / ITIL 4

Incident Management

SRE and Observability

Regulatory

Change Request

Raise and document a change request independently of the incident workflow. Outputs a complete ServiceNow CHG record. ITIL definitions are shown by default — use the override field to note where your organisation differs.

If this change relates to a release, the companion Release Management Intelligence tool holds the full release record, scope, risk register and rollback runbook. View release record ↗

Change Type

Select a change type above to see the ITIL definition and what is required.

Does your organisation define this type differently? Note it here — it will appear in your output.

Change Overview

Short description *

Service / application affected

Configuration item (CI)

Change lead / implementer

Change manager

Planned start

Planned end

Rollback window

Business justification

Related incident or problem record

Risk Assessment

Blast radius

How many users or services are affected if this goes wrong?

Complexity

How many steps, systems or teams are involved?

Reversibility

How easily can this be rolled back if something goes wrong?

Testing coverage

How thoroughly has this change been tested before implementation?

Change history

What is the recent change success rate on this service or CI?

Timing

Is this during peak hours, end of month or a regulatory deadline?

0

Select factors above to score the change

Risk mitigation measures

Implementation and Rollback Plans

Implementation plan — step by step *

No steps added yet.

Rollback plan — step by step *

No steps added yet.

Test plan

⚠ Rollback Decision Framework

Go/No-Go criteria

Rollback decision authority (named person)

Monitoring window before sign-off

DBA / data sign-off required?

Contingency if rollback fails

Approvals

Technical approver

Technical approval date

Business approver

Business approval date

Change manager approval

CAB / ECAB date

Post-Implementation Review

PIR required?

PIR scheduled date

PIR owner

Outcome notes (complete post-implementation)

Customer Reported Incident

Capture the incident as reported by the customer — the tool will translate this into a structured triage record. Use case/ticket references in place of personal names. Contact details belong in your CRM, not the incident record.

🔒

Data Minimisation — GDPR Article 5(1)(c)

Incident records are operational records. Use case/ticket references not customer names. Do not enter account numbers, card details, email addresses or phone numbers — these are personal/restricted data and belong in your CRM system. The incident record should reference the CRM case, not duplicate personal data.

Customer and Report Details

Case / Ticket Reference

Contact Method / Channel

Date and Time Reported

Customer Reference Number (if applicable)

Contact / Callback Reference

Customer Account Type / Tier

What the Customer Reported

Customer's description — in their own words

Service / feature the customer is trying to use

How long has this been happening?

How many people affected (customer's view)?

What has the customer already tried?

Customer's urgency / business impact

Has the customer been affected by this before?

Any error messages or reference numbers the customer received?

Your Initial Assessment

Your interpretation — what do you believe is happening?

What you have checked so far

Immediate response given to customer

Stage 1 — Triage and Classification

What has been observed?

Complete the assessment to score the event, determine severity and route to the correct workflow. The tool recommends severity — you confirm or override.

Log and manage any incident — from a customer call to a P1 outage

This tool works for every incident — a customer reporting a statement download issue, a P3 batch failure spotted overnight, or a P1 payment gateway outage at 2am. The workflow scales to the situation. Use as much or as little as you need.

It generates structured, paste-ready outputs for ServiceNow, Jira and Confluence — and guides you through what to do with them. It works alongside your existing monitoring, alerting and coordination tooling, not instead of it.

📞

Customer reported

Translate a customer call or ticket into a properly structured incident record

🔍

Internal detection

From monitoring alert or engineer discovery — full lifecycle from triage to PIR

⚡

P1 / Major Incident

Fast Track mode — bridge pack in 90 seconds, SLA clock running, escalations ready

Human in the loop: review all generated content before progressing. You control every stage.

What happens here and why?

Stage 1: Triage and Classification
This is where you assess what has happened and decide how serious it is. The scoring helps remove subjectivity under pressure: it is easy to under- or over-classify an incident when things are moving fast.

Where this goes: Your severity rating determines which ITSM process runs — P1/P2 triggers the Major Incident track. The observation you write here pre-fills the ServiceNow incident description and the Jira bug ticket.

Not familiar with P1-P4?

P1 Critical — complete outage, all users affected, revenue or regulatory impact. All hands response immediately.
P2 High — significant degradation, large user group, no workaround. Urgent response required.
P3 Medium — partial impact, workaround available. Resolve within SLA, standard response.
P4 Low — minor issue, minimal business impact. Normal queue resolution.

ServiceNow incident workflow explained ↗

Initial Observation * required

Similar incidents found in your history — click to review

0 / 500

Please describe the observation before continuing

Detection Method *

Please select a detection method

Date and Time Detected *

Please enter when this was detected

Reported By

Incident Type

Affected Service / Application *

Please name the affected service

Impact Assessment — Score Each Factor

Number of users affected

Business impact

Workaround available?

How long has this been occurring?

Regulated environment? Show regulatory guidance

Regulatory / compliance exposure (FCA / PRA / DORA / PCI DSS)

Part of a recurring pattern?

0

Adjust the factors above to generate a score

Thresholds: P4 = 0-5, P3 = 6-10, P2 = 11-16, P1/Major = 17+
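
The thresholds above translate directly into a lookup. This is a minimal sketch of the banding only: `severityFromScore` is a hypothetical name, and how each factor contributes to the score is not specified here, so the function takes the total as input.

```javascript
// Map an impact score to a recommended severity using the stated bands:
// P4 = 0-5, P3 = 6-10, P2 = 11-16, P1/Major = 17+.
function severityFromScore(score) {
  if (!Number.isFinite(score) || score < 0) {
    throw new RangeError('score must be a non-negative number');
  }
  if (score >= 17) return 'P1';
  if (score >= 11) return 'P2';
  if (score >= 6) return 'P3';
  return 'P4';
}

for (const score of [0, 5, 6, 10, 11, 16, 17]) {
  console.log(score, severityFromScore(score));
}
```

The result is a recommendation only. The responder still confirms or overrides it.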

Severity Classification * select one

P1

Critical

Complete outage. Major business impact. Regulatory risk. All hands.

P2

High

Significant degradation. Large user impact. No adequate workaround.

P3

Medium

Partial impact. Workaround available. Business continuity maintained.

P4

Low

Minor. Minimal impact. Standard queue resolution.

Override reason (if changing the recommended severity)

Stage 2 — Incident Record

Incident Record

Complete the incident record. All fields map to ServiceNow and Jira field IDs. The SLA clock is active for P1 and P2 incidents.

🚨 Right now — in this order

Assignment History and Handoff Log

0 assignments

No assignment history yet.

What is an incident record and who uses it?

Stage 2: Incident Record
An incident record is the formal log of an unplanned disruption to a service or reduction in service quality. It is the primary record of truth during and after an incident.

Who uses this output:

Service Desk / Ops teams — paste into ServiceNow to create the incident record
Incident Manager / MIM — reference during the incident for status, timeline and SLA tracking
Engineering teams — the Jira bug ticket generated from this data gives them technical context to investigate
Compliance / audit — the record forms part of the audit trail, especially for regulated environments

Key ServiceNow concepts: Impact + Urgency = Priority. Category drives routing and SLA assignment. CI (Configuration Item) links the incident to your CMDB for dependency analysis.
ServiceNow incident management best practice ↗
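
"Impact + Urgency = Priority" refers to a lookup rather than arithmetic. The sketch below encodes the commonly documented out-of-the-box ServiceNow matrix (impact and urgency 1 to 3, priority 1 to 5). Treat it as an assumption: many instances customise this lookup, so check your own priority rules. `priorityFor` is a hypothetical helper name.

```javascript
// Assumed default ServiceNow priority lookup: PRIORITY_MATRIX[impact][urgency].
// 1 = High, 2 = Medium, 3 = Low for both inputs. Your instance may differ.
const PRIORITY_MATRIX = {
  1: { 1: 1, 2: 2, 3: 3 },
  2: { 1: 2, 2: 3, 3: 4 },
  3: { 1: 3, 2: 4, 3: 5 },
};

// Accepts numbers or numeric strings, since webhook payloads often
// carry impact and urgency as strings such as "1".
function priorityFor(impact, urgency) {
  const row = PRIORITY_MATRIX[impact];
  const priority = row ? row[urgency] : undefined;
  if (priority === undefined) {
    throw new RangeError(`No priority defined for impact=${impact}, urgency=${urgency}`);
  }
  return priority;
}

console.log(priorityFor(1, 1));
console.log(priorityFor('2', '3'));
```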

Response Playbook P1 — Critical

Click to expand step-by-step guidance

Incident Details * required fields

Short Description — max 160 chars * [ServiceNow: short_description]

0 / 160

Short description is required

Detailed Description * [ServiceNow: description]

Detailed description is required

Subcategory [ServiceNow: subcategory]

Assignment Group [ServiceNow: assignment_group]

Assigned To [ServiceNow: assigned_to]

Service Offering [ServiceNow: service_offering — populate before CI per CSDM]

CSDM best practice: populate Service Offering first; Service auto-populates from it

Caused By Change [ServiceNow: caused_by — link if a recent change caused this]

Linked Problem Record [ServiceNow: problem_id — link to existing problem if known]

Estimated Users Affected [ServiceNow: u_users_affected]

Impact [ServiceNow: impact]

Timeline and Resolution

Incident Start Time *

Start time is required for MTTR and SLA calculations

Time Acknowledged

Time Resolved (leave blank if ongoing)

Resolution Summary [ServiceNow: close_notes]

Resolution Code [ServiceNow: close_code — mandatory on resolve]

Resolution code is mandatory in ServiceNow when moving to Resolved state
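
The start, acknowledged and resolved times feed the MTTR and SLA calculations. The sketch below shows the per-incident arithmetic under stated assumptions: times are ISO 8601 strings, the SLA target is expressed in hours as in the severity configuration, and an unresolved incident is measured against the current time. `incidentTimings` and its field names are hypothetical. MTTR itself is the mean of the resolve times across incidents in a period.

```javascript
const MS_PER_MINUTE = 60_000;

// Hypothetical helper: per-incident timings in whole minutes.
function incidentTimings({ start, acknowledged, resolved, slaHours }, now = new Date()) {
  const startMs = Date.parse(start);
  if (Number.isNaN(startMs)) {
    throw new Error('Incident start time is required');
  }
  const minutesFromStart = (iso) =>
    iso ? Math.round((Date.parse(iso) - startMs) / MS_PER_MINUTE) : null;

  // Ongoing incidents are measured up to "now".
  const endMs = resolved ? Date.parse(resolved) : now.getTime();
  const elapsedMinutes = Math.round((endMs - startMs) / MS_PER_MINUTE);

  return {
    minutesToAcknowledge: minutesFromStart(acknowledged),
    minutesToResolve: minutesFromStart(resolved),
    elapsedMinutes,
    slaBreached: slaHours != null ? elapsedMinutes > slaHours * 60 : null,
  };
}

// The 1-hour SLA is an illustrative value, not a tool default.
console.log(incidentTimings({
  start: '2025-05-15T14:32:00Z',
  acknowledged: '2025-05-15T14:40:00Z',
  resolved: '2025-05-15T16:02:00Z',
  slaHours: 1,
}));
```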

Observability Context

Monitoring / Observability Platform

Alert / Problem ID

Alert Severity / Confidence

Detection Method

AI Root Cause Finding

Key Metric Values at Time of Alert

Affected Services / Topology (from monitoring platform)

Automation and Self-Healing Status

Automation outcome notes

Software Currency and Technical Health

Certificate Health

No certificates added. Click + Certificate to check expiry status.

Known Vulnerabilities and Patch Status

No vulnerabilities or advisories recorded.

Maintenance and Software Currency Status

Last patched / updated (affected service)

Next scheduled maintenance

Framework / runtime version

Check if this version is still within vendor support lifecycle

Vendor support status

Deferred maintenance items

Last penetration test / security review

FCA expects regular testing of important business services

Open security review findings (unresolved)

Causality Assessment

Was a software currency or maintenance issue a contributing factor to this incident?

Currency causality notes

Technical Context — helps engineering teams and flows into all outputs

Environment [ServiceNow: u_environment]

Infrastructure Type [ServiceNow: u_infra_type]

Cloud Provider (if applicable)

Deployment / Release Version

Affected Components — add each and press Enter

Third-party / Supplier Dependencies — add each and press Enter

Related Services / Downstream Impact

Recent Changes in this Area — worth checking even if not confirmed as cause

Diagnosis and Workaround

Initial Diagnosis [ServiceNow: u_initial_diagnosis]

Workaround Available? [ServiceNow: u_workaround_available]

Workaround Description [ServiceNow: workaround]

Flags

SLA breach has occurred or is imminent

Customer / stakeholder notification required under SLA terms

Show FCA / PRA / DORA regulatory flag

Regulatory notification may be required (FCA / PRA / DORA)

Decision Log 0 decisions

No decisions logged yet. Log key calls here as you make them.

Stage 2a — Major Incident

Major Incident Management

Generates war room structure and communications for all audiences. All content is a draft — review and amend before using. Nothing is sent automatically.

All communications are drafts for your review. Amend before sending to any audience.

--:--

Next update due

Configure update frequency above to activate
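
The countdown to the next update can be derived from the declaration time and the update frequency. This is a minimal sketch, assuming updates fall on fixed intervals counted from the time of declaration. The tool may instead count from the last update sent. `nextUpdateDue` and `formatCountdown` are hypothetical names.

```javascript
// Hypothetical helper: next update time on a fixed cadence from declaration.
function nextUpdateDue(declaredAt, frequencyMinutes, now = new Date()) {
  const declaredMs = Date.parse(declaredAt);
  if (Number.isNaN(declaredMs) || !(frequencyMinutes > 0)) {
    throw new Error('A valid declaration time and positive frequency are required');
  }
  const intervalMs = frequencyMinutes * 60_000;
  const elapsedMs = Math.max(0, now.getTime() - declaredMs);
  const intervalsCompleted = Math.floor(elapsedMs / intervalMs);
  return new Date(declaredMs + (intervalsCompleted + 1) * intervalMs);
}

// Format the remaining time as mm:ss for a countdown display.
function formatCountdown(due, now = new Date()) {
  const totalSeconds = Math.max(0, Math.round((due.getTime() - now.getTime()) / 1000));
  const minutes = String(Math.floor(totalSeconds / 60)).padStart(2, '0');
  const seconds = String(totalSeconds % 60).padStart(2, '0');
  return `${minutes}:${seconds}`;
}

const now = new Date('2025-05-15T15:25:00Z');
const due = nextUpdateDue('2025-05-15T14:40:00Z', 30, now);
console.log(due.toISOString(), formatCountdown(due, now));
```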

What is Major Incident Management?

Stage 2a: Major Incident
A Major Incident is a P1 or high-impact P2 incident that requires coordinated, structured response across multiple teams and audiences. Declaring a Major Incident activates a specific management process distinct from standard incident handling.

The four communications you generate here serve different audiences:

Bridge Pack — for the war room or virtual bridge call. Technical detail, actions log, attendee list. Used by engineers, ops and the MIM running the call.
Internal Technical — for engineering and ops teams only. Detailed, jargon-permitted. Issued every 15-30 minutes.
Executive Update — for senior management. No jargon. Business impact framing. States what you are doing and when the next update is.
Customer Comms — for end users or customers. Plain English only. Empathetic tone. Never include technical detail that has not been confirmed.

MIM role: The Major Incident Manager owns the process — not the fix. Their job is coordination, communication and escalation. The engineering team owns resolution.
ServiceNow Major Incident Management guide ↗

Declaration Details

Major Incident Manager

Bridge / War Room

Time of Declaration

Update Frequency

Key Stakeholders

INTERNAL TECHNICAL War Room Bridge Pack

Click 'Generate MI outputs' above to populate.

INTERNAL Technical Status Update

Generate outputs above.

BUSINESS Executive / Senior Management Update

Generate outputs above.

CUSTOMER / END-USER External Communication Draft

Generate outputs above.

Major Incident Manager Checklist

Stage 3 — Problem Record

Problem Record

Assess whether a problem record is warranted and capture the details. A problem record investigates root cause and tracks permanent resolution.

What is a Problem Record?

Stage 3: Problem Record
A Problem is the underlying cause of one or more incidents. Unlike incident management (which restores service), problem management investigates why the incident happened and prevents recurrence.

When to raise a Problem Record:

The incident keeps recurring — same or similar pattern
The incident was resolved by workaround only — root cause not fixed
It was a Major Incident requiring formal RCA
The underlying condition poses ongoing risk

Where this output goes: Paste into the ServiceNow Problem Management module. The problem_id field on the original incident links them together. Engineers and POs use the Jira remediation story to track the permanent fix.

Known Error: If you know the cause but cannot yet fix it, record it as a Known Error — this goes into the Known Error Database (KEDB) so future analysts can identify it faster.
Problem management in ServiceNow ↗

Problem Record Trigger

This incident is recurring or part of a known pattern

No permanent fix identified — resolved by workaround only

Major Incident or High Priority Incident requiring formal RCA

Underlying cause poses ongoing risk to stability or compliance

Proactive problem — potential issue identified before becoming an incident

Problem Details * required

Problem statement is required

Affected Services [ServiceNow: u_affected_services]

Related Incident Numbers [ServiceNow: u_related_incidents]

Problem Owner [ServiceNow: assigned_to]

Target Resolution Date

Known Error Description [ServiceNow: u_known_error_description]

Stage 4 — Root Cause Analysis

Work through each Why step by step. Prompts at each stage help prevent stopping too early — the most common failure in RCA under pressure.

What is Root Cause Analysis?

Stage 4: Root Cause Analysis
RCA investigates why an incident happened at a fundamental level — not just the technical fault, but the process, governance or organisational condition that allowed it to occur and persist.

5 Whys vs Fishbone — which to use:

5 Whys — best for single-thread problems with a clear causal chain. Fast, works alone or in a small group.
Fishbone (Ishikawa) — best for complex problems with multiple contributing factors. Better for team brainstorming sessions where different people own different domains (People, Process, Technology etc.).

Common mistake in RCA: stopping at the technical failure (Why 2 or 3) rather than the systemic cause (Why 4 or 5). The prompts in each step are designed to push you past this.

Where this output goes: Into the Problem Record in ServiceNow, the Confluence PIR page, and as a standalone RCA report for governance and audit purposes.
Fishbone / Ishikawa in ITIL ↗

RCA Methodology

5 Whys guidance: Start with the symptom. Each Why goes one level deeper. A good RCA reaches a systemic or process failure — not just a technical one. Stop when fixing the root cause would prevent recurrence.

Focus on what failed and why the conditions existed — not who was involved. The outputs from this stage are written accordingly.

Problem to Investigate

Problem statement (pre-filled from Stage 3 — amend if needed)

Root Cause and Fix

Root Cause Statement * [ServiceNow: u_root_cause]

Root cause statement is required

Contributing Factors

Permanent Fix Required [ServiceNow: u_fix_description]

Lessons Learned

What went well during the incident response?

What could the process or system have done better?

Recommended actions to prevent recurrence

Stage 5 — Change Request

Change Request / RFC

Full change management lifecycle — risk scoring, CAB submission, approvals, monitoring plan, rollback decision tree and ECAB process.

Change Management — how the full process works

Stage 5: Change Request / RFC — the full lifecycle

A Change Request (RFC) is the formal record authorising a change to production systems. Unlike incident management, which restores service, change management controls how the environment is modified to prevent future issues and ensure stability.

Three change types — each with a different approval path:

Standard — pre-approved, low risk, repeatable. Follows a documented procedure. No individual CAB review each time. Examples: routine patch application, certificate renewal, pre-approved configuration change. Fastest path — can proceed once the standard change template is confirmed applicable.
Normal — requires full risk assessment, planning, CAB review and approval before implementation. Used for anything not pre-approved and not an emergency. Examples: application deployments, infrastructure changes, database schema changes, network configuration updates. CAB may meet weekly or bi-weekly — plan accordingly.
Emergency — urgent changes to restore service or prevent imminent outage. Expedited via ECAB (Emergency CAB). Approval may occur after implementation in the most critical cases. Every emergency change must be followed by a full review and PIR.

The CAB's job is not to block changes — it is to ensure the right people have reviewed the risk, the implementation plan is complete, and a rollback plan exists before anything touches production.

Key principle: A rollback plan is not optional. If you cannot describe how to undo this change within the agreed rollback window, the change should not be approved.

ServiceNow Change Management guide · ITIL 4 Change Enablement practice

Change Lifecycle Status

1. Draft
2. Assess
3. CAB Review
4. Approved
5. Implement
6. PIR / Close

Status: Draft — complete the form and advance through stages as the change progresses.
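As a rough sketch, the six lifecycle stages can be treated as an ordered list that a change advances through one step at a time. The strictly linear progression is an assumption made for illustration; the tool may allow other transitions, and `STAGES` and `advance` are hypothetical names.

```python
# Stage names as they appear in the Change Lifecycle Status tracker.
STAGES = ["Draft", "Assess", "CAB Review", "Approved", "Implement", "PIR / Close"]


def advance(current: str) -> str:
    """Return the stage that follows `current` (assumes no skipping)."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    if index == len(STAGES) - 1:
        raise ValueError(f"'{current}' is the final stage")
    return STAGES[index + 1]
```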

Change Overview * required

Short Description * [ServiceNow: short_description]

Change description is required

Change Type [ServiceNow: type]

Change Category [ServiceNow: category]

Change Lead / Implementer [ServiceNow: assigned_to]

Change Manager [ServiceNow: u_change_manager]

Affected CI(s) [ServiceNow: cmdb_ci]

Affected Service [ServiceNow: service_offering]

Planned Start [ServiceNow: start_date]

Planned End [ServiceNow: end_date]

Change Window / Blackout Constraint

Linked Problem Record [ServiceNow: problem_id]

Linked Incident [ServiceNow: caused_by_change]

Change Freeze Period?

⚠ Change freeze period selected. This change requires emergency exemption approval from senior management before proceeding. Document the business justification for breaking the freeze. The Change Manager must be notified immediately.

Business Justification * [ServiceNow: justification]

Business justification is required

Risk Assessment

0

Complete the risk factors below

Score: 0 / 100 — Low risk → expedited CAB review path

Blast radius — how many users / services affected if this change fails? 0

Implementation complexity — how many steps, teams or systems involved? 0

Reversibility — how easy is it to roll back if something goes wrong? 0

Test coverage — has this been tested in a non-production environment? 0

Change history — has this type of change been done before successfully? 0

Timing — when is this change being implemented? 0
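One way a score out of 100 could be derived from the six factors is sketched below. The form does not state how the factors are scaled or weighted, so everything numeric here is an assumption: each factor is rated 0 to 5, all factors carry equal weight, and the Low / Medium / High cut-offs are purely illustrative.

```python
FACTORS = (
    "blast_radius",
    "implementation_complexity",
    "reversibility",
    "test_coverage",
    "change_history",
    "timing",
)
MAX_RATING = 5  # assumption: each factor is rated 0 (lowest risk) to 5 (highest)


def risk_score(ratings: dict) -> int:
    """Normalise six equally weighted factor ratings to a 0-100 score."""
    for factor in FACTORS:
        if factor not in ratings:
            raise ValueError(f"missing rating for {factor}")
        if not 0 <= ratings[factor] <= MAX_RATING:
            raise ValueError(f"{factor} must be between 0 and {MAX_RATING}")
    total = sum(ratings[factor] for factor in FACTORS)
    return round(100 * total / (MAX_RATING * len(FACTORS)))


def risk_band(score: int) -> str:
    """Map a score to a band; the cut-offs are hypothetical."""
    if score <= 33:
        return "Low"
    if score <= 66:
        return "Medium"
    return "High"
```

With every factor at 0 the score is 0 and the band is Low, which matches the form's initial state.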

Risk mitigation measures

Approvals and CAB Submission

Technical Approver

Senior engineer or architect who validates the technical approach

Business Approver

Service owner or business stakeholder who accepts the risk

Change Manager

Change manager who verifies completeness and CAB scheduling

CAB Submission Checklist — complete before submitting to CAB

Risk assessment completed and score documented

CAB will not review without a risk score. Justify the score — high-traffic changes rated as low risk will be challenged.

Implementation plan reviewed by technical peer

Another engineer (not the implementer) has reviewed and confirmed the plan is complete and executable.

Rollback plan tested in non-production environment

Rollback steps have been executed in pre-prod. Time to rollback is documented and realistic.

Testing evidence documented (test results, screenshots, UAT sign-off)

Link to test results or attach evidence. CAB requires proof of testing, not just a statement that testing was done.

Downstream dependencies identified and owners notified

Services that depend on the CI being changed have been identified and their owners informed of the planned change window.

Change window agreed with operations and stakeholders

The implementation window has been confirmed with the operations team, service desk, and any affected business stakeholders.

No conflicting changes in the same window (change calendar checked)

ServiceNow Change Calendar reviewed. No other changes to overlapping CIs or services in the same window.

Security and compliance implications assessed

If the change touches access controls, data handling, encryption or regulated processes, compliance sign-off obtained.
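The checklist above amounts to an all-items-ticked gate. A minimal sketch, assuming the eight items are tracked by their labels (the function names are hypothetical):

```python
# The eight checklist items, labelled as in the form.
CAB_CHECKLIST = [
    "Risk assessment completed and score documented",
    "Implementation plan reviewed by technical peer",
    "Rollback plan tested in non-production environment",
    "Testing evidence documented",
    "Downstream dependencies identified and owners notified",
    "Change window agreed with operations and stakeholders",
    "No conflicting changes in the same window",
    "Security and compliance implications assessed",
]


def outstanding_items(completed: set) -> list:
    """Return checklist items not yet ticked, in form order."""
    return [item for item in CAB_CHECKLIST if item not in completed]


def ready_for_cab(completed: set) -> bool:
    """A change is ready for submission only when nothing is outstanding."""
    return not outstanding_items(completed)
```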

Emergency CAB (ECAB) Process

Emergency changes bypass standard CAB scheduling. They require escalated approval and must be followed by a full post-implementation review.

ECAB Approver (Senior Management)

Emergency Approval Time

Justification for emergency classification

⚠ Emergency changes must be reviewed in the next standard CAB meeting. A PIR is mandatory regardless of outcome. Document all actions taken during implementation with timestamps.

Pre-Change Checks and Preparation

Additional pre-change activities

Implementation Plan * required

Step-by-step implementation actions * [ServiceNow: implementation_plan]

Implementation plan is required

Post-deployment validation steps * [ServiceNow: test_plan]

Smoke tests and acceptance criteria *

Test and acceptance criteria are required

Monitoring and Alerting Plan

Primary metric to watch

Normal baseline value

Alert threshold (trigger rollback review)

Monitoring platform

Observation period after implementation

On-call engineer during change window

Enhanced monitoring actions during change window

Rollback Plan * required

Rollback triggers — initiate rollback if ANY of these occur

1

2

3

4

Maximum time before rollback decision must be made

Rollback steps * [ServiceNow: backout_plan]

Rollback plan is required

If rollback is not possible — contingency plan
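The rollback rule above ("initiate rollback if ANY of these occur", plus a maximum time before the decision must be made) can be sketched as follows. The `Trigger` type, the function names and the returned action strings are hypothetical, and time is measured in minutes purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    description: str
    occurred: bool


def rollback_required(triggers: list) -> bool:
    """Rollback is required as soon as any single trigger has occurred."""
    return any(trigger.occurred for trigger in triggers)


def next_action(triggers: list, minutes_elapsed: float, max_decision_minutes: float) -> str:
    """Suggest the next step during the observation period."""
    if rollback_required(triggers):
        return "roll back"
    if minutes_elapsed >= max_decision_minutes:
        return "make rollback decision now"
    return "continue monitoring"
```

If rollback is not possible at all, the contingency plan field applies instead; this sketch does not model that case.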

Post-Implementation Review and Closure

Close Code * [ServiceNow: close_code]

PIR Required?

PIR Scheduled Date

Within 5 business days of implementation for Emergency changes
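A small sketch of the five-business-day deadline. It assumes business days are Monday to Friday and ignores public holidays, which a real calendar would need to account for; `add_business_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def add_business_days(start: date, business_days: int) -> date:
    """Return the date `business_days` working days after `start`."""
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


# An emergency change implemented on Friday 1 March 2024
# would need its PIR by Friday 8 March 2024.
pir_deadline = add_business_days(date(2024, 3, 1), 5)
```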

Problem Record to close on success

Implementation outcome notes [ServiceNow: close_notes]

Stage 6 — Generated Outputs

Review and Copy Outputs

All outputs are ready. Copy each block and paste into your ITSM tooling. ServiceNow field names are included throughout. Amend as needed — these are structured drafts, not final documents.

Action Item Tracker — persists across sessions

No actions yet. Add items below or generate outputs — actions from the post-incident checklist can be added here for persistent tracking.

Shift Handover Pack

One-click structured summary for shift handover — covers incident status, actions outstanding, severity, SLA position and next steps.

⚠ You have significant unsaved data in this session. Export a backup before closing this tab.

🔒 Sanitise outputs for sharing — replaces potentially sensitive values with [REDACTED]
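A sanitising pass of this kind could look like the sketch below. Which values the tool treats as "potentially sensitive" is not stated, so the two patterns here (email addresses and IPv4 addresses) are assumptions chosen only to illustrate the replacement step.

```python
import re

# Assumed examples of sensitive values; the tool's actual rules are not documented here.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),   # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),    # IPv4 addresses
]


def sanitise(text: str) -> str:
    """Replace every match of each pattern with the [REDACTED] marker."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Pattern-based redaction will miss anything it has no pattern for, so a manual read-through before sharing is still worthwhile.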

Post-Incident Action Checklist — generated from your incident data

Generate your outputs first — checklist will appear here based on your incident details. If you change severity, regulatory flags or RCA data after generating, click ↺ Refresh to update the checklist.

What do I do with these outputs?

Stage 6: Generated Outputs
All seven outputs are paste-ready for your ITSM platform. Each section is labelled with the target system and audience.

Where each output goes:

ServiceNow Incident Record — paste field by field into the ServiceNow Incident form. Field names in brackets match ServiceNow field IDs exactly.
Jira Bug Ticket — for the engineering or QA team investigating the technical fault. Paste into a new Jira Bug issue.
Jira Remediation Story — for the PO, SM or PM to track the permanent fix as a delivery item. Paste into a new Jira Story.
Confluence PIR — paste into a new Confluence page under your Post-Incident Reviews space. Provides the full governance and lessons learned record.
Problem Record — paste into ServiceNow Problem Management. Links back to the incident via the problem_id field.
Change Request — paste into ServiceNow Change Management for CAB submission.
RCA Report — standalone document for governance, audit or stakeholder review.

Tip: Use Download to save individual outputs as text files. Use Download all to get everything at once.

⟳ Update Mode — regenerate individual sections without clearing your data

Observability State at Time of Incident

Copy all outputs as a single structured document

ServiceNow Incident Record

📍 Where does this go? — step-by-step navigation▼

◆ ServiceNow — primary target

Service Operations Workspace (SOW): Incident Mgmt → Create New → Overview tab: Short Description, Category, Urgency, Impact, Assignment Group → Details tab: Description, CI, Service Offering, Workaround → Resolution: Close Code (mandatory). Priority auto-calculates from Urgency + Impact. Set Service Offering before CI.

◆ Jira / JSM

JSM: Service project → Create → Incident → Summary + Description → Priority + Components. Link to INC number.

◆ Other platforms

Short Description=Summary. Urgency+Impact=Priority. Assignment Group=Team/Queue. CI=Asset/Service.
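The "Priority auto-calculates from Urgency+Impact" behaviour can be pictured as a lookup table. The values below follow the commonly documented ServiceNow default, where 1 is the highest impact, urgency and priority. Instances often customise these rules, so treat the table as an assumption to check against your own configuration.

```python
# (impact, urgency) -> priority, per the commonly documented default matrix.
# Assumption: your instance has not customised its priority lookup rules.
PRIORITY_LOOKUP = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}


def priority(impact: int, urgency: int) -> int:
    """Return the priority for an impact/urgency pair (each 1-3)."""
    return PRIORITY_LOOKUP[(impact, urgency)]
```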

ServiceNow API

POST to /api/now/table/incident. Key fields: short_description, description, urgency, impact, assignment_group, cmdb_ci, category. Authenticate with OAuth 2.0 against the ServiceNow inbound REST API. A human review gate is recommended before the API call triggers.
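A minimal sketch of that call using only the Python standard library. The instance URL and access token are placeholders, obtaining the OAuth token is out of scope, and `build_incident_request` and `send_after_review` are hypothetical helpers. The request is built separately from sending so that a person can inspect it first.

```python
import json
import urllib.request


def build_incident_request(instance_url: str, access_token: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the incident table."""
    return urllib.request.Request(
        url=f"{instance_url.rstrip('/')}/api/now/table/incident",
        data=json.dumps(fields).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth 2.0 bearer token
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send_after_review(request: urllib.request.Request, reviewer_approved: bool) -> dict:
    """Human review gate: refuse to send unless a reviewer has approved."""
    if not reviewer_approved:
        raise PermissionError("Incident payload has not been approved for submission")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would pass the key fields listed above as a dictionary, for example `{"short_description": "...", "urgency": "1", "impact": "1"}`.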

Jira Bug / Incident Ticket (Engineering and QA)

📍 Where does this go? — step-by-step navigation▼

◆ ServiceNow — primary target

Not a ServiceNow record. In SN: Related Records tab → link external Jira ticket. ServiceNow Spoke for Jira can auto-create if configured.

◆ Jira / JSM

Jira: Create → Bug → Summary ([SEV] title) → Description (full block) → Labels → Linked Issues (INC number). Assign to engineering team.

Jira REST API

POST to /rest/api/3/issue. Fields: summary, description, issuetype (Bug), priority, labels, components. ServiceNow Spoke for Jira can automate creation, but this tool provides enriched field data that native integration typically loses.
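A sketch of the request body for that endpoint. Version 3 of the Jira Cloud REST API expects the description in Atlassian Document Format rather than plain text, so the plain-text output is wrapped paragraph by paragraph. The project key is required by Jira but is not part of the field list above; `to_adf` and `build_bug_payload` are hypothetical helpers, and the project, priority and component values must match names that exist in your own Jira project.

```python
def to_adf(text: str) -> dict:
    """Wrap plain text as a minimal Atlassian Document Format document."""
    paragraphs = [
        {"type": "paragraph", "content": [{"type": "text", "text": line}]}
        for line in text.splitlines()
        if line.strip()  # ADF text nodes must not be empty
    ]
    return {"type": "doc", "version": 1, "content": paragraphs}


def build_bug_payload(project_key: str, summary: str, description: str,
                      priority: str, labels: list, components: list) -> dict:
    """Build the JSON body for creating a Bug issue."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "description": to_adf(description),
            "issuetype": {"name": "Bug"},
            "priority": {"name": priority},
            "labels": labels,  # Jira labels cannot contain spaces
            "components": [{"name": name} for name in components],
        }
    }
```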

Jira Remediation Story (PO / SM / PM / Delivery)

📍 Where does this go? — step-by-step navigation ▼

◆ ServiceNow — primary target

Not a ServiceNow record — this is for your product or delivery backlog.
In ServiceNow: Problem Record links to the Jira Story via the Related Records tab.

◆ Jira / Jira Service Management

Jira Software: Create → Issue Type: Story → assign to correct Epic
→ Summary: "Remediation: [description]"
→ Description: paste full story with User Story, AC and compliance note
→ Epic Link: link to Service Reliability or Tech Debt epic
Assign to Product Owner or Scrum Master for backlog prioritisation, not the incident responder.
JSM: Your service project → Queues → Create incident → paste into Summary and Description

Confluence Post-Incident Review Page

📍 Where does this go? — step-by-step navigation▼

◆ ServiceNow — primary target

Not a ServiceNow record. In SN: Problem Record → Knowledge tab → link Confluence page once created.

◆ Confluence

Confluence: PIR space → Create Page → parent: Post-Incident Reviews/[Year] → title: PIR — [desc] — [date] → paste output. Apply PIR label.

Confluence API

POST to /wiki/rest/api/content. Use type: page with spaceKey and parentId. Body uses Confluence Storage Format (XHTML). This output is plain text for manual paste.
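Because the generated output is plain text and the API body expects storage format, an automated route needs a conversion step. The sketch below escapes each non-empty line into a paragraph and builds the request body. In this endpoint the space key is sent as a nested `space.key` and the parent page as an entry in `ancestors`; the helper names are hypothetical.

```python
import html


def text_to_storage(text: str) -> str:
    """Convert plain text to simple storage-format XHTML, one <p> per line."""
    return "".join(
        f"<p>{html.escape(line)}</p>"
        for line in text.splitlines()
        if line.strip()
    )


def build_page_payload(space_key: str, parent_id: str, title: str, text: str) -> dict:
    """Build the JSON body for creating a page under a parent page."""
    return {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "ancestors": [{"id": parent_id}],
        "body": {
            "storage": {
                "value": text_to_storage(text),
                "representation": "storage",
            }
        },
    }
```

This keeps only paragraph structure; headings or tables in the PIR would need their own storage-format markup.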

ServiceNow Problem Record

📍 Where does this go? — step-by-step navigation ▼

◆ ServiceNow — primary target

ServiceNow — Problem Management module:
Nav bar → Problem → Create New
→ Problem Statement: Short Description field
→ Related Incidents: Related Records tab → Add Incident relationship