Firefighter Mode
Firefighter Mode is a specialized agent for investigating and fixing production incidents. It combines Linear ticket triage, Slack channel monitoring, Bugsnag error tracking, and Datadog observability into a single automated loop — designed for on-call engineers who want an AI copilot watching production alongside them.
How It Works
```
        ┌─────────────────┐
        │  Polling Loop   │
        │  (every 5 min)  │
        └────────┬────────┘
                 │
  ┌──────────────┼──────────────┐
  ▼              ▼              ▼
Linear Triage  Slack Channels  Bugsnag/Datadog
Queue (Part 1) (Part 3)        (Part 2)
  │              │              │
  └──────────────┼──────────────┘
                 ▼
          Extract Context
        (Datadog MCP tools)
                 │
                 ▼
          Spawn Sub-Agent
       (isolated worktree)
                 │
        ┌────────┴────────┐
        ▼                 ▼
   Attempt Fix      Generate Report
   Run Tests        Update Linear
   Create PR        Reply in Slack
```

The agent runs a 5-minute polling loop that checks three sources for new issues, then investigates and attempts fixes automatically.
Time Window
The agent doesn't scan all historical data. It uses a bounded time window:
- First check — looks back at most 24 hours before session start to catch any recent/active issues
- Subsequent checks — only looks at data since the last check (roughly 5 minutes), so each poll is fast and focused on what's new
This means you can start a session, get a quick catch-up on anything actively broken, and then the agent shifts to watching for new problems in real time.
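The bounded window above can be sketched as a small pure function. This is an illustrative sketch, not the actual implementation; `nextWindow` and the millisecond units are assumptions.

```typescript
// First check looks back at most 24 hours; later checks start at the last poll.
const FIRST_LOOKBACK_MS = 24 * 60 * 60 * 1000;

function nextWindow(
  now: number,
  lastCheck: number | null,
): { since: number; until: number } {
  // null lastCheck means this is the first check of the session.
  const since = lastCheck === null ? now - FIRST_LOOKBACK_MS : lastCheck;
  return { since, until: now };
}
```

Each poll then records `until` as the new `lastCheck`, so successive windows tile the timeline without gaps or overlap.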
Dynamic MCP Tool Discovery
The system prompt automatically adapts to which MCP servers you have configured. If you don't have the Slack MCP server installed, the agent won't reference Slack tools or try to use them. Only configured tools appear in the prompt — no more "Slack MCP is unavailable" errors.
The monitoring prompt also respects this: Slack channel monitoring instructions are only included when both Slack channels are specified AND the Slack MCP server is actually configured.
Three-Part Monitoring
**Linear Triage Queue (Priority)**
- Uses the Linear MCP server to query tickets labeled "firefighter" or "triage"
- Only considers tickets created or updated within the current time window
- Sorts by priority: Urgent > High > Medium > Low
- Extracts Bugsnag error IDs or Datadog monitor IDs from ticket descriptions
- Immediately investigates Urgent/High priority tickets
- Updates tickets with investigation progress and links PRs when fixes are ready
**Bugsnag + Datadog Proactive Monitoring**
- Queries Bugsnag for errors that appeared within the current time window
- Checks Datadog for monitors that transitioned to alert state since the last check
- Looks for log volume spikes and metric anomalies (error rates, latency)
- Cross-references against Linear to skip issues that already have tickets
- Creates new Linear tickets for newly discovered incidents
- Auto-investigates HIGH severity issues; alerts only for Medium/Low
**Slack Channel Monitoring (when configured)**
- Uses the Slack MCP server to read messages posted since the last check from configured channels (e.g., `#datadog-alerts`, `#prod-incidents`)
- Looks for Datadog bot messages containing monitor alerts, error spikes, or anomalies
- Parses alert details: monitor name, service, severity, Datadog links
- For each new alert: gathers context via Datadog MCP, spawns a sub-agent to investigate, and replies in the Slack thread with findings
- Tracks seen alerts by timestamp/thread ID to avoid duplicate investigations
- Only enabled when both Slack channels are specified and the Slack MCP server is installed
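The duplicate-tracking behavior in the last bullet can be sketched as a set keyed by channel and thread timestamp. The `SlackAlert` shape and the key format are assumptions for illustration.

```typescript
interface SlackAlert {
  channel: string;
  threadTs: string; // Slack thread timestamp, unique per thread
}

class SeenAlerts {
  private seen = new Set<string>();

  /** Returns true the first time an alert is observed, false on repeats. */
  markNew(alert: SlackAlert): boolean {
    const key = `${alert.channel}:${alert.threadTs}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```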
Sub-Agent Spawning
When the firefighter agent finds an issue worth investigating, it spawns a sub-agent using Claude's built-in Agent tool with `isolation: "worktree"`. The sub-agent:
- Works in an isolated git worktree (no interference with your working tree)
- Analyzes root cause using Bugsnag/Datadog context
- Implements a fix if the cause is clear
- Runs the project's test suite
- Creates a draft PR if tests pass
- Reports back with findings
Multiple sub-agents can run in parallel for different issues.
Reactive Investigation
In addition to the polling loop, you can trigger investigation on demand:
- Incident cards — Click "Investigate" on any incident card in the sidebar
- Slack alerts — The `InvestigateSlackAlert` API lets you point the agent at a specific Slack thread and alert message
Reactive investigations follow the same workflow: gather context, analyze code, attempt fix, update ticket/thread.
Incident Canvas UI
Firefighter mode uses a dedicated incident dashboard instead of plain chat. The interface has two panels: an incident sidebar on the left and the chat panel on the right.
Incident Sidebar
The sidebar shows incident cards parsed from assistant messages. As the agent generates monitoring reports and investigation findings, the parser detects incidents and displays them as cards.
Each card shows:
- Severity badge with colored left border (red = urgent, orange = high, yellow = medium, blue = low)
- Source icon (flame for Linear, alert for Bugsnag, monitor for Datadog, message for Slack)
- Status pill (New, Investigating, Fixing, Testing, Resolved, Failed)
- Relative timestamp (e.g., "5m ago")
- Linear ID if applicable
- Message count — how many chat messages relate to this incident
- Investigate button for new incidents
Cards are grouped into Active (top) and Resolved (collapsible at bottom). A header shows counts like "3 Active · 2 Resolved".
When no incidents have been detected, the sidebar shows an "All Clear — No incidents detected" empty state.
Incident-Scoped Chat
Clicking an incident card filters the chat to show only messages related to that incident. A context bar appears above the chat showing which incident you're viewing, with a "Show all" button to return to the full chat stream.
This lets you review the full investigation context for a single incident without scrolling through unrelated monitoring output.
Sidebar Controls
- Collapse/expand — Click the chevron to hide or show the sidebar
- Toggle incident — Click a selected card again to deselect and show all messages
- Investigate — Click the button on a "New" incident to send an investigation prompt
Incident Detection
The parser recognizes incidents from several output patterns:
| Pattern | Example |
|---|---|
| Emoji ticket alerts | 🎫 [Urgent] Linear ticket ENG-456: Payment NPE |
| Emoji severity alerts | 🚨 NEW High: Error rate spike in checkout-service |
| Incident Summary blocks | ### Incident Summary with Error: and Severity: fields |
| Resolved status updates | ✅ Fixed: Payment service NPE |
| Structured reports | Messages containing Severity: MEDIUM-HIGH + sections like "Root Cause", "Code Analysis", or "Customer Impact" |
Severity values are normalized broadly: MEDIUM-HIGH maps to high, Critical and P1 map to urgent, Sev-3 maps to medium, etc.
When the same incident appears in multiple messages (same Linear ID or matching title), the parser merges them — accumulating related message IDs and escalating severity if a later report raises it.
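The normalization and escalation rules above can be sketched as follows. The mappings mirror the examples in the text; the function names and rank values are illustrative, not the parser's actual code.

```typescript
type Severity = "urgent" | "high" | "medium" | "low";

const RANK: Record<Severity, number> = { urgent: 3, high: 2, medium: 1, low: 0 };

function normalizeSeverity(raw: string): Severity {
  const s = raw.trim().toLowerCase();
  if (["critical", "p1", "urgent", "sev-1"].includes(s)) return "urgent";
  if (["high", "medium-high", "p2", "sev-2"].includes(s)) return "high";
  if (["medium", "p3", "sev-3"].includes(s)) return "medium";
  return "low";
}

// When a later report raises severity, keep the higher of the two;
// merging never downgrades an incident.
function mergeSeverity(current: Severity, incoming: Severity): Severity {
  return RANK[incoming] > RANK[current] ? incoming : current;
}
```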
Focus Area
The Focus Area field in the firefighter dialog lets you describe what you're on call for in natural language. Instead of just specifying a service name, you can describe your team's domain and the agent will discover relevant resources automatically.
Examples
- "I'm on call for the API layer. Only care about api-gateway, order-service, and the backend team's Datadog dashboards."
- "Frontend services only — web-app, search-service, notifications. Ignore backend stuff."
- "Payment and billing services. Monitor any Datadog dashboards related to payments, Stripe webhooks, or invoicing."
How It Works
When a focus area is provided, the agent is instructed to:
- Discover relevant resources — Search for Datadog dashboards, monitors, and services that match your description
- Filter by scope — Skip alerts, errors, monitors, and tickets outside your focus area
- Infer related services — If you describe a team or product area, the agent uses codebase knowledge and monitoring tools to discover which services belong to that area
- Filter Linear tickets — Only investigate tickets with matching labels, components, or services
- Filter Bugsnag errors — Only investigate errors from projects within scope
- Repeat on every check — Each monitoring poll includes a scope reminder so the agent stays focused
This means the agent won't waste time investigating a frontend Datadog alert when you're only on call for backend services.
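Once the agent has resolved a focus area into concrete service names, the filtering step reduces to a membership check. This is a minimal sketch under that assumption; `Alert`, `inScope`, and the service names are illustrative.

```typescript
interface Alert {
  service: string;
  title: string;
}

function inScope(alert: Alert, scopedServices: Set<string>): boolean {
  // An empty scope means no focus area was given: everything passes.
  return scopedServices.size === 0 || scopedServices.has(alert.service);
}
```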
Chat Behavior
Smart Auto-Scroll
The chat does not force-scroll to the bottom when new messages arrive if you've scrolled up to read older content. This is particularly important during firefighter sessions where the agent generates frequent monitoring reports.
- If you're at or near the bottom of the chat, new messages scroll into view automatically
- If you've scrolled up, new messages arrive silently — a "New messages" pill button appears at the bottom of the chat
- Click the pill to jump back to the latest messages
- Sending a message also scrolls to the bottom
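The "at or near the bottom" check behind this behavior can be sketched as a pure function of the scroll metrics. The 80px threshold is an assumed value, not the actual one.

```typescript
const NEAR_BOTTOM_PX = 80;

function shouldAutoScroll(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
): boolean {
  // Distance from the bottom edge of the viewport to the end of the content.
  const distanceFromBottom = scrollHeight - (scrollTop + clientHeight);
  return distanceFromBottom <= NEAR_BOTTOM_PX;
}
```

When this returns false, the UI skips the scroll and shows the "New messages" pill instead.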
Setup
Firefighter mode relies on four MCP servers. Datadog is installed as a Claude Code plugin, and Linear and Slack are available as Boatman MCP presets (Settings > MCP Servers > Add from Presets).
Only Datadog is strictly required — the system prompt adapts to show only the tools you have configured.
Step 1: Datadog Plugin
The Datadog plugin provides MCP tools for querying logs, metrics, monitors, and traces. No API keys needed — it authenticates via OAuth.
```shell
claude /plugin install datadog
```

Then authenticate and discover your environment:

```shell
/mcp      # Select "datadog-mcp" → complete OAuth in browser
/dd-init  # Saves your dashboards and services to ~/.claude/CLAUDE.md
```

Step 2: Linear MCP Server
The Linear MCP server lets the agent query tickets, update status, add comments, and manage labels. The same `LINEAR_API_KEY` format used by tools like martabot works here.
- Get a Personal API key from Linear Settings > API
- Add the key in Boatman Settings > Firefighter > Linear API Key (used for Boatman Mode ticket execution too)
- Enable the Linear MCP preset:
Option A — Boatman MCP Presets (recommended):
- Go to Settings > MCP Servers > Add from Presets
- Select "linear" and enter your `LINEAR_API_KEY`
- Click Enable
Option B — Manual config (`~/.claude/claude_mcp_config.json`):

```json
{
  "mcpServers": {
    "linear": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-linear"],
      "env": {
        "LINEAR_API_KEY": "lin_api_your-key-here"
      }
    }
  }
}
```

Step 3: Slack MCP Server (Optional)
The Slack MCP server lets the agent read channel messages, reply in threads, and search for alerts. The system prompt will only reference Slack tools when this server is configured — if you skip this step, everything else works without errors.
- Get the Bot User OAuth Token (`xoxb-...`) from the existing Slack app:
  - Go to api.slack.com/apps, select the app
  - OAuth & Permissions > copy the Bot User OAuth Token
  - The bot needs these scopes (should already be configured): `channels:history`, `channels:read`, `chat:write`
- Enable the Slack MCP preset:

Option A — Boatman MCP Presets (recommended):

- Go to Settings > MCP Servers > Add from Presets
- Select "slack" and enter your `SLACK_BOT_TOKEN` and `SLACK_TEAM_ID`
- Click Enable
Option B — Manual config (`~/.claude/claude_mcp_config.json`):

```json
{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": {
        "SLACK_BOT_TOKEN": "xoxb-your-bot-token-here",
        "SLACK_TEAM_ID": "T0123456789"
      }
    }
  }
}
```

- Set default channels in Settings > Firefighter > Default Slack Alert Channels:
  - Enter comma-separated channel names, e.g., `#datadog-alerts, #prod-incidents`
  - These pre-populate the Firefighter dialog so you don't have to type them each session
  - You can also override channels per-session in the Firefighter dialog
- Make sure the bot is invited to the channels you want to monitor: `/invite @YourBotName`
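Splitting the comma-separated channel setting into normalized names might look like this sketch. `parseChannels` and the `#`-prefix normalization are assumptions, not the actual settings code.

```typescript
function parseChannels(input: string): string[] {
  return input
    .split(",")
    .map((c) => c.trim())
    .filter((c) => c.length > 0)          // drop empty entries and trailing commas
    .map((c) => (c.startsWith("#") ? c : `#${c}`)); // ensure a "#" prefix
}
```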
Step 4: (Optional) Bugsnag via Okta
If your organization uses Bugsnag behind Okta SSO:
- Create an Okta OIDC Application:
  - Okta Admin Dashboard > Applications > Create App Integration
  - Choose "OIDC - OpenID Connect" > "Web Application"
  - Redirect URI: `http://localhost:8484/callback`
  - Note the Client ID and Client Secret
- Configure in Boatman:
  - Settings > Firefighter tab
  - Enter Okta Domain, Client ID, and Client Secret
  - Click "Save Changes"
- Install the Bugsnag MCP server: `cd desktop/mcp-servers && make bugsnag-okta && make install`
- Add to MCP config (`~/.claude/claude_mcp_config.json`):

  ```json
  {
    "mcpServers": {
      "bugsnag-okta": {
        "command": "/Users/YOUR_USERNAME/.claude/mcp-servers/bugsnag-okta",
        "args": [],
        "env": {
          "OKTA_ACCESS_TOKEN": "[automatically-injected-by-boatman]"
        }
      }
    }
  }
  ```
Using Firefighter Mode
Starting a Session
- Click the Firefighter button (flame icon in the header)
- (Optional) Describe your Focus Area — what you're on call for, which services/teams matter. The agent discovers relevant dashboards and monitors from your description.
- (Optional) Enter Slack Alert Channels — comma-separated channels to monitor for Datadog alerts (e.g., `#datadog-alerts, #prod-incidents`). Pre-populated from your default channels in Settings.
- Toggle Active Monitoring on/off — when on, the agent polls every 5 minutes
- Click Start Investigation
What Happens Next
When monitoring is enabled, the agent immediately runs its first check and then repeats every 5 minutes:
- Linear triage — Queries for firefighter/triage-labeled tickets and investigates high-priority ones
- Bugsnag/Datadog — Checks for new errors and triggered monitors
- Slack channels — Reads configured channels for Datadog alert bot messages (only if Slack MCP is configured)
- Reports — Summarizes findings in the chat:
  - 🎫 [Urgent] Linear ticket ENG-456: Payment refund NPE
  - 🚨 NEW High: Error rate spike in checkout-service
  - ✅ Linear queue: 2 tickets, Monitoring: No new issues
For Urgent/High issues, the agent automatically spawns a sub-agent to investigate and attempt a fix. For Medium/Low, it alerts you and waits for approval.
Interface Layout
Top Bar — Monitoring Status:
- Green = Active monitoring (checking every 5 minutes)
- Paused = Manual investigation only
- Shows last check time and number of seen issues
- Toggle monitoring on/off at any time
Incident Sidebar (left):
- Shows incident cards parsed from monitoring output
- Grouped by Active (top) and Resolved (collapsible)
- Click a card to filter chat to that incident's messages
- "Investigate" button to trigger investigation on new incidents
- Collapse with the chevron to give more room to chat
Chat Panel (right):
- Monitoring reports and investigation findings appear here
- When an incident is selected, shows only that incident's messages with a "Show all" button
- Smart auto-scroll — won't yank you to the bottom if you've scrolled up to read
- "New messages" pill appears when new content arrives while scrolled up
Tasks Tab:
- Shows ongoing investigations as tasks
- Track status: investigating > fixing > testing > done/failed
Changes Tab:
- View diffs from sub-agent fix attempts
- Accept or reject proposed changes
Investigation Report Format
Each investigation generates a structured report:
### Incident Summary
- **Error**: NullPointerException in PaymentService.processRefund()
- **First Seen**: 2026-02-13 14:23:45 UTC
- **Frequency**: 47 occurrences in last hour
- **Severity**: High
### Affected Systems
- **Services**: payment-service, billing-service
- **Code Owners**: @payments-team
- **Endpoints**: POST /api/v1/payments/refund
### Timeline
- 14:20 UTC — Deployment: v2.3.4 to production
- 14:23 UTC — First error occurrence
- 14:25 UTC — Error rate spike to 15%
- 14:30 UTC — Datadog alert triggered
### Root Cause Analysis
Refund logic assumes transaction.customer is always present,
but can be null for guest checkouts introduced in v2.3.0.
### Recommended Actions
1. **Immediate**: Add null check before accessing transaction.customer
2. **Short-term**: Rollback to v2.3.3 if fix cannot deploy quickly
3. **Long-term**: Add integration tests for guest checkout flows
### Fix Attempted
- Created worktree: ../worktrees/fix-eng-456
- Applied fix: Added null guard in PaymentService.ts:142
- Tests passed: All 234 tests passing
- Draft PR created: #1234
- Linear ticket updated with PR link

Configuration Reference
All firefighter settings are in Settings > Firefighter:
| Setting | Description | Required |
|---|---|---|
| Linear API Key | Personal API key (lin_api_...) for Linear ticket access | Yes |
| Default Slack Alert Channels | Comma-separated channels to monitor (e.g., #datadog-alerts) | No |
| Okta Domain | Your Okta domain (e.g., your-org.okta.com) | Only for Bugsnag |
| Okta Client ID | OIDC client ID for Bugsnag OAuth | Only for Bugsnag |
| Okta Client Secret | OIDC client secret | Only for Bugsnag |
MCP servers:
| Server | How to Enable | Purpose | Required |
|---|---|---|---|
| Datadog plugin | claude /plugin install datadog | Logs, metrics, monitors, traces | Yes |
| Linear | Boatman MCP Preset or claude_mcp_config.json | Ticket querying, status updates, comments | Yes |
| Slack | Boatman MCP Preset or claude_mcp_config.json | Channel monitoring, thread replies | No |
| Bugsnag-Okta | Manual build + claude_mcp_config.json | Error tracking via Okta SSO | No |
Best Practices
Use the Focus Area
- Describe your on-call scope in natural language: "API layer and backend services", "payment and billing services"
- The agent discovers relevant Datadog dashboards, monitors, and services from your description
- Alerts outside your scope are skipped entirely — no noise from other teams' services
- Each monitoring check includes a scope reminder so the agent doesn't drift
Label Your Tickets
- Use `firefighter` or `triage` labels in Linear so the agent picks them up
- Add service tags: `payment`, `auth`, `billing`
Choose Slack Channels Carefully
- Point at channels that receive Datadog bot alerts (not general discussion channels)
- The agent filters for Datadog alert patterns — non-alert messages are ignored
- Good candidates: `#datadog-alerts`, `#prod-incidents`, `#on-call-alerts`
- Make sure the Slack bot is invited to the channel (`/invite @YourBotName`)
Priority Triage
- The agent prioritizes Linear tickets over proactive monitoring over Slack
- Urgent/High from Linear: investigated immediately
- HIGH severity from Bugsnag/Datadog: investigated immediately
- Slack alerts with Datadog context: investigated via sub-agent
- Medium/Low: alert only, waits for your approval
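This triage order can be sketched as a comparator: source first, then priority within a source. The type names and rank values are illustrative, not the agent's actual code.

```typescript
type Source = "linear" | "monitoring" | "slack";
type Priority = "urgent" | "high" | "medium" | "low";

// Lower rank sorts first: Linear before proactive monitoring before Slack,
// and Urgent > High > Medium > Low within each source.
const SOURCE_RANK: Record<Source, number> = { linear: 0, monitoring: 1, slack: 2 };
const PRIORITY_RANK: Record<Priority, number> = { urgent: 0, high: 1, medium: 2, low: 3 };

interface Issue {
  source: Source;
  priority: Priority;
  id: string;
}

function triageOrder(a: Issue, b: Issue): number {
  return (
    SOURCE_RANK[a.source] - SOURCE_RANK[b.source] ||
    PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority]
  );
}
```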
Sub-Agent Worktrees
- Sub-agents create isolated worktrees under `../worktrees/fix-<issue-id>`
- Multiple investigations can run in parallel without conflicting
- Clean up old worktrees periodically: `git worktree remove ../worktrees/fix-old-issue`
- Worktrees are separate from your working directory — your uncommitted changes are safe