Firefighter Mode
Firefighter Mode is a specialized agent for investigating and fixing production incidents. It combines Linear ticket triage, Slack channel monitoring, Bugsnag error tracking, and Datadog observability into a single automated loop — designed for on-call engineers who want an AI copilot watching production alongside them.
How It Works
```
        ┌─────────────────┐
        │  Polling Loop   │
        │  (every 5 min)  │
        └────────┬────────┘
                 │
  ┌──────────────┼──────────────┐
  ▼              ▼              ▼
Linear Triage  Slack Channels  Bugsnag/Datadog
Queue (Part 1) (Part 3)        (Part 2)
  │              │              │
  └──────────────┼──────────────┘
                 ▼
          Extract Context
        (Datadog MCP tools)
                 │
                 ▼
          Spawn Sub-Agent
       (isolated worktree)
                 │
        ┌────────┴────────┐
        ▼                 ▼
   Attempt Fix      Generate Report
   Run Tests        Update Linear
   Create PR        Reply in Slack
```

The agent runs a 5-minute polling loop that checks three sources for new issues, then investigates and attempts fixes automatically.
Time Window
The agent doesn't scan all historical data. It uses a bounded time window:
- First check — looks back at most 24 hours before session start to catch any recent/active issues
- Subsequent checks — only looks at data since the last check (roughly 5 minutes), so each poll is fast and focused on what's new
This means you can start a session, get a quick catch-up on anything actively broken, and then the agent shifts to watching for new problems in real time.
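The bounded window above can be sketched as a small pure function. This is an illustrative sketch, not the actual implementation; `nextWindow` and the millisecond units are assumptions.

```typescript
// First check looks back at most 24 hours; later checks start at the last poll.
const FIRST_LOOKBACK_MS = 24 * 60 * 60 * 1000;

function nextWindow(
  now: number,
  lastCheck: number | null,
): { since: number; until: number } {
  // null lastCheck means this is the first check of the session.
  const since = lastCheck === null ? now - FIRST_LOOKBACK_MS : lastCheck;
  return { since, until: now };
}
```

Each poll then records `until` as the new `lastCheck`, so successive windows tile the timeline without gaps or overlap.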
Dynamic MCP Tool Discovery
The system prompt automatically adapts to which MCP servers you have configured. If you don't have the Slack MCP server installed, the agent won't reference Slack tools or try to use them. Only configured tools appear in the prompt — no more "Slack MCP is unavailable" errors.
The monitoring prompt also respects this: Slack channel monitoring instructions are only included when both Slack channels are specified AND the Slack MCP server is actually configured.
Three-Part Monitoring
**Linear Triage Queue (Priority)**
- Uses the Linear MCP server to query tickets labeled "firefighter" or "triage"
- Only considers tickets created or updated within the current time window
- Sorts by priority: Urgent > High > Medium > Low
- Extracts Bugsnag error IDs or Datadog monitor IDs from ticket descriptions
- Immediately investigates Urgent/High priority tickets
- Updates tickets with investigation progress and links PRs when fixes are ready
**Bugsnag + Datadog Proactive Monitoring**
- Queries Bugsnag for errors that appeared within the current time window
- Checks Datadog for monitors that transitioned to alert state since the last check
- Looks for log volume spikes and metric anomalies (error rates, latency)
- Cross-references against Linear to skip issues that already have tickets
- Creates new Linear tickets for newly discovered incidents
- Auto-investigates HIGH severity issues; alerts only for Medium/Low
**Slack Channel Monitoring (when configured)**
- Uses the Slack MCP server to read messages posted since the last check from configured channels (e.g., `#datadog-alerts`, `#prod-incidents`)
- Looks for Datadog bot messages containing monitor alerts, error spikes, or anomalies
- Parses alert details: monitor name, service, severity, Datadog links
- For each new alert: gathers context via Datadog MCP, spawns a sub-agent to investigate, and replies in the Slack thread with findings
- Tracks seen alerts by timestamp/thread ID to avoid duplicate investigations
- Only enabled when both Slack channels are specified and the Slack MCP server is installed
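The duplicate-tracking behavior in the last bullet can be sketched as a set keyed by channel and thread timestamp. The `SlackAlert` shape and the key format are assumptions for illustration.

```typescript
interface SlackAlert {
  channel: string;
  threadTs: string; // Slack thread timestamp, unique per thread
}

class SeenAlerts {
  private seen = new Set<string>();

  /** Returns true the first time an alert is observed, false on repeats. */
  markNew(alert: SlackAlert): boolean {
    const key = `${alert.channel}:${alert.threadTs}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```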
Sub-Agent Spawning
When the firefighter agent finds an issue worth investigating, it spawns a sub-agent using Claude's built-in Agent tool with `isolation: "worktree"`. The sub-agent:
- Works in an isolated git worktree (no interference with your working tree)
- Analyzes root cause using Bugsnag/Datadog context
- Implements a fix if the cause is clear
- Runs the project's test suite
- Creates a draft PR if tests pass
- Reports back with findings
Multiple sub-agents can run in parallel for different issues.
Reactive Investigation
In addition to the polling loop, you can trigger investigation on demand:
- Incident cards — Click "Investigate" on any incident card in the sidebar
- Slack alerts — The `InvestigateSlackAlert` API lets you point the agent at a specific Slack thread and alert message
Reactive investigations follow the same workflow: gather context, analyze code, attempt fix, update ticket/thread.
Incident Canvas UI
Firefighter mode uses a dedicated incident dashboard instead of plain chat. The interface has two panels: an incident sidebar on the left and the chat panel on the right.
Incident Sidebar
The sidebar shows incident cards parsed from assistant messages. As the agent generates monitoring reports and investigation findings, the parser detects incidents and displays them as cards.
Each card shows:
- Severity badge with colored left border (red = urgent, orange = high, yellow = medium, blue = low)
- Source icon (flame for Linear, alert for Bugsnag, monitor for Datadog, message for Slack)
- Status pill (New, Investigating, Fixing, Testing, Resolved, Failed)
- Relative timestamp (e.g., "5m ago")
- Linear ID if applicable
- Message count — how many chat messages relate to this incident
- Investigate button for new incidents
Cards are grouped into Active (top) and Resolved (collapsible at bottom). A header shows counts like "3 Active · 2 Resolved".
When no incidents have been detected, the sidebar shows an "All Clear — No incidents detected" empty state.
Incident-Scoped Chat
Clicking an incident card filters the chat to show only messages related to that incident. A context bar appears above the chat showing which incident you're viewing, with a "Show all" button to return to the full chat stream.
This lets you review the full investigation context for a single incident without scrolling through unrelated monitoring output.
Sidebar Controls
- Collapse/expand — Click the chevron to hide or show the sidebar
- Toggle incident — Click a selected card again to deselect and show all messages
- Investigate — Click the button on a "New" incident to send an investigation prompt
Incident Detection
The parser recognizes incidents from several output patterns:
| Pattern | Example |
|---|---|
| Emoji ticket alerts | 🎫 [Urgent] Linear ticket ENG-456: Payment NPE |
| Emoji severity alerts | 🚨 NEW High: Error rate spike in checkout-service |
| Incident Summary blocks | ### Incident Summary with Error: and Severity: fields |
| Resolved status updates | ✅ Fixed: Payment service NPE |
| Structured reports | Messages containing Severity: MEDIUM-HIGH + sections like "Root Cause", "Code Analysis", or "Customer Impact" |
Severity values are normalized broadly: MEDIUM-HIGH maps to high, Critical and P1 map to urgent, Sev-3 maps to medium, etc.
When the same incident appears in multiple messages (same Linear ID or matching title), the parser merges them — accumulating related message IDs and escalating severity if a later report raises it.
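The normalization and escalation rules above can be sketched as follows. The mappings mirror the examples in the text; the function names and rank values are illustrative, not the parser's actual code.

```typescript
type Severity = "urgent" | "high" | "medium" | "low";

const RANK: Record<Severity, number> = { urgent: 3, high: 2, medium: 1, low: 0 };

function normalizeSeverity(raw: string): Severity {
  const s = raw.trim().toLowerCase();
  if (["critical", "p1", "urgent", "sev-1"].includes(s)) return "urgent";
  if (["high", "medium-high", "p2", "sev-2"].includes(s)) return "high";
  if (["medium", "p3", "sev-3"].includes(s)) return "medium";
  return "low";
}

// When a later report raises severity, keep the higher of the two;
// merging never downgrades an incident.
function mergeSeverity(current: Severity, incoming: Severity): Severity {
  return RANK[incoming] > RANK[current] ? incoming : current;
}
```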
Focus Area
The Focus Area field in the firefighter dialog lets you describe what you're on call for in natural language. Instead of just specifying a service name, you can describe your team's domain and the agent will discover relevant resources automatically.
Examples
- "I'm on call for the API layer. Only care about api-gateway, order-service, and the backend team's Datadog dashboards."
- "Frontend services only — web-app, search-service, notifications. Ignore backend stuff."
- "Payment and billing services. Monitor any Datadog dashboards related to payments, Stripe webhooks, or invoicing."
How It Works
When a focus area is provided, the agent is instructed to:
- Discover relevant resources — Search for Datadog dashboards, monitors, and services that match your description
- Filter by scope — Skip alerts, errors, monitors, and tickets outside your focus area
- Infer related services — If you describe a team or product area, the agent uses codebase knowledge and monitoring tools to discover which services belong to that area
- Filter Linear tickets — Only investigate tickets with matching labels, components, or services
- Filter Bugsnag errors — Only investigate errors from projects within scope
- Repeat on every check — Each monitoring poll includes a scope reminder so the agent stays focused
This means the agent won't waste time investigating a frontend Datadog alert when you're only on call for backend services.
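Once the agent has resolved a focus area into concrete service names, the filtering step reduces to a membership check. This is a minimal sketch under that assumption; `Alert`, `inScope`, and the service names are illustrative.

```typescript
interface Alert {
  service: string;
  title: string;
}

function inScope(alert: Alert, scopedServices: Set<string>): boolean {
  // An empty scope means no focus area was given: everything passes.
  return scopedServices.size === 0 || scopedServices.has(alert.service);
}
```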
Chat Behavior
Smart Auto-Scroll
The chat does not force-scroll to the bottom when new messages arrive if you've scrolled up to read older content. This is particularly important during firefighter sessions where the agent generates frequent monitoring reports.
- If you're at or near the bottom of the chat, new messages scroll into view automatically
- If you've scrolled up, new messages arrive silently — a "New messages" pill button appears at the bottom of the chat
- Click the pill to jump back to the latest messages
- Sending a message also scrolls to the bottom
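The "at or near the bottom" check behind this behavior can be sketched as a pure function of the scroll metrics. The 80px threshold is an assumed value, not the actual one.

```typescript
const NEAR_BOTTOM_PX = 80;

function shouldAutoScroll(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
): boolean {
  // Distance from the bottom edge of the viewport to the end of the content.
  const distanceFromBottom = scrollHeight - (scrollTop + clientHeight);
  return distanceFromBottom <= NEAR_BOTTOM_PX;
}
```

When this returns false, the UI skips the scroll and shows the "New messages" pill instead.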
Setup
Firefighter mode relies on four MCP servers. Datadog is installed as a Claude Code plugin, and Linear and Slack are available as Boatman MCP presets (Settings > MCP Servers > Add from Presets).
Only Datadog is strictly required — the system prompt adapts to show only the tools you have configured.
Step 1: Datadog Plugin
The Datadog plugin provides MCP tools for querying logs, metrics, monitors, and traces. No API keys needed — it authenticates via OAuth.
```shell
claude /plugin install datadog
```

Then authenticate and discover your environment:

```shell
/mcp      # Select "datadog-mcp" → complete OAuth in browser
/dd-init  # Saves your dashboards and services to ~/.claude/CLAUDE.md
```

Step 2: Linear MCP Server
The Linear MCP server lets the agent query tickets, update status, add comments, and manage labels. The same `LINEAR_API_KEY` format used by tools like martabot works here.
- Get a Personal API key from Linear Settings > API
- Add the key in Boatman Settings > Firefighter > Linear API Key (used for Boatman Mode ticket execution too)
- Enable the Linear MCP preset:
Option A — Boatman MCP Presets (recommended):
- Go to Settings > MCP Servers > Add from Presets
- Select "linear" and enter your `LINEAR_API_KEY`
- Click Enable
Option B — Manual config (`~/.claude/claude_mcp_config.json`):

```json
{
  "mcpServers": {
    "linear": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-linear"],
      "env": {
        "LINEAR_API_KEY": "lin_api_your-key-here"
      }
    }
  }
}
```

Step 3: Slack MCP Server (Optional)
The Slack MCP server lets the agent read channel messages, reply in threads, and search for alerts. The system prompt will only reference Slack tools when this server is configured — if you skip this step, everything else works without errors.
- Get the Bot User OAuth Token (`xoxb-...`) from the existing Slack app:
  - Go to api.slack.com/apps, select the app
  - OAuth & Permissions > copy the Bot User OAuth Token
  - The bot needs these scopes (should already be configured): `channels:history`, `channels:read`, `chat:write`
- Enable the Slack MCP preset:

Option A — Boatman MCP Presets (recommended):

- Go to Settings > MCP Servers > Add from Presets
- Select "slack" and enter your `SLACK_BOT_TOKEN` and `SLACK_TEAM_ID`
- Click Enable
Option B — Manual config (`~/.claude/claude_mcp_config.json`):

```json
{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": {
        "SLACK_BOT_TOKEN": "xoxb-your-bot-token-here",
        "SLACK_TEAM_ID": "T0123456789"
      }
    }
  }
}
```

- Set default channels in Settings > Firefighter > Default Slack Alert Channels:
  - Enter comma-separated channel names, e.g., `#datadog-alerts, #prod-incidents`
  - These pre-populate the Firefighter dialog so you don't have to type them each session
  - You can also override channels per-session in the Firefighter dialog
- Make sure the bot is invited to the channels you want to monitor: `/invite @YourBotName`
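Splitting the comma-separated channel setting into normalized names might look like this sketch. `parseChannels` and the `#`-prefix normalization are assumptions, not the actual settings code.

```typescript
function parseChannels(input: string): string[] {
  return input
    .split(",")
    .map((c) => c.trim())
    .filter((c) => c.length > 0)          // drop empty entries and trailing commas
    .map((c) => (c.startsWith("#") ? c : `#${c}`)); // ensure a "#" prefix
}
```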
Step 4: (Optional) Bugsnag via Okta
If your organization uses Bugsnag behind Okta SSO:
- Create an Okta OIDC Application:
  - Okta Admin Dashboard > Applications > Create App Integration
  - Choose "OIDC - OpenID Connect" > "Web Application"
  - Redirect URI: `http://localhost:8484/callback`
  - Note the Client ID and Client Secret
- Configure in Boatman:
  - Settings > Firefighter tab
  - Enter Okta Domain, Client ID, and Client Secret
  - Click "Save Changes"
- Install the Bugsnag MCP server: `cd desktop/mcp-servers && make bugsnag-okta && make install`
- Add to MCP config (`~/.claude/claude_mcp_config.json`):

  ```json
  {
    "mcpServers": {
      "bugsnag-okta": {
        "command": "/Users/YOUR_USERNAME/.claude/mcp-servers/bugsnag-okta",
        "args": [],
        "env": {
          "OKTA_ACCESS_TOKEN": "[automatically-injected-by-boatman]"
        }
      }
    }
  }
  ```
Using Firefighter Mode
Starting a Session
- Click the Firefighter button (flame icon in the header)
- (Optional) Describe your Focus Area — what you're on call for, which services/teams matter. The agent discovers relevant dashboards and monitors from your description.
- (Optional) Enter Slack Alert Channels — comma-separated channels to monitor for Datadog alerts (e.g., `#datadog-alerts, #prod-incidents`). Pre-populated from your default channels in Settings.
- Toggle Active Monitoring on/off — when on, the agent polls every 5 minutes
- Click Start Investigation
What Happens Next
When monitoring is enabled, the agent immediately runs its first check and then repeats every 5 minutes:
- Linear triage — Queries for firefighter/triage-labeled tickets and investigates high-priority ones
- Bugsnag/Datadog — Checks for new errors and triggered monitors
- Slack channels — Reads configured channels for Datadog alert bot messages (only if Slack MCP is configured)
- Reports — Summarizes findings in the chat:
  - 🎫 [Urgent] Linear ticket ENG-456: Payment refund NPE
  - 🚨 NEW High: Error rate spike in checkout-service
  - ✅ Linear queue: 2 tickets, Monitoring: No new issues
For Urgent/High issues, the agent automatically spawns a sub-agent to investigate and attempt a fix. For Medium/Low, it alerts you and waits for approval.
Interface Layout
Top Bar — Monitoring Status:
- Green = Active monitoring (checking every 5 minutes)
- Paused = Manual investigation only
- Shows last check time and number of seen issues
- Toggle monitoring on/off at any time
Incident Sidebar (left):
- Shows incident cards parsed from monitoring output
- Grouped by Active (top) and Resolved (collapsible)
- Click a card to filter chat to that incident's messages
- "Investigate" button to trigger investigation on new incidents
- Collapse with the chevron to give more room to chat
Chat Panel (right):
- Monitoring reports and investigation findings appear here
- When an incident is selected, shows only that incident's messages with a "Show all" button
- Smart auto-scroll — won't yank you to the bottom if you've scrolled up to read
- "New messages" pill appears when new content arrives while scrolled up
Tasks Tab:
- Shows ongoing investigations as tasks
- Track status: investigating > fixing > testing > done/failed
Changes Tab:
- View diffs from sub-agent fix attempts
- Accept or reject proposed changes
Investigation Report Format
Each investigation generates a structured report:
### Incident Summary
- **Error**: NullPointerException in PaymentService.processRefund()
- **First Seen**: 2026-02-13 14:23:45 UTC
- **Frequency**: 47 occurrences in last hour
- **Severity**: High
### Affected Systems
- **Services**: payment-service, billing-service
- **Code Owners**: @payments-team
- **Endpoints**: POST /api/v1/payments/refund
### Timeline
- 14:20 UTC — Deployment: v2.3.4 to production
- 14:23 UTC — First error occurrence
- 14:25 UTC — Error rate spike to 15%
- 14:30 UTC — Datadog alert triggered
### Root Cause Analysis
Refund logic assumes transaction.customer is always present,
but can be null for guest checkouts introduced in v2.3.0.
### Recommended Actions
1. **Immediate**: Add null check before accessing transaction.customer
2. **Short-term**: Rollback to v2.3.3 if fix cannot deploy quickly
3. **Long-term**: Add integration tests for guest checkout flows
### Fix Attempted
- Created worktree: ../worktrees/fix-eng-456
- Applied fix: Added null guard in PaymentService.ts:142
- Tests passed: All 234 tests passing
- Draft PR created: #1234
- Linear ticket updated with PR link

Configuration Reference
All firefighter settings are in Settings > Firefighter:
| Setting | Description | Required |
|---|---|---|
| Linear API Key | Personal API key (lin_api_...) for Linear ticket access | Yes |
| Default Slack Alert Channels | Comma-separated channels to monitor (e.g., #datadog-alerts) | No |
| Okta Domain | Your Okta domain (e.g., your-org.okta.com) | Only for Bugsnag |
| Okta Client ID | OIDC client ID for Bugsnag OAuth | Only for Bugsnag |
| Okta Client Secret | OIDC client secret | Only for Bugsnag |
MCP servers:
| Server | How to Enable | Purpose | Required |
|---|---|---|---|
| Datadog plugin | claude /plugin install datadog | Logs, metrics, monitors, traces | Yes |
| Linear | Boatman MCP Preset or claude_mcp_config.json | Ticket querying, status updates, comments | Yes |
| Slack | Boatman MCP Preset or claude_mcp_config.json | Channel monitoring, thread replies | No |
| Bugsnag-Okta | Manual build + claude_mcp_config.json | Error tracking via Okta SSO | No |
Best Practices
Use the Focus Area
- Describe your on-call scope in natural language: "API layer and backend services", "payment and billing services"
- The agent discovers relevant Datadog dashboards, monitors, and services from your description
- Alerts outside your scope are skipped entirely — no noise from other teams' services
- Each monitoring check includes a scope reminder so the agent doesn't drift
Label Your Tickets
- Use `firefighter` or `triage` labels in Linear so the agent picks them up
- Add service tags: `payment`, `auth`, `billing`
Choose Slack Channels Carefully
- Point at channels that receive Datadog bot alerts (not general discussion channels)
- The agent filters for Datadog alert patterns — non-alert messages are ignored
- Good candidates: `#datadog-alerts`, `#prod-incidents`, `#on-call-alerts`
- Make sure the Slack bot is invited to the channel (`/invite @YourBotName`)
Priority Triage
- The agent prioritizes Linear tickets over proactive monitoring over Slack
- Urgent/High from Linear: investigated immediately
- HIGH severity from Bugsnag/Datadog: investigated immediately
- Slack alerts with Datadog context: investigated via sub-agent
- Medium/Low: alert only, waits for your approval
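This triage order can be sketched as a comparator: source first, then priority within a source. The type names and rank values are illustrative, not the agent's actual code.

```typescript
type Source = "linear" | "monitoring" | "slack";
type Priority = "urgent" | "high" | "medium" | "low";

// Lower rank sorts first: Linear before proactive monitoring before Slack,
// and Urgent > High > Medium > Low within each source.
const SOURCE_RANK: Record<Source, number> = { linear: 0, monitoring: 1, slack: 2 };
const PRIORITY_RANK: Record<Priority, number> = { urgent: 0, high: 1, medium: 2, low: 3 };

interface Issue {
  source: Source;
  priority: Priority;
  id: string;
}

function triageOrder(a: Issue, b: Issue): number {
  return (
    SOURCE_RANK[a.source] - SOURCE_RANK[b.source] ||
    PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority]
  );
}
```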
Sub-Agent Worktrees
- Sub-agents create isolated worktrees under `../worktrees/fix-<issue-id>`
- Multiple investigations can run in parallel without conflicting
- Clean up old worktrees periodically: `git worktree remove ../worktrees/fix-old-issue`
- Worktrees are separate from your working directory — your uncommitted changes are safe