Monitoring vs Remediation

Monitoring detects problems. mttrly acts after the alert.

Grafana, Datadog, Sentry, PagerDuty, and Prometheus help teams see metrics, traces, logs, errors, alerts, and historical context. mttrly is the incident response action layer that starts after those tools raise the signal.

Direct answer

Monitoring detects; mttrly acts after the alert

Use Grafana, Datadog, Sentry, PagerDuty, Prometheus, or your existing observability stack to detect and explain incidents. Use mttrly after the alert to inspect live server reality, run scoped diagnostics, choose prepared playbooks, request human approval for risky remediation, verify the result, and keep an audit trail.

What belongs in monitoring, and what belongs in mttrly

Capability | Monitoring tools | mttrly
Primary job | Detect, visualize, correlate, and route incident signals. | Diagnose and coordinate approved response after an alert.
Best signals | Metrics, dashboards, traces, logs, errors, alerts, and historical context. | Server reality, health state, diagnostics, playbooks, pending approvals, verification, and audit trail.
Typical question | "What is unhealthy, when did it start, and who needs to know?" | "What can we safely inspect or remediate on this server next?"
Action model | Usually read, alert, route, annotate, or open an incident workflow. | Inspect, diagnose, choose a playbook, request approval, execute approved actions, and verify.
Risk control | Handled by the team response process around the monitoring tool. | Risky actions require human approval; the AI cannot approve its own risky action.
Command execution | Not the main purpose of observability tools. | Available only when enabled, scoped, approval-gated, and audited. Playbooks are preferred.
Primary purpose | Observability | Incident response
Take action | No (alert only) | Yes (restart, deploy)
Setup complexity | Hours to days | 2 minutes
Cost | $50-500+/month | Free tier, $39/mo Bro, $99/mo Crew
Mobile app | Separate app needed | Telegram (which you already have)
Mobile action | View-only | Full control

Post-alert workflow

The response path is intentionally human-approved when state can change.

  1. Alert fires

    Grafana, Datadog, Sentry, PagerDuty, or Prometheus signals a problem through the existing incident channel.

  2. Responder investigates with mttrly

    A human responder or AI assistant uses mttrly to look at the affected server after the alert, not instead of the monitoring tool.

  3. Scoped diagnostics run first

    mttrly reads current server health, service reality, alerts, logs, and targeted diagnostics before proposing a change.

  4. Playbook or action is requested

    Prepared playbooks are preferred. Scoped command execution can be enabled for narrower cases, but it is treated as a controlled action path.

  5. Human approval gates risky remediation

    Risky actions create pending approvals. The AI can request an action, but it must not approve its own risky action.

  6. Verification and audit close the loop

    mttrly verifies what it can, records diagnostics, approval decisions, and execution results, and leaves monitoring tools to confirm the system trend.
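The six steps above can be read as a small state machine. The sketch below is illustrative only and is not mttrly's implementation or API: the `Stage` names and the `next_stage` function are hypothetical, and it assumes that non-risky actions skip the approval gate, which the workflow implies but does not state outright.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the six workflow steps."""
    ALERT_FIRED = 1
    INVESTIGATING = 2
    DIAGNOSTICS_RUN = 3
    ACTION_REQUESTED = 4
    PENDING_APPROVAL = 5
    VERIFIED_AND_AUDITED = 6


def next_stage(current: Stage, risky: bool = True, approved: bool = False) -> Stage:
    """Return the stage that follows `current`.

    Risky actions wait at PENDING_APPROVAL until a human approves them.
    Assumption: non-risky actions bypass the approval gate.
    """
    if current is Stage.VERIFIED_AND_AUDITED:
        raise ValueError("workflow already complete")
    if current is Stage.ACTION_REQUESTED and not risky:
        return Stage.VERIFIED_AND_AUDITED
    if current is Stage.PENDING_APPROVAL and not approved:
        return Stage.PENDING_APPROVAL  # stay gated until a human approves
    return Stage(current.value + 1)
```

The point of the sketch is that a risky request cannot reach the final stage without passing through an explicit approval, while read-only diagnostics always run before any action is requested.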

Two different jobs

Monitoring tools detect and explain signals

  • Metrics and dashboards for service and infrastructure state
  • Traces, logs, and application errors for root-cause context
  • Alerts, routing, escalation, and historical timelines
  • Trends, baselines, regressions, and capacity context
  • Shared observability context for the incident team

They answer: "What is happening, where is it happening, and how did it change over time?"

mttrly investigates and acts after the alert

  • Server reality checks for the affected host or service
  • Post-alert diagnostics that gather current operating context
  • Prepared remediation playbooks before free-form commands
  • Approval-gated action requests for risky changes
  • Verification steps and an audit trail for incident review

It answers: "What can we safely inspect, request, approve, and verify next?"

Use monitoring for visibility. Add mttrly for the controlled action layer after the signal.

Where familiar monitoring tools fit

Grafana

Dashboards, metric exploration, and alert context

Grafana remains the place to see system behavior over time. mttrly is not a Grafana alternative; it is the action layer used after a Grafana alert or dashboard investigation points to a server that needs attention.

Prometheus

Metrics collection, alert rules, and time-series context

Prometheus is excellent for measuring resource pressure and service signals. mttrly can use the alert as the starting point for live server diagnostics and approval-gated remediation.

Datadog

APM, infrastructure telemetry, logs, monitors, and alerts

Datadog helps teams correlate infrastructure and application behavior. mttrly complements that by turning a confirmed alert into a controlled investigate, approve, act, and verify workflow.

Sentry

Application errors, exceptions, releases, and issue context

Sentry explains application failures and affected code paths. mttrly helps responders inspect the server, choose a bounded operational response, and audit what happened after the error signal.

PagerDuty

Alert routing, escalation, and responder coordination

PagerDuty brings the right human into the loop. mttrly gives that responder a scoped action surface with diagnostics, approvals, playbooks, and audit history.

Action layer safety model

mttrly is designed for controlled response, not unattended risky remediation.

Read first

AI can inspect server status, alerts, logs, service reality, and diagnostics before recommending action.

Playbooks preferred

Known remediation paths should use prepared playbooks instead of ad hoc shell commands.

Human approval

Risky actions require explicit human approval. AI can request approval, but cannot approve its own risky action.
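As a rough illustration of this rule, the sketch below models an approval check in which a risky request can only be approved by a human. It is not mttrly's code: `ActionRequest`, `ApprovalError`, `approve`, and the `approver_is_human` flag are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalError(Exception):
    """Raised when an approval attempt violates the policy."""


@dataclass
class ActionRequest:
    # Hypothetical request record; field names are assumptions.
    action: str
    requested_by: str
    risky: bool
    approved_by: Optional[str] = None


def approve(request: ActionRequest, approver: str, approver_is_human: bool) -> ActionRequest:
    """Record an approval, rejecting non-human approvers for risky actions."""
    if request.risky and not approver_is_human:
        # An AI may request a risky action but can never approve one,
        # which also rules out approving its own request.
        raise ApprovalError("risky actions require a human approver")
    request.approved_by = approver
    return request
```

With this shape, an AI assistant that calls `approve` on its own risky request gets an `ApprovalError`, and the request stays pending until a human responder approves it.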

Scoped commands

Command execution, when enabled, is scoped, approval-gated, and recorded in the audit trail.
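One way to picture "enabled, scoped, approval-gated, and audited" is a single gate function that checks each condition in turn and records every decision. This is a minimal sketch under stated assumptions, not mttrly's mechanism: `authorize_command`, the prefix-based scope, and the audit entry fields are hypothetical, and the caller is assumed to run the parsed arguments directly rather than through a shell.

```python
import shlex
from typing import Dict, List, Sequence, Tuple


def authorize_command(
    command: str,
    *,
    enabled: bool,
    allowed_prefixes: Sequence[Tuple[str, ...]],
    approved: bool,
    audit_log: List[Dict[str, object]],
) -> bool:
    """Decide whether a command may run, and record the decision.

    A command is in scope when its parsed arguments start with one of the
    allowed prefixes, e.g. ("systemctl", "status").
    """
    argv = shlex.split(command)
    in_scope = any(argv[: len(p)] == list(p) for p in allowed_prefixes)

    if not enabled:
        reason = "command execution disabled"
    elif not in_scope:
        reason = "command outside configured scope"
    elif not approved:
        reason = "awaiting human approval"
    else:
        reason = "allowed"

    allowed = reason == "allowed"
    # Every decision is audited, including refusals.
    audit_log.append({"command": command, "allowed": allowed, "reason": reason})
    return allowed
```

For example, with `allowed_prefixes=[("systemctl", "status"), ("df",)]`, the command `systemctl status nginx` is allowed once approved, while `rm -rf /tmp/x` is refused as out of scope; both attempts end up in `audit_log`.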

Verify and audit

The response should end with verification and a reviewable trail of diagnostics, approvals, and execution results.

FAQ

Is mttrly a Grafana alternative?

No. mttrly is not a Grafana alternative. Keep Grafana for dashboards, metrics, and alert context; use mttrly after the alert for server diagnostics, approval-gated remediation, verification, and audit trail.

Does mttrly replace Datadog, Sentry, PagerDuty, or Prometheus?

No. Those tools detect, explain, route, and contextualize incidents. mttrly complements them as the incident response action layer after the alert.

Can AI execute commands through mttrly?

Only when command execution is enabled, scoped, approval-gated, and audited. Prepared playbooks are preferred, and risky actions require human approval. The AI must not approve its own risky action.

What happens after a monitoring alert fires?

A responder can use mttrly to inspect live server reality, run focused diagnostics, choose a playbook or request an action, get human approval for risky remediation, verify the result, and preserve an audit trail.

mttrly is not a Grafana alternative, and it does not replace Datadog or Sentry. It complements the monitoring stack: detection stays in observability tools, while post-alert diagnostics, approval-gated remediation, verification, and audit trail live in mttrly.