Monitoring vs Remediation
Monitoring detects problems. mttrly acts after the alert.
Grafana, Datadog, Sentry, PagerDuty, and Prometheus help teams see metrics, traces, logs, errors, alerts, and historical context. mttrly is the incident response action layer that starts after those tools raise the signal.
Direct answer
Monitoring detects; mttrly acts after the alert
Use Grafana, Datadog, Sentry, PagerDuty, Prometheus, or your existing observability stack to detect and explain incidents. Use mttrly after the alert to inspect live server reality, run scoped diagnostics, choose prepared playbooks, request human approval for risky remediation, verify the result, and keep an audit trail.
What belongs in monitoring, and what belongs in mttrly
| Capability | Monitoring tools | mttrly |
|---|---|---|
| Primary job | Detect, visualize, correlate, and route incident signals. | Diagnose and coordinate approved response after an alert. |
| Best signals | Metrics, dashboards, traces, logs, errors, alerts, and historical context. | Server reality, health state, diagnostics, playbooks, pending approvals, verification, and audit trail. |
| Typical question | "What is unhealthy, when did it start, and who needs to know?" | "What can we safely inspect or remediate on this server next?" |
| Action model | Usually read, alert, route, annotate, or open an incident workflow. | Inspect, diagnose, choose a playbook, request approval, execute approved actions, and verify. |
| Risk control | Handled by the team response process around the monitoring tool. | Risky actions require human approval; the AI cannot approve its own risky action. |
| Command execution | Not the main purpose of observability tools. | Available only when enabled, scoped, approval-gated, and audited. Playbooks are preferred. |
| Primary purpose | Observability | Incident response |
| Take action | No (alert only) | Yes (restart, deploy) |
| Setup complexity | Hours to days | 2 minutes |
| Cost | $50-500+/month | Free tier, $39/mo Bro, $99/mo Crew |
| Mobile app | Separate app needed | Telegram (an app you already have) |
| Mobile action | View-only | Full control |
Post-alert workflow
By design, any step that can change server state requires human approval.
- 01 Alert fires
  Grafana, Datadog, Sentry, PagerDuty, or Prometheus signals a problem through the existing incident channel.
- 02 Responder investigates with mttrly
  A human responder or AI assistant uses mttrly to look at the affected server after the alert, not instead of the monitoring tool.
- 03 Scoped diagnostics run first
  mttrly reads current server health, service reality, alerts, logs, and targeted diagnostics before proposing a change.
- 04 Playbook or action is requested
  Prepared playbooks are preferred. Scoped command execution can be enabled for narrower cases, but it is treated as a controlled action path.
- 05 Human approval gates risky remediation
  Risky actions create pending approvals. The AI can request an action, but it must not approve its own risky action.
- 06 Verification and audit close the loop
  mttrly verifies what it can, records diagnostics, approval decisions, and execution results, and leaves monitoring tools to confirm the system trend.
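The six steps above can be read as a strictly ordered sequence. The following is a minimal sketch of that ordering only, not mttrly's implementation: the `Stage` and `Incident` names, the stage labels, and the `web-1` server name are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical labels mirroring steps 01 to 06 of the workflow.
    ALERT_FIRED = 1
    INVESTIGATION = 2
    DIAGNOSTICS = 3
    ACTION_REQUESTED = 4
    APPROVAL = 5
    VERIFICATION = 6


@dataclass
class Incident:
    server: str
    stage: Stage = Stage.ALERT_FIRED
    audit: list = field(default_factory=list)

    def advance(self, to: Stage, note: str) -> None:
        # Only allow moving to the very next stage, so diagnostics and
        # approval can never be skipped on the way to verification.
        if to.value != self.stage.value + 1:
            raise ValueError(f"cannot jump from {self.stage.name} to {to.name}")
        self.stage = to
        # Every transition is recorded for later incident review.
        self.audit.append((to.name, note))


incident = Incident(server="web-1")
incident.advance(Stage.INVESTIGATION, "responder opened the affected server")
incident.advance(Stage.DIAGNOSTICS, "read health state and recent logs")
```

In this sketch, an attempt to jump from diagnostics straight to verification raises `ValueError`, which is one simple way to make "diagnostics first, approval before change" a structural property rather than a convention.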
Two different jobs
Monitoring tools detect and explain signals
- Metrics and dashboards for service and infrastructure state
- Traces, logs, and application errors for root-cause context
- Alerts, routing, escalation, and historical timelines
- Trends, baselines, regressions, and capacity context
- Shared observability context for the incident team
They answer: "What is happening, where is it happening, and how did it change over time?"
mttrly investigates and acts after the alert
- Server reality checks for the affected host or service
- Post-alert diagnostics that gather current operating context
- Prepared remediation playbooks before free-form commands
- Approval-gated action requests for risky changes
- Verification steps and an audit trail for incident review
It answers: "What can we safely inspect, request, approve, and verify next?"
Use monitoring for visibility. Add mttrly for the controlled action layer after the signal.
Where familiar monitoring tools fit
Grafana
Dashboards, metric exploration, and alert context
Grafana remains the place to see system behavior over time. mttrly is not a Grafana alternative; it is the action layer used after a Grafana alert or dashboard investigation points to a server that needs attention.
Prometheus
Metrics collection, alert rules, and time-series context
Prometheus is excellent for measuring resource pressure and service signals. mttrly can use the alert as the starting point for live server diagnostics and approval-gated remediation.
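As an illustration of "the alert as the starting point", the sketch below pulls the affected instance out of a Prometheus Alertmanager webhook payload. The payload fields (`alerts`, `status`, `labels`, `startsAt`) follow Alertmanager's webhook format; the function name and the idea of feeding its result into diagnostics are assumptions, since this page does not describe how mttrly ingests alerts.

```python
import json


def targets_from_alertmanager(payload: str) -> list:
    """Return one entry per firing alert with the fields a responder needs first.

    Hypothetical glue code: it only parses the webhook body and does not
    call any mttrly API.
    """
    body = json.loads(payload)
    targets = []
    for alert in body.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts do not need post-alert diagnostics
        labels = alert.get("labels", {})
        targets.append({
            "alert": labels.get("alertname", "unknown"),
            "instance": labels.get("instance", "unknown"),
            "started_at": alert.get("startsAt"),
        })
    return targets
```

The returned list gives a responder the alert name, the instance label, and the start time, which is usually enough to decide which server to inspect first.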
Datadog
APM, infrastructure telemetry, logs, monitors, and alerts
Datadog helps teams correlate infrastructure and application behavior. mttrly complements that by turning a confirmed alert into a controlled investigate, approve, act, and verify workflow.
Sentry
Application errors, exceptions, releases, and issue context
Sentry explains application failures and affected code paths. mttrly helps responders inspect the server, choose a bounded operational response, and audit what happened after the error signal.
PagerDuty
Alert routing, escalation, and responder coordination
PagerDuty brings the right human into the loop. mttrly gives that responder a scoped action surface with diagnostics, approvals, playbooks, and audit history.
Action layer safety model
mttrly is designed for controlled response, not unattended risky remediation.
Read first
AI can inspect server status, alerts, logs, service reality, and diagnostics before recommending action.
Playbooks preferred
Known remediation paths should use prepared playbooks instead of ad hoc shell commands.
Human approval
Risky actions require explicit human approval. AI can request approval, but cannot approve its own risky action.
Scoped commands
Command execution, when enabled, is scoped, approval-gated, and recorded in the audit trail.
Verify and audit
The response should end with verification and a reviewable trail of diagnostics, approvals, and execution results.
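The approval rule in this safety model can be sketched in a few lines. This is an illustration under stated assumptions, not mttrly's code: `ActionRequest`, `ActionGate`, the `human_approvers` set, and the example identities are all hypothetical, and `execute` only returns a string instead of running anything.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActionRequest:
    action: str
    requested_by: str
    risky: bool
    approved_by: Optional[str] = None


@dataclass
class ActionGate:
    human_approvers: set
    audit: list = field(default_factory=list)

    def approve(self, req: ActionRequest, approver: str) -> None:
        # The requester may never approve its own request, and only
        # known human approvers count.
        if approver == req.requested_by or approver not in self.human_approvers:
            self.audit.append(("approval_rejected", req.action, approver))
            raise PermissionError("approval must come from a different, human approver")
        req.approved_by = approver
        self.audit.append(("approved", req.action, approver))

    def execute(self, req: ActionRequest) -> str:
        # Risky actions stay blocked until a human approval is recorded.
        if req.risky and req.approved_by is None:
            self.audit.append(("blocked", req.action, req.requested_by))
            raise PermissionError("risky action needs human approval")
        self.audit.append(("executed", req.action, req.requested_by))
        return f"ran {req.action}"  # placeholder: nothing is actually run
```

Every path through the gate, including rejected approvals and blocked executions, appends to `audit`, so the trail shows what was attempted as well as what ran.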
FAQ
Is mttrly a Grafana alternative?
No. mttrly is not a Grafana alternative. Keep Grafana for dashboards, metrics, and alert context; use mttrly after the alert for server diagnostics, approval-gated remediation, verification, and audit trail.
Does mttrly replace Datadog, Sentry, PagerDuty, or Prometheus?
No. Those tools detect, explain, route, and contextualize incidents. mttrly complements them as the incident response action layer after the alert.
Can AI execute commands through mttrly?
Only when command execution is enabled, scoped, approval-gated, and audited. Prepared playbooks are preferred, and risky actions require human approval. The AI must not approve its own risky action.
What happens after a monitoring alert fires?
A responder can use mttrly to inspect live server reality, run focused diagnostics, choose a playbook or request an action, get human approval for risky remediation, verify the result, and preserve an audit trail.
mttrly is not a Grafana alternative, and it does not replace Datadog or Sentry. It complements the monitoring stack: detection stays in observability tools, while post-alert diagnostics, approval-gated remediation, verification, and audit trail live in mttrly.