Guarded incident response for live servers

Your server broke.
mttrly shows why.

/ mät·ter·ly /

It watches your VPS, gathers evidence, and proposes the next fix. Risky actions wait for your approval; every step lands in the audit log.

Powered by multi-step AI reasoning - not generic chatbot advice.

Start Watchdog free

Outside check needs no signup. Connect an agent when you're ready to see inside the server.

Free watchdog tier • AI features from $39/mo
>_ MTTRLY // INCIDENT
api.myapp.com is throwing 502s
Checked nginx, the app process, disk, and recent deploy markers. The API process was OOM-killed 8 min ago; RAM pressure is 94%.

Options:
1. Show failing logs
2. Restart the API process
3. Keep watching and alert again

Approve restart?
Yes
Restarted. Latency is back to normal. Watching for 5 min to confirm. Nothing else touched - full action is in your audit log.

Outside check

Start without signup

Check DNS, TLS, reachability, and public symptoms before you open SSH.

Evidence

See what changed

When the agent is connected, mttrly reads live server facts instead of guessing from a prompt.

Approval

You choose the fix

Risky restarts, command actions, and deploy work wait for explicit approval.

Audit

Every step recorded

Requests, approvals, actions, and results stay visible after the incident is over.

Sound familiar?

01_PANIC

"It worked on my machine"

You shipped a change and production went blank. SSH opens to a wall of logs, the useful line is buried, and users are already noticing.

02_LOST

"What does this error even mean?"

ChatGPT can explain the error string. It cannot see your nginx status, process memory, disk pressure, or the exact service that is down.

03_STUCK

"One wrong command and it's over"

You probably need to restart something. But which process? What else changes? mttrly keeps the next action bounded and waits for your call.

What mttrly actually does

It turns a live-server incident into a controlled loop: watch, diagnose, approve, verify.

  • Sees your actual server state, not just a pasted log line
  • Explains the likely cause in plain English
  • Keeps risky actions behind approval and audit

The incident loop:

1. Watches & catches

Watchdog checks public and connected-server signals, then routes the symptom to the dashboard, Telegram, or MCP.

2. Diagnoses with evidence

mttrly checks processes, logs, ports, disk, memory, and recent change markers before it explains the likely root cause.

3. Fixes under approval

It proposes bounded next steps. Restarts, command actions, and deploy work wait for your explicit approval and leave an audit trail.
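The approve-then-audit step above can be sketched in a few lines. This is an illustration under stated assumptions, not mttrly's implementation: `AuditLog`, `RISKY_KINDS`, and `run_action` are hypothetical names, and the `approve` callback stands in for a dashboard, Telegram, or MCP prompt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical set of action kinds that must wait for a human.
RISKY_KINDS = {"restart", "command", "deploy"}


@dataclass
class AuditLog:
    """Append-only record of requests, approvals, actions, and results."""
    entries: list = field(default_factory=list)

    def record(self, event: str, **details: Any) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append({"at": stamp, "event": event, **details})


def run_action(kind: str, name: str, execute: Callable[[], Any],
               approve: Callable[[str, str], bool],
               log: AuditLog) -> Optional[Any]:
    """Run execute() only if the action is not risky or a human approved it."""
    log.record("requested", kind=kind, name=name)
    if kind in RISKY_KINDS:
        if not approve(kind, name):
            log.record("denied", kind=kind, name=name)
            return None  # nothing was executed
        log.record("approved", kind=kind, name=name)
    result = execute()
    log.record("executed", kind=kind, name=name, result=result)
    return result
```

A denied restart leaves two entries in the log (requested, denied) and never calls `execute`. Read-only actions skip the approval prompt but are still recorded.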

You stay in control. mttrly does the legwork.

How the AI works

Not a chatbot. A reasoning loop that checks real server state and keeps risky actions gated.

01
ALERT: Anomaly detected

Watchdog or an outside check catches the symptom and routes it to your workspace.

02
TRIAGE: Classify in <1s

A fast model classifies the incident type and chooses the next diagnostic recipe.

03
DIAGNOSE: Evidence loop

Scoped tools read logs, services, ports, disk, memory, and recent change markers.

04
ROOT CAUSE: Pattern identified

The model correlates the facts, explains the likely cause, and scores confidence.

05
PLAN: Bounded options

mttrly lists next steps with risk levels instead of jumping straight to a terminal command.

06
YOUR CALL: Human approval

Sensitive actions wait for approval from the dashboard, Telegram, or an MCP/IDE flow.

07
EXECUTE: Confirmed fix

The agent runs only the approved action, then reads the new state to verify recovery.

08
SUMMARY: Incident report

You get what happened, what changed, and what to prevent next time.
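The triage step can be pictured as a lookup from symptom to diagnostic recipe. The sketch below is an assumption-heavy stand-in: simple keyword rules replace the fast model, and the incident types, recipe names, and `triage` function are all hypothetical rather than mttrly's own.

```python
from typing import List, Tuple

# Hypothetical mapping from incident type to an ordered diagnostic recipe.
RECIPES = {
    "site_down": ["reachability", "ports", "web_server_status", "web_server_logs"],
    "slow": ["cpu", "memory", "disk", "app_logs", "recent_changes"],
    "errors": ["app_logs", "recent_changes"],
    "unknown": ["reachability", "memory", "disk", "app_logs"],
}

# Keyword rules stand in for the classification model; first match wins.
KEYWORDS = [
    ("site_down", ("down", "502", "503", "unreachable", "refused")),
    ("slow", ("slow", "latency", "timeout", "lag")),
    ("errors", ("error", "exception", "crash")),
]


def triage(symptom: str) -> Tuple[str, List[str]]:
    """Return (incident_type, recipe) for a free-text symptom."""
    text = symptom.lower()
    for incident_type, words in KEYWORDS:
        if any(word in text for word in words):
            return incident_type, RECIPES[incident_type]
    return "unknown", RECIPES["unknown"]
```

Under these rules, "My site is down" routes to the `site_down` recipe and "Why is everything so slow?" routes to `slow`. Anything unrecognized falls back to a general recipe instead of guessing.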

Multi-step reasoning per incident · scoped server tools · approvals and audit log

Old way vs with mttrly

"Is my app running?"

Old way:

SSH → systemctl status → docker ps → check processes
Takes 5 minutes. Need to know what to check.

With mttrly:

You ask once. mttrly checks the app process, nginx, database connections, and resource usage, then returns the status in plain English.

"Why is it slow?"

Old way:

Check htop, tail logs, compare recent changes, and hope you remember which service normally uses this much RAM.

With mttrly:

It checks RAM, disk, CPU, network, logs, and recent deploy markers, then gives you the likely bottleneck plus clear options.

"Show me the errors"

Old way:

journalctl -u app -n 1000 | grep ERROR
Then scroll through hundreds of lines and guess which ones matter.

With mttrly:

It groups repeated errors, points at the first new pattern, correlates it with recent changes, and suggests the safest next step.
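Grouping repeated errors usually comes down to masking the parts of a line that change (timestamps, IDs, durations) and counting what is left. A minimal sketch, assuming plain-text log lines that contain the word ERROR; `normalize` and `group_errors` are hypothetical helpers, not mttrly functions.

```python
import re
from collections import Counter

# Volatile fragments are masked so repeated errors collapse into one pattern.
_VOLATILE = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*"), "<ts>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<n>"),
]


def normalize(line):
    """Replace timestamps, hex values, and numbers with placeholders."""
    for pattern, placeholder in _VOLATILE:
        line = pattern.sub(placeholder, line)
    return line.strip()


def group_errors(lines, baseline=()):
    """Return (patterns by frequency, first pattern not seen in baseline)."""
    counts = Counter()
    first_seen = {}
    for index, line in enumerate(lines):
        if "ERROR" not in line:
            continue
        key = normalize(line)
        counts[key] += 1
        first_seen.setdefault(key, index)
    known = {normalize(line) for line in baseline}
    new = [key for key in sorted(first_seen, key=first_seen.get) if key not in known]
    return counts.most_common(), (new[0] if new else None)
```

Pass yesterday's errors as `baseline` and the second return value is the earliest pattern that only appears in the current window, which is often the line worth reading first.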

"Can I deploy safely?"

Old way:

Run the deploy checklist by hand: pull, install, migrate, build, restart, verify. Miss one step and production pays.

With mttrly:

Deploy profiles can run pre-checks, guarded steps, post-checks, and rollback paths for supported modes. You still approve the risky parts.
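The shape of a guarded deploy (pre-checks, approved steps, post-checks, rollback) can be sketched as below. This is a generic pattern under stated assumptions, not the deploy profile format: `Step`, `run_deploy`, and the status strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]
    risky: bool = False  # risky steps wait for approval


def run_deploy(pre_checks, steps, post_checks, approve):
    """Return (status, detail). Completed steps are undone in reverse on failure."""
    for name, check in pre_checks:
        if not check():
            return "aborted", name  # nothing has changed yet

    done = []

    def rollback():
        for step in reversed(done):
            step.undo()

    for step in steps:
        if step.risky and not approve(step.name):
            rollback()
            return "rolled_back", f"{step.name} not approved"
        try:
            step.apply()
        except Exception as exc:
            rollback()
            return "rolled_back", f"{step.name} failed: {exc}"
        done.append(step)

    for name, check in post_checks:
        if not check():
            rollback()
            return "rolled_back", f"post-check {name} failed"
    return "deployed", ""
```

`pre_checks` and `post_checks` are lists of `(name, callable)` pairs that return True when healthy. A failed pre-check aborts before anything changes; any later failure undoes completed steps in reverse order.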

Real scenarios

See how diagnostic recipes work in action

Scenario 1: "My site is down"

My site is down
Running diagnostics...

✓ Server is reachable (ping 34ms)
✓ Port 80 open
✓ Port 443 open
✗ Nginx not responding

Checking nginx status...
✗ Nginx process not running

Reading nginx error log...
Found: "nginx: [emerg] unexpected '}' in /etc/nginx/sites-enabled/default:47"

Issue found: Nginx config has a syntax error at line 47.

Fix the typo, then I can reload nginx for you.
Or: want me to restore yesterday's working config?

30 seconds from "site is down" to knowing exactly what to fix.
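Turning the error line from this scenario into a file and line number is a small parsing job. A minimal sketch, assuming the `[emerg] <message> in <path>:<line>` shape shown in the transcript; `parse_nginx_emerg` is a hypothetical helper. On a real server, `nginx -t` reports the same kind of message without needing the error log.

```python
import re
from typing import Optional

# Matches lines such as:
#   nginx: [emerg] unexpected '}' in /etc/nginx/sites-enabled/default:47
EMERG = re.compile(r"\[emerg\] (?P<message>.+) in (?P<path>/\S+):(?P<line>\d+)")


def parse_nginx_emerg(log_line: str) -> Optional[dict]:
    """Extract the message, config path, and line number, or None if no match."""
    match = EMERG.search(log_line)
    if match is None:
        return None
    return {
        "message": match["message"],
        "path": match["path"],
        "line": int(match["line"]),
    }
```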

Scenario 2: "Everything is slow"

Why is everything so slow?
Let me check...

CPU: 23% (normal)
Disk: 45% used (normal)
RAM: 94% used ⚠️

Found the issue: Node.js process consuming 3.2GB RAM

Checking logs for memory-related errors...
Found: 127 'JavaScript heap out of memory' warnings
Started: 2 hours ago (right after last deploy)

Diagnosis: Memory leak, likely introduced in recent deploy.

Options:
1. Restart service now (quick fix, leak will return)
2. Roll back to previous version (safer)
3. Show me the commit diff to find the leak

What do you want to do?

Real analysis. Multiple options. You're in control.
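The RAM figure in a check like this can be derived from `/proc/meminfo` on Linux. A minimal sketch, assuming the `MemTotal` and `MemAvailable` fields are present; the function names and the 90% warning threshold are illustrative assumptions, not mttrly settings.

```python
def ram_used_percent(meminfo_text: str) -> float:
    """Compute used RAM as a percentage from /proc/meminfo contents."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # sizes are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return round(100 * (total - available) / total, 1)


def ram_pressure(meminfo_text: str, warn_at: float = 90.0) -> str:
    """Format the reading; warn_at is an assumed threshold."""
    used = ram_used_percent(meminfo_text)
    label = "warning" if used >= warn_at else "normal"
    return f"RAM: {used:.0f}% used ({label})"
```

On a live Linux host the input would come from `open("/proc/meminfo").read()`. `MemAvailable` is used rather than `MemFree` because it accounts for cache the kernel can reclaim.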

Start outside, then connect inside

STEP 01: Run a free outside check

Check a public URL first. No signup, no agent, no access to your server.
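For a sense of what an outside check can see without touching the server, here is a rough sketch of a DNS lookup plus a TLS expiry reading using only the Python standard library. It is not the mttrly check itself: the function names and the 14-day warning window are assumptions for illustration.

```python
import socket
import ssl
import time


def resolve_host(host, resolver=socket.getaddrinfo):
    """Return the sorted unique IPs for host, or [] if DNS fails."""
    try:
        infos = resolver(host, 443, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tls_days_left(host, port=443, timeout=5.0):
    """Days until the certificate expires, or None if the handshake fails."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except OSError:  # covers refused connections, timeouts, and TLS errors
        return None
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires - time.time()) / 86400


def summarize(host, ips, days_left, warn_days=14):
    """Turn the raw readings into one plain-English line."""
    if not ips:
        return f"{host}: DNS did not resolve"
    if days_left is None:
        return f"{host}: resolves to {', '.join(ips)}, TLS handshake failed"
    if days_left < warn_days:
        return f"{host}: TLS certificate expires in {days_left:.0f} days"
    return f"{host}: DNS and TLS look healthy"
```

A typical call chains the three: `summarize(host, resolve_host(host), tls_days_left(host))`. Everything here runs from outside the server, so it only reports public symptoms.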

STEP 02: Connect the agent

When you want inside-server evidence, create an account and install the outbound agent on your VPS.

curl -sL https://mttrly.com/install.sh | bash -s -- -t YOUR_TOKEN

The installer creates an outbound-only agent. Review the script before running it if you want to inspect the exact changes.

STEP 03: Work with approvals

Use the dashboard, Telegram, or MCP tools to investigate, approve risky changes, and keep an audit trail.

Your server, your control

Approve where you work

Dashboard approvals use passkeys for biometric confirmation. Telegram approvals work on the go. MCP and messenger approvals run under separate trust models.

Not raw SSH

Command execution exists as a scoped MCP action with approval and audit. The normal path is diagnostics, playbooks, and server tools, not a free terminal for AI.

BYOK — your AI, your cost

Bring your own OpenAI/Anthropic key. No markup, transparent costs. Or use our AI infrastructure ($39/mo includes AI costs).

No open ports needed

The agent connects outbound only. Your firewall stays closed, so there is no inbound port to attack.

MCP Integration

Also works from your IDE.

Connect mttrly to Claude Code, Cursor, or OpenAI Codex via the Model Context Protocol. Check alerts, run diagnostics, review evidence, and request approved actions without leaving your editor.

See all 40 tools
Claude Code:
claude mcp add mttrly --transport http https://api.mttrly.com/mcp

Cursor:
{ "mcpServers": { "mttrly": { "url": "https://api.mttrly.com/mcp" } } }

OpenAI Codex:
[mcp_servers.mttrly]
url = "https://api.mttrly.com/mcp"

Running in production

204,594 automated commands executed
48.9% incidents auto-resolved
37,276 monitoring health checks
15.6% requiring human approval

Production metrics from internal infrastructure, March 2026.

Stop being afraid of production.

Start with what the internet can see. Connect the agent when you are ready for inside-server evidence and approval-gated fixes.

Outside check needs no signup • Watchdog is free • AI features from $39/mo