How is this different from SSH clients like Termius?

SSH clients give you a terminal. You still need to know what to check, which command is safe, and what changed. mttrly works as a guarded incident-response layer: it gathers server evidence through scoped tools, explains the likely cause, asks before risky actions, and records the result in the audit trail.

Does it give AI a raw terminal?

No. The normal path is diagnostics, playbooks, and scoped MCP tools. If a one-off command is needed, it goes through preview, explicit approval, and audit. Read-only checks can run quickly; state-changing work waits for you.

What is the difference between Watchdog and AI features?

Watchdog is the free starting point: public checks, alerts, and manual controls for one server. Paid AI features add natural-language diagnostics, multi-step reasoning, MCP access, and guided remediation options for live incidents.

Can it help with deploys?

Yes, when you configure a deploy profile. Deploy support is treated as one guarded workflow, not the whole product: mttrly can run pre-checks, approved steps, post-checks, and rollback paths for supported modes. Risky work still waits for approval.

What if mttrly suggests the wrong fix?

You stay in control. mttrly separates read-only evidence gathering from change operations, shows risk before sensitive actions, waits for explicit approval, validates common operations before and after execution, and records what happened.

Do I need Prometheus, Grafana, or Kubernetes?

No. mttrly is built for common VPS setups, small teams, and indie products. It also complements observability tools: monitoring detects symptoms, while mttrly helps with controlled investigation, approval-gated remediation, verification, and audit after the alert.

Guarded incident response for live servers

Your server broke.
mttrly shows why.

/ mät·ter·ly /

It watches your VPS, gathers evidence, and proposes the next fix. Risky actions wait for your approval; every step lands in the audit log.

Start Watchdog free

Outside check needs no signup. Connect an agent when you're ready to see inside the server.

● Free Watchdog tier • AI features from $39/mo

>_ MTTRLY // INCIDENT

api.myapp.com is throwing 502s

Checked nginx, the app process, disk, and recent deploy markers. The API process was OOM-killed 8 min ago; RAM pressure is 94%.

Options:
1. Show failing logs
2. Restart the API process
3. Keep watching and alert again

Approve restart?

Yes

Restarted. Latency is back to normal. Watching for 5 min to confirm. Nothing else was touched; the full action is in your audit log.

Outside check

Start without signup

Check DNS, TLS, reachability, and public symptoms before you open SSH.

Evidence

See what changed

When the agent is connected, mttrly reads live server facts instead of guessing from a prompt.

Approval

You choose the fix

Risky restarts, command actions, and deploy work wait for explicit approval.

Audit

Every step recorded

Requests, approvals, actions, and results stay visible after the incident is over.

Sound familiar?

01_PANIC

"It worked on my machine"

You shipped a change and production went blank. SSH opens to a wall of logs, the useful line is buried, and users are already noticing.

02_LOST

"What does this error even mean?"

ChatGPT can explain the error string. It cannot see your nginx status, process memory, disk pressure, or the exact service that is down.

03_STUCK

"One wrong command and it's over"

You probably need to restart something. But which process? What else changes? mttrly keeps the next action bounded and waits for your call.

What mttrly actually does

It turns a live-server incident into a controlled loop: watch, diagnose, approve, verify.

→ Sees your actual server state, not just a pasted log line
→ Explains the likely cause in plain English
→ Keeps risky actions behind approval and audit

The incident loop:

1. Watches & catches

Watchdog checks public and connected-server signals, then routes the symptom to the dashboard, Telegram, or MCP.

2. Diagnoses with evidence

mttrly checks processes, logs, ports, disk, memory, and recent change markers before it explains the likely root cause.

3. Fixes under approval

It proposes bounded next steps. Restarts, command actions, and deploy work wait for your explicit approval and leave an audit trail.

You stay in control. mttrly does the legwork.
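The split between read-only checks and approval-gated changes can be sketched in a few lines of shell. This is a simplified illustration, not mttrly's implementation: the function name `run_guarded`, the `read-only` and `change` risk labels, and the `audit.log` file are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: function name, risk labels, and log format are assumptions.
AUDIT_LOG="${AUDIT_LOG:-./audit.log}"

# run_guarded RISK COMMAND [ARGS...]
# RISK "read-only" runs immediately; anything else waits for an explicit "y".
run_guarded() {
  local risk="$1" answer status
  shift
  if [ "$risk" != "read-only" ]; then
    printf 'About to run (%s): %s\n' "$risk" "$*" >&2
    read -r -p "Approve? [y/N] " answer
    if [ "$answer" != "y" ]; then
      # Denials are recorded too, so the log shows what was declined.
      printf '%s DENIED %s\n' "$(date -u +%FT%TZ)" "$*" >> "$AUDIT_LOG"
      return 1
    fi
  fi
  "$@"
  status=$?
  # Record every executed command together with its exit code.
  printf '%s RAN exit=%s %s\n' "$(date -u +%FT%TZ)" "$status" "$*" >> "$AUDIT_LOG"
  return "$status"
}

# Usage:
# run_guarded read-only df -h                  # runs at once
# run_guarded change systemctl restart nginx   # asks first
```

Anything not labeled `read-only` is treated as a change, so a mislabeled or unknown action is held for approval rather than run.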

How the AI works

Not a chatbot. A reasoning loop that checks real server state and keeps risky actions gated.

ALERT: Anomaly detected

Watchdog or an outside check catches the symptom and routes it to your workspace.

TRIAGE: Classify in <1s

A fast model classifies the incident type and chooses the next diagnostic recipe.

DIAGNOSE: Evidence loop

Scoped tools read logs, services, ports, disk, memory, and recent change markers.

ROOT CAUSE: Pattern identified

The model correlates the facts, explains the likely cause, and scores confidence.

PLAN: Bounded options

mttrly lists next steps with risk levels instead of jumping straight to a terminal command.

YOUR CALL: Human approval

Sensitive actions wait for approval from the dashboard, Telegram, or an MCP/IDE flow.

EXECUTE: Confirmed fix

The agent runs only the approved action, then reads the new state to verify recovery.

SUMMARY: Incident report

You get what happened, what changed, and what to prevent next time.

Multi-step reasoning per incident · scoped server tools · approvals and audit log

Old way vs with mttrly

"Is my app running?"

Old way:

SSH → systemctl status → docker ps → check processes
Takes 5 minutes. Need to know what to check.

With mttrly:

You ask once. mttrly checks the app process, nginx, database connections, and resource usage, then returns the status in plain English.

"Why is it slow?"

Old way:

Check htop, tail logs, compare recent changes, and hope you remember which service normally uses this much RAM.

With mttrly:

It checks RAM, disk, CPU, network, logs, and recent deploy markers, then gives you the likely bottleneck plus clear options.

"Show me the errors"

Old way:

journalctl -u app -n 1000 | grep ERROR
Then scroll through hundreds of lines and guess which ones matter.

With mttrly:

It groups repeated errors, points at the first new pattern, correlates it with recent changes, and suggests the safest next step.
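Grouping repeated errors by hand looks roughly like the pipeline below. It is a manual approximation of the idea, not how mttrly does it. The unit name `app` comes from the command above; the helper name `group_errors` and the assumption that each log line contains the literal word `ERROR` followed by the message are ours.

```shell
#!/usr/bin/env bash
# group_errors: read log lines on stdin and print each distinct error
# message with its count, most frequent first.
group_errors() {
  grep 'ERROR' \
    | sed -E 's/^.*ERROR[: ]*//' \
    | sort \
    | uniq -c \
    | sort -rn
}

# Usage (assumes a systemd unit named "app"):
# journalctl -u app -n 1000 --no-pager | group_errors | head -n 5
```

Stripping everything up to `ERROR` removes timestamps and process IDs, which would otherwise make identical messages look distinct to `uniq`.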

"Can I deploy safely?"

Old way:

Run the deploy checklist by hand: pull, install, migrate, build, restart, verify. Miss one step and production pays.

With mttrly:

Deploy profiles can run pre-checks, guarded steps, post-checks, and rollback paths for supported modes. You still approve the risky parts.

Real scenarios

See how diagnostic recipes work in action

Scenario 1: "My site is down"

My site is down

Running diagnostics...

✓ Server is reachable (ping 34ms)
✓ Port 80 open
✓ Port 443 open
✗ Nginx not responding

Checking nginx status...
✗ Nginx process not running

Reading nginx error log...
Found: "nginx: [emerg] unexpected '}' in /etc/nginx/sites-enabled/default:47"

Issue found: Nginx config has a syntax error at line 47.

Fix the typo, then I can reload nginx for you.
Or: want me to restore yesterday's working config?

30 seconds from "site is down" to knowing exactly what to fix.
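When fixing a config by hand, the same order applies: validate first, and reload only if validation passes. A small sketch follows; the helper name `reload_if_valid` is hypothetical, while `nginx -t` and `systemctl reload nginx` are the standard commands on a systemd host.

```shell
#!/usr/bin/env bash
# reload_if_valid CHECK_CMD RELOAD_CMD
# Runs RELOAD_CMD only when CHECK_CMD exits successfully.
reload_if_valid() {
  local check_cmd="$1" reload_cmd="$2"
  if sh -c "$check_cmd"; then
    sh -c "$reload_cmd"
  else
    echo "Config check failed; service left untouched." >&2
    return 1
  fi
}

# Usage (run as root or via sudo):
# reload_if_valid "nginx -t" "systemctl reload nginx"
```

A reload keeps the running workers on the old config until the new one is accepted, so a failed check leaves the current state untouched.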

Scenario 2: "Everything is slow"

Why is everything so slow?

Let me check...

CPU: 23% (normal)
Disk: 45% used (normal)
RAM: 94% used ⚠️

Found the issue: Node.js process consuming 3.2GB RAM

Checking logs for memory-related errors...
Found: 127 'JavaScript heap out of memory' warnings
Started: 2 hours ago (right after last deploy)

Diagnosis: Memory leak, likely introduced in recent deploy.

Options:
1. Restart service now (quick fix, leak will return)
2. Roll back to previous version (safer)
3. Show me the commit diff to find the leak

What do you want to do?

Real analysis. Multiple options. You're in control.

Start outside, then connect inside

STEP 01: Run a free outside check

Check a public URL first. No signup, no agent, no access to your server.

STEP 02: Connect the agent

When you want inside-server evidence, create an account and install the outbound agent on your VPS.

curl -sL https://mttrly.com/install.sh | bash -s -- -t YOUR_TOKEN

The installer creates an outbound-only agent. Review the script before running it if you want to inspect the exact changes.
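One way to review the script first is to download it to a file, read it, and run the saved copy. The sketch below wraps that in a helper; the function name `review_then_install` and the confirmation prompt are our own additions, and only the URL and the `-t YOUR_TOKEN` argument come from the command above.

```shell
#!/usr/bin/env bash
# review_then_install URL TOKEN
# Download the installer to a temp file, show it in a pager, and run it
# only after an explicit "y".
review_then_install() {
  local url="$1" token="$2" script answer
  script="$(mktemp)" || return 1
  # -f makes curl fail on HTTP errors instead of saving an error page.
  curl -fsSL "$url" -o "$script" || { rm -f "$script"; return 1; }
  # Unquoted on purpose so a PAGER such as "less -R" still works.
  ${PAGER:-less} "$script"
  read -r -p "Run this installer? [y/N] " answer
  if [ "$answer" = "y" ]; then
    bash "$script" -t "$token"
  else
    echo "Skipped; nothing was installed." >&2
  fi
  rm -f "$script"
}

# Usage:
# review_then_install https://mttrly.com/install.sh YOUR_TOKEN
```

Running the saved file means the code you read is the code that executes, which a second `curl | bash` would not guarantee.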

STEP 03: Work with approvals

Use the dashboard, Telegram, or MCP tools to investigate, approve risky changes, and keep an audit trail.

Your server, your control

✓ Approve where you work

Dashboard approvals use passkeys for biometric confirmation. Telegram keeps approvals available on the go. MCP and messenger approvals work too, each under a separate trust model.

✓ Not raw SSH

Command execution exists as a scoped MCP action with approval and audit. The normal path is diagnostics, playbooks, and server tools, not a free terminal for AI.

✓ BYOK — your AI, your cost

Bring your own OpenAI/Anthropic key. No markup, transparent costs. Or use our AI infrastructure ($39/mo includes AI costs).

✓ No open ports needed

The agent connects outbound only, so your firewall stays closed and no inbound port is exposed.

MCP Integration

Also works from your IDE.

Connect mttrly to Claude Code, Cursor, or OpenAI Codex via the Model Context Protocol. Check alerts, run diagnostics, review evidence, and request approved actions without leaving your editor.

See all 40 tools

Claude Code:
claude mcp add mttrly --transport http https://api.mttrly.com/mcp

Cursor:
{
  "mcpServers": {
    "mttrly": { "url": "https://api.mttrly.com/mcp" }
  }
}

OpenAI Codex:
[mcp_servers.mttrly]
url = "https://api.mttrly.com/mcp"

Running in production

204,594 automated commands executed

48.9% incidents auto-resolved

37,276 monitoring health checks

15.6% requiring human approval

Production metrics from internal infrastructure, March 2026.

Fresh From The Blog

Real incidents, fixes, and recovery notes from production.

Short, practical write-ups from the exact kind of server drama people search for when something breaks at the worst possible moment.

Browse all posts

vibe-coding · ai-agents · devops

I kept telling my AI to stop using SSH. Here's what it found instead.

Claude Code had full SSH access to my server. Every time it used it, I made it switch to the monitoring bot. The difference in what it saw wasn't what I expected.

April 3, 2026 · 5 min read

monitoring · on-call · devops

Alert fatigue almost made me turn off my own monitoring.

My monitoring sent an alert. Healthcheck said all good. Services said all running. Someone was lying — and it took me an hour to find out who.

March 31, 2026 · 5 min read

vibe-coding · devops · ai-agents

I put an AI agent on my server. It quietly deleted my own feature.

I wanted autonomous server management. What I got was a lesson in why AI agents need a confirmation step before touching production.

March 31, 2026 · 5 min read

Frequently Asked Questions

Stop being afraid of production.

Start with what the internet can see. Connect the agent when you are ready for inside-server evidence and approval-gated fixes.

Start Watchdog free

Outside check needs no signup • Watchdog is free • AI features from $39/mo
