Stop Over-Engineering: A 100-line bash script that saved my servers



This content originally appeared on DEV Community and was authored by Sandro 🦖☄

We’ve all been there. Your website goes down at 3 AM. MySQL crashed. NGINX stopped responding. And you’re scrambling to SSH into the server while your phone buzzes with angry customer emails.

Then someone suggests: “You should use Prometheus + Grafana + Alertmanager + PagerDuty!”

Sure. Or… hear me out… you could just use a 100-line bash script that checks your sites every minute and restarts services automatically when they fail.

The Problem with Enterprise Monitoring

Don’t get me wrong – tools like Datadog, New Relic, and Prometheus are amazing. But they’re also:

  • 🎯 Overkill for small projects
  • 💰 Expensive for startups
  • 🧩 Complex to set up and maintain
  • 🐌 Slow to deploy (days/weeks of configuration)
  • 📚 Steep to learn (new query languages, new dashboards)

Meanwhile, your website is still down.

Enter: The 100-Line Solution

What if monitoring could be this simple?

# 1. Add your websites
echo "https://example.com" >> sites.txt

# 2. Install
sudo ./install.sh

# 3. Done. Seriously.

That’s it. Every minute, your server now:

  1. ✅ Checks if your websites respond
  2. 🔍 Detects if services are overwhelmed (not just down!)
  3. 🔧 Automatically restarts MySQL, NGINX, or Apache
  4. 📝 Logs only failures (no disk space waste)
  5. 🔄 Tracks failure counts to avoid false positives
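
As a sketch, that per-minute pass over sites.txt could look like this (the function names and exact curl flags here are my assumptions, not necessarily the script's actual internals):

```shell
# Minimal sketch of the per-minute check pass (names are illustrative).
SITES_FILE="${SITES_FILE:-sites.txt}"
TIMEOUT="${TIMEOUT:-10}"

# Print the HTTP status code for a URL; curl prints "000" on connect failure.
check_site() {
    curl -s -o /dev/null --max-time "$TIMEOUT" -w '%{http_code}' "$1"
}

check_all_sites() {
    local url code
    while read -r url; do
        [[ -z "$url" || "$url" == \#* ]] && continue  # skip blanks/comments
        code=$(check_site "$url")
        if [[ "$code" != "200" ]]; then
            echo "FAILURE: $url - HTTP $code"
        fi
    done < "$SITES_FILE"
}

# cron runs this once a minute; one pass checks every site
if [[ -f "$SITES_FILE" ]]; then check_all_sites; fi
```

A real version would likely accept any 2xx/3xx status rather than only 200; this sketch keeps the comparison simple.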

How It Works (The Smart Part)

Most monitoring tools just check if a service is “running.” That’s not enough.

Here’s what makes this script intelligent:

1. Load-Based Detection

# Don't just check if MySQL is running...
# Check if it's actually RESPONSIVE
check_mysql_health() {
    # Try to ping MySQL (quietly, with a hard timeout)
    if timeout 3 mysqladmin ping >/dev/null 2>&1; then
        # It's alive! But is it overwhelmed?
        current_connections=$(mysqladmin status | grep -oP 'Threads: \K\d+')

        if [[ "$current_connections" -gt 150 ]]; then
            # Too many connections - restart before it crashes
            return 1
        fi
        return 0  # Alive and responsive
    fi
    return 1  # Ping failed - MySQL is down or hung
}

Your site can be down even when services show as “running” – when they’re overloaded with traffic or locked up processing queries.

2. Advanced Health Checks

# NGINX example: Test config + connectivity + load
check_nginx_health() {
    # 1. Validate config before trying to use it
    nginx -t 2>/dev/null || return 1

    # 2. Can it accept connections?
    timeout 2 bash -c "echo > /dev/tcp/localhost/80" || return 1

    # 3. Is it drowning in connections? (needs NGINX's stub_status page enabled)
    active_conn=$(curl -s http://localhost/nginx_status | grep -oP 'Active connections: \K\d+')
    [[ -n "$active_conn" && "$active_conn" -gt 1000 ]] && return 1

    return 0  # All good!
}

3. Smart Recovery Logic

# Only restart after 3 consecutive failures (avoid false positives)
if [[ "$current_failures" -ge 3 ]]; then
    # Restart services in order: Database first, then web server
    for service in "${SERVICES[@]}"; do
        systemctl restart "$service"
    done
fi
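
Since cron starts a fresh process every minute, the failure counter has to survive between runs. One way to do that is a small state file per site (the state directory and file naming here are my assumptions, not the script's actual layout):

```shell
# Sketch: persist per-site failure counts across cron runs.
# STATE_DIR and the flat-file format are illustrative assumptions.
STATE_DIR="${STATE_DIR:-/tmp/site-monitor-state}"
mkdir -p "$STATE_DIR"

# Map a URL to a filesystem-safe state file name.
state_file() {
    echo "$STATE_DIR/$(echo "$1" | tr -c 'a-zA-Z0-9' '_').count"
}

get_failures() {
    local f; f=$(state_file "$1")
    [[ -f "$f" ]] && cat "$f" || echo 0
}

record_failure() {
    local f; f=$(state_file "$1")
    echo $(( $(get_failures "$1") + 1 )) > "$f"
}

reset_failures() {
    rm -f "$(state_file "$1")"
}
```

The threshold check then reads `if (( $(get_failures "$url") >= 3 ))` before triggering recovery, and a successful check calls `reset_failures` so the count only grows on consecutive failures.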

Real-World Example

Let’s say your e-commerce site suddenly gets featured on Reddit (congrats! 🎉). Traffic spikes 10x:

Traditional Monitoring:

  • 📊 Dashboards show high CPU/memory
  • 🚨 Alerts fire
  • 👨‍💻 You get paged
  • ⏰ You wake up, investigate, manually restart services
  • 💸 Lost sales during downtime

This Script:

  • 🔍 Detects MySQL has 200 active connections (threshold: 150)
  • 🤖 Automatically restarts MySQL in 3 seconds
  • 📝 Logs: "MySQL OVERLOADED (200 connections) - restarted"
  • 😴 You stay asleep
  • 💰 Sales continue

Installation (Seriously, It’s This Easy)

# 1. Clone the repo
git clone https://github.com/sgumz/site-monitor.git
cd site-monitor

# 2. Add your websites
cat > sites.txt << EOF
https://example.com
https://api.example.com
https://www.example.com
EOF

# 3. Optional: Customize thresholds
vim config.conf  # Adjust MySQL/NGINX/Apache thresholds

# 4. Install (creates cron job, sets up logging)
sudo ./install.sh

# 5. Watch it work
sudo tail -f /var/log/site-monitor/monitor.log

Output:

[2025-10-20 14:23:45] FAILURE: https://example.com - HTTP 000 (1/3 failures)
[2025-10-20 14:24:45] FAILURE: https://example.com - HTTP 000 (2/3 failures)
[2025-10-20 14:25:45] FAILURE: https://example.com - HTTP 000 (3/3 failures)
[2025-10-20 14:25:46] RECOVERY: Starting recovery for https://example.com
[2025-10-20 14:25:47] RECOVERY: MySQL OVERLOADED (187 connections) - restarted
[2025-10-20 14:25:49] RECOVERY: NGINX responsive - no action needed
[2025-10-20 14:25:50] RECOVERY: Recovery completed
[2025-10-20 14:26:45] SUCCESS: https://example.com back online (HTTP 200)
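
A log helper along these lines would produce that format (the helper name and log path are assumptions on my part, matching the output shown above):

```shell
# Sketch: timestamped logger matching the output format above.
LOG_FILE="${LOG_FILE:-/var/log/site-monitor/monitor.log}"
LOG_SUCCESS="${LOG_SUCCESS:-false}"

log() {
    local level="$1"; shift
    # Skip SUCCESS entries unless explicitly enabled (saves disk space).
    [[ "$level" == "SUCCESS" && "$LOG_SUCCESS" != "true" ]] && return 0
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $level: $*" >> "$LOG_FILE"
}
```

Called as `log FAILURE "$url - HTTP $code ($fails/3 failures)"`, it appends one line per event and stays silent on success by default.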

Configuration Options

Everything is configurable in config.conf:

# HTTP Settings
TIMEOUT=10                    # Request timeout
FAILURE_THRESHOLD=3           # Failures before recovery

# Services to manage (in order)
SERVICES=("mysql" "nginx")    # Or: ("mysql" "apache2")

# Load Thresholds
MYSQL_MAX_CONNECTIONS=150     # Restart if connections exceed this
NGINX_MAX_CONNECTIONS=1000    # Restart if connections exceed this
APACHE_MAX_WORKERS=150        # Restart if busy workers exceed this

# Logging
LOG_SUCCESS=false             # Only log failures (save disk space)
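
One simple way to combine built-in defaults with config.conf overrides is to set the defaults first, then source the file on top of them (a sketch; the loader name and CONFIG_FILE variable are my assumptions):

```shell
# Sketch: defaults first, then let config.conf override them.
load_config() {
    TIMEOUT=10
    FAILURE_THRESHOLD=3
    SERVICES=("mysql" "nginx")
    MYSQL_MAX_CONNECTIONS=150
    NGINX_MAX_CONNECTIONS=1000
    LOG_SUCCESS=false

    local conf="${CONFIG_FILE:-config.conf}"
    # Anything assigned in the file simply replaces the default above.
    if [[ -f "$conf" ]]; then
        source "$conf"
    fi
    return 0
}
load_config
```

Because config.conf is plain bash assignments, sourcing it needs no parser at all; a missing file just means "run with defaults".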

When to Use This vs. Enterprise Tools

Use This Simple Script When:

  • 🎯 You have < 50 websites to monitor
  • 💰 You’re on a budget (it’s free!)
  • ⚡ You need it deployed TODAY
  • 🔧 You manage your own Ubuntu servers
  • 🎓 You want to understand what’s happening (no black box)

Use Enterprise Tools When:

  • 📊 You need fancy dashboards and metrics
  • 🌍 You have distributed microservices
  • 👥 You have a dedicated DevOps team
  • 💼 You need compliance/audit trails
  • 🔗 You need integration with 50+ other tools

Performance & Resource Usage

This script is incredibly lightweight:

  • CPU: Near zero (runs for ~1 second per minute)
  • Memory: ~5MB
  • Disk: <1MB logs per month (with default settings)
  • Network: One HTTP GET per site per minute

Compare that to running Prometheus + Grafana (hundreds of MB of RAM).

Production-Ready Features

Don’t let the simplicity fool you – this runs in production:

✅ State Tracking: Counts consecutive failures per site
✅ Log Rotation: Yearly rotation via logrotate
✅ Error Handling: Graceful failures, timeout protection
✅ No Dependencies: Just bash + curl + systemctl (already on Ubuntu)
✅ Tested: Works on Ubuntu 22.04 LTS
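
The yearly rotation could be a standard logrotate rule such as this one (the exact directives install.sh writes are an assumption; these are stock logrotate options):

```
/var/log/site-monitor/monitor.log {
    yearly
    rotate 3
    compress
    missingok
    notifempty
}
```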

Advanced Use Cases

Multi-Server Deployment

Deploy to multiple servers with different site lists:

# Server 1: Monitor frontend sites
echo "https://app.example.com" > sites.txt

# Server 2: Monitor API endpoints
echo "https://api.example.com" > sites.txt

# Server 3: Monitor admin tools
echo "https://admin.example.com" > sites.txt

Custom Services

Not just MySQL/NGINX! Add any systemd service:

# Add Redis, PHP-FPM, whatever you need
SERVICES=("mysql" "nginx" "redis-server" "php8.1-fpm")

Integration with Existing Tools

Still want Slack notifications? Just add a webhook:

# In monitor.sh, add inside the recovery logic:
curl -X POST "YOUR_SLACK_WEBHOOK" \
  -d "{\"text\":\"🚨 $url is down! Auto-recovering...\"}"
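
To keep that optional, it could be wrapped in a small helper that only fires when a webhook is configured (the `SLACK_WEBHOOK` variable and `notify` name are illustrative, not part of the script):

```shell
# Sketch: optional Slack notification -- a no-op unless SLACK_WEBHOOK is set.
notify() {
    [[ -z "${SLACK_WEBHOOK:-}" ]] && return 0
    curl -s -X POST "$SLACK_WEBHOOK" \
        -H 'Content-Type: application/json' \
        -d "{\"text\":\"$1\"}" > /dev/null
}
```

Then recovery code can call `notify "🚨 $url is down! Auto-recovering..."` unconditionally, and installs without a webhook lose nothing.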

The Philosophy: Simple > Complex

This project follows the Unix philosophy:

  • Do one thing well
  • Use plain text for data
  • Build small, composable tools

Your monitoring doesn’t need to be fancy. It needs to:

  1. Detect failures ✅
  2. Fix them automatically ✅
  3. Tell you what happened ✅

Mission accomplished in 100 lines of bash.

Try It Yourself

The code is open source (MIT License):

🔗 GitHub: https://github.com/sgumz/site-monitor

Installation takes 2 minutes. Give it a try!

Closing Thoughts

Sometimes the best solution isn’t the one with the most features – it’s the one that solves your problem today without creating new ones.

Could this bash script replace Datadog for a Fortune 500 company? No.

Could it save your small SaaS business from 3 AM wake-up calls? Absolutely.

What’s your take? Do you prefer simple scripts or enterprise monitoring? Any horror stories about over-engineered solutions? Drop a comment below! 👇

