This content originally appeared on DEV Community and was authored by Sandro
We’ve all been there. Your website goes down at 3 AM. MySQL crashed. NGINX stopped responding. And you’re scrambling to SSH into the server while your phone buzzes with angry customer emails.
Then someone suggests: “You should use Prometheus + Grafana + Alertmanager + PagerDuty!”
Sure. Or… hear me out… you could just use a 100-line bash script that checks your sites every minute and restarts services automatically when they fail.
The Problem with Enterprise Monitoring
Don’t get me wrong – tools like Datadog, New Relic, and Prometheus are amazing. But they’re also:
- Overkill for small projects
- Expensive for startups
- Complex to set up and maintain
- Slow to deploy (days or weeks of configuration)
- Dependent on learning new query languages and dashboards
Meanwhile, your website is still down.
Enter: The 100-Line Solution
What if monitoring could be this simple?
# 1. Add your websites
echo "https://example.com" >> sites.txt
# 2. Install
sudo ./install.sh
# 3. Done. Seriously.
That’s it. Every minute, your server now:
- Checks if your websites respond
- Detects if services are overwhelmed (not just down!)
- Automatically restarts MySQL, NGINX, or Apache
- Logs only failures (no disk space waste)
- Tracks failure counts to avoid false positives
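The core loop is deliberately boring. Here's a minimal sketch of what the per-minute check can look like (illustrative only, not the actual monitor.sh; the log format mirrors the output shown further down):
#!/usr/bin/env bash
# Sketch of the per-minute check loop - the real script adds failure counting and recovery
TIMEOUT=10
while read -r url; do
    [[ -z "$url" ]] && continue                                # skip blank lines
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$TIMEOUT" "$url")
    if [[ "$code" != "200" ]]; then
        echo "[$(date '+%F %T')] FAILURE: $url - HTTP $code"   # failures only by default
    fi
done < sites.txt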
How It Works (The Smart Part)
Most monitoring tools just check if a service is “running.” That’s not enough.
Here’s what makes this script intelligent:
1. Load-Based Detection
# Don't just check if MySQL is running...
# Check if it's actually RESPONSIVE
check_mysql_health() {
    # Try to ping MySQL first - is it answering at all?
    timeout 3 mysqladmin ping >/dev/null 2>&1 || return 1
    # It's alive! But is it overwhelmed?
    current_connections=$(mysqladmin status | grep -oP 'Threads: \K\d+')
    if [[ "${current_connections:-0}" -gt 150 ]]; then
        # Too many connections - restart before it crashes
        return 1
    fi
    return 0  # Responsive and under the connection threshold
}
Your site can be down even when services show as “running” – when they’re overloaded with traffic or locked up processing queries.
2. Advanced Health Checks
# NGINX example: Test config + connectivity + load
check_nginx_health() {
    # 1. Validate config before trying to use it
    nginx -t 2>/dev/null || return 1
    # 2. Can it accept connections?
    timeout 2 bash -c "echo > /dev/tcp/localhost/80" || return 1
    # 3. Is it drowning in connections? (needs the stub_status endpoint at /nginx_status)
    active_conn=$(curl -s http://localhost/nginx_status | grep -oP 'Active connections: \K\d+')
    [[ "${active_conn:-0}" -gt 1000 ]] && return 1
    return 0  # All good!
}
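Apache gets the same treatment. Here's a sketch of an equivalent check, assuming mod_status is enabled and reachable at /server-status (that endpoint is not on by default):
# Apache example: Test config + connectivity + load
check_apache_health() {
    # 1. Validate config before trying to use it
    apachectl configtest 2>/dev/null || return 1
    # 2. Can it accept connections?
    timeout 2 bash -c "echo > /dev/tcp/localhost/80" || return 1
    # 3. Is it running out of workers? (needs mod_status at /server-status)
    busy_workers=$(curl -s "http://localhost/server-status?auto" | grep -oP 'BusyWorkers: \K\d+')
    [[ "${busy_workers:-0}" -gt 150 ]] && return 1
    return 0  # All good!
}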
3. Smart Recovery Logic
# Only restart after 3 consecutive failures (avoid false positives)
if [[ "$current_failures" -ge 3 ]]; then
# Restart services in order: Database first, then web server
for service in "${SERVICES[@]}"; do
systemctl restart "$service"
done
fi
Real-World Example
Let’s say your e-commerce site suddenly gets featured on Reddit (congrats!). Traffic spikes 10x:
Traditional Monitoring:
- Dashboards show high CPU/memory
- Alerts fire
- You get paged
- You wake up, investigate, manually restart services
- Lost sales during downtime
This Script:
- Detects MySQL has 200 active connections (threshold: 150)
- Automatically restarts MySQL in 3 seconds
- Logs: "MySQL OVERLOADED (200 connections) - restarted"
- You stay asleep
- Sales continue
Installation (Seriously, It’s This Easy)
# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/site-monitor.git
cd site-monitor
# 2. Add your websites
cat > sites.txt << EOF
https://example.com
https://api.example.com
https://www.example.com
EOF
# 3. Optional: Customize thresholds
vim config.conf # Adjust MySQL/NGINX/Apache thresholds
# 4. Install (creates cron job, sets up logging)
sudo ./install.sh
# 5. Watch it work
sudo tail -f /var/log/site-monitor/monitor.log
Output:
[2025-10-20 14:23:45] FAILURE: https://example.com - HTTP 000 (1/3 failures)
[2025-10-20 14:24:45] FAILURE: https://example.com - HTTP 000 (2/3 failures)
[2025-10-20 14:25:45] FAILURE: https://example.com - HTTP 000 (3/3 failures)
[2025-10-20 14:25:46] RECOVERY: Starting recovery for https://example.com
[2025-10-20 14:25:47] RECOVERY: MySQL OVERLOADED (187 connections) - restarted
[2025-10-20 14:25:49] RECOVERY: NGINX responsive - no action needed
[2025-10-20 14:25:50] RECOVERY: Recovery completed
[2025-10-20 14:26:45] SUCCESS: https://example.com back online (HTTP 200)
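There's no daemon to keep alive, by the way. All install.sh really has to do is create the log directory and drop a cron entry; roughly something like this (paths are illustrative):
# Run the monitor every minute as root (restarts via systemctl need privileges)
mkdir -p /var/log/site-monitor
echo "* * * * * root /opt/site-monitor/monitor.sh" > /etc/cron.d/site-monitor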
Configuration Options
Everything is configurable in config.conf:
# HTTP Settings
TIMEOUT=10 # Request timeout in seconds
FAILURE_THRESHOLD=3 # Failures before recovery
# Services to manage (in order)
SERVICES=("mysql" "nginx") # Or: ("mysql" "apache2")
# Load Thresholds
MYSQL_MAX_CONNECTIONS=150 # Restart if connections exceed this
NGINX_MAX_CONNECTIONS=1000 # Restart if connections exceed this
APACHE_MAX_WORKERS=150 # Restart if busy workers exceed this
# Logging
LOG_SUCCESS=false # Only log failures (save disk space)
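Because config.conf is plain shell syntax, the main script doesn't need a parser. A sketch of how it might be loaded, with defaults as a safety net:
# Load settings; fall back to defaults if the file is missing
CONFIG_FILE="$(dirname "$0")/config.conf"
[[ -f "$CONFIG_FILE" ]] && source "$CONFIG_FILE"
: "${TIMEOUT:=10}"
: "${FAILURE_THRESHOLD:=3}"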
When to Use This vs. Enterprise Tools
Use This Simple Script When:
- You have < 50 websites to monitor
- You’re on a budget (it’s free!)
- You need it deployed TODAY
- You manage your own Ubuntu servers
- You want to understand what’s happening (no black box)
Use Enterprise Tools When:
- You need fancy dashboards and metrics
- You have distributed microservices
- You have a dedicated DevOps team
- You need compliance/audit trails
- You need integration with 50+ other tools
Performance & Resource Usage
This script is incredibly lightweight:
- CPU: Near zero (runs for ~1 second per minute)
- Memory: ~5MB
- Disk: <1MB logs per month (with default settings)
- Network: One HTTP GET per site per minute
Compare that to running Prometheus + Grafana (hundreds of MB of RAM).
Production-Ready Features
Don’t let the simplicity fool you – this runs in production:
- State Tracking: Counts consecutive failures per site (see the sketch after this list)
- Log Rotation: Yearly rotation via logrotate
- Error Handling: Graceful failures, timeout protection
- No Dependencies: Just bash + curl + systemctl (already on Ubuntu)
- Tested: Works on Ubuntu 22.04 LTS
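The state tracking needs nothing fancier than one counter file per site. A minimal sketch of that idea (the state directory is an assumption, not the script's actual path):
STATE_DIR="/var/lib/site-monitor"   # illustrative location
mkdir -p "$STATE_DIR"

record_failure() {
    local url="$1"
    local state_file="$STATE_DIR/$(echo -n "$url" | md5sum | cut -d' ' -f1)"
    local count=$(( $(cat "$state_file" 2>/dev/null || echo 0) + 1 ))
    echo "$count" > "$state_file"
    echo "$count"                   # caller compares this against FAILURE_THRESHOLD
}

record_success() {
    local url="$1"
    rm -f "$STATE_DIR/$(echo -n "$url" | md5sum | cut -d' ' -f1)"   # reset on recovery
}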
Advanced Use Cases
Multi-Server Deployment
Deploy to multiple servers with different site lists:
# Server 1: Monitor frontend sites
echo "https://app.example.com" > sites.txt
# Server 2: Monitor API endpoints
echo "https://api.example.com" > sites.txt
# Server 3: Monitor admin tools
echo "https://admin.example.com" > sites.txt
Custom Services
Not just MySQL/NGINX! Add any systemd service:
# Add Redis, PHP-FPM, whatever you need
SERVICES=("mysql" "nginx" "redis-server" "php8.1-fpm")
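Only MySQL, NGINX, and Apache get dedicated load checks; for anything else a plain systemd liveness check is a sensible fallback. A sketch (not the script's actual behaviour):
check_generic_service() {
    local service="$1"
    if ! systemctl is-active --quiet "$service"; then
        echo "[$(date '+%F %T')] RECOVERY: $service not active - restarting"
        systemctl restart "$service"
    fi
}

check_generic_service "redis-server"
check_generic_service "php8.1-fpm"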
Integration with Existing Tools
Still want Slack notifications? Just add a webhook:
# In monitor.sh, add after line 320:
curl -X POST "YOUR_SLACK_WEBHOOK" \
    -H 'Content-type: application/json' \
    -d "{\"text\":\"🚨 $url is down! Auto-recovering...\"}"
The Philosophy: Simple > Complex
This project follows the Unix philosophy:
- Do one thing well
- Use plain text for data
- Build small, composable tools
Your monitoring doesn’t need to be fancy. It needs to:
- Detect failures
- Fix them automatically
- Tell you what happened
Mission accomplished in 100 lines of bash.
Try It Yourself
The code is open source (MIT License):
GitHub: https://github.com/sgumz/site-monitor
Installation takes 2 minutes. Give it a try!
Closing Thoughts
Sometimes the best solution isn’t the one with the most features – it’s the one that solves your problem today without creating new ones.
Could this bash script replace Datadog for a Fortune 500 company? No.
Could it save your small SaaS business from 3 AM wake-up calls? Absolutely.
What’s your take? Do you prefer simple scripts or enterprise monitoring? Any horror stories about over-engineered solutions? Drop a comment below!