Monitoring EC2 Disk Space with a Simple Bash Script and Slack Alerts

Introduction

Monitoring the health and resources of EC2 instances is essential. One common challenge for system administrators is making sure disk space doesn’t run out, which can lead to performance degradation or service outages. AWS provides robust monitoring tools such as CloudWatch, but they add cost and complexity that smaller or simpler setups may not need.

In this article, we’ll explore a simple and cost-effective solution: a Bash script that runs as a cron job on your EC2 instances. The script checks disk usage on the instance and, if usage exceeds a defined threshold, sends an alert to a Slack channel. This method requires minimal setup and no external monitoring tools, and it keeps costs low.

Why Do You Need Disk Monitoring?

Disk space issues are often overlooked until they lead to problems such as:

Application Failures: Applications or services may fail when there’s insufficient disk space for logging, writing data, or temporary files.
Performance Degradation: Low disk space can cause system slowness, especially for services that rely heavily on read/write operations.
Downtime: Critical systems may crash if essential processes can’t operate due to full disks.

Monitoring Disk Space Proactively

Monitoring disk usage helps you avoid such issues by:

Receiving early warnings when disk usage crosses a certain threshold.
Taking preventative actions (like extending the disk size or cleaning up old data) before it becomes critical.

AWS CloudWatch can track disk space, but only as a custom metric: you have to install and configure the CloudWatch agent on each instance, and custom metrics are billed. A Bash script with a Slack webhook offers a lightweight, low-cost alternative.

The Bash Script Solution

Here is the solution using a Bash script and a Slack webhook. The script checks the disk usage of the root volume (/) and sends an alert to a Slack channel when usage reaches or exceeds a defined threshold. The alert includes the instance’s hostname, private IP, and public IP, read from the EC2 instance metadata service, so you can quickly identify the affected instance.

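#!/bin/bash

THRESHOLD=85  # Alert when root volume usage reaches this percentage
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/your/webhook/url"
IMDS="http://169.254.169.254/latest"

# Get an IMDSv2 session token; this works whether or not IMDSv1 is still enabled
token=$(curl -s -X PUT "$IMDS/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")

# Read one instance metadata field; -f makes curl print nothing on HTTP errors
get_metadata() {
  curl -sf -H "X-aws-ec2-metadata-token: $token" "$IMDS/meta-data/$1"
}

# Get EC2 metadata for hostname, private IP, and public IP
hostname=$(get_metadata local-hostname)
private_ip=$(get_metadata local-ipv4)
public_ip=$(get_metadata public-ipv4)
public_ip=${public_ip:-none}  # Instances without a public IP return an HTTP error here

# Usage of the root filesystem as a bare number; -P keeps each entry on one line
disk_usage=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

if [ "$disk_usage" -ge "$THRESHOLD" ]; then
  # Slack needs valid JSON: double-quoted keys and strings, \n for line breaks
  message="Disk usage on EC2 instance '${hostname}' is at ${disk_usage}%.\nPrivate IP: ${private_ip}\nPublic IP: ${public_ip}\nPlease extend disk space!"
  payload="{\"text\": \"${message}\"}"
  curl -s -X POST -H 'Content-type: application/json' --data "$payload" "$SLACK_WEBHOOK_URL"
fi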
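The script builds its JSON payload by pasting values into a string. That works for hostnames and IP addresses, but it breaks as soon as a message contains a double quote, a backslash, or a raw line break. Below is a minimal sketch of a safer payload builder; `json_escape` and `slack_payload` are hypothetical helper names, not part of the original script, and the `{"text": ...}` body is the simple format Slack incoming webhooks accept.

```shell
#!/bin/bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Hypothetical helper: build the {"text": ...} body for a Slack incoming webhook.
slack_payload() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}
```

To use it, assemble the message with real line breaks (for example `$'\n'`) and set `payload=$(slack_payload "$message")`. If `jq` is installed, `jq -n --arg text "$message" '{text: $text}'` does the same job and escapes every control character.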

Slack Webhook Setup

In your Slack workspace, create an app on the Slack API site, enable Incoming Webhooks for it, and add a webhook for the desired channel.
Set SLACK_WEBHOOK_URL in the script to the generated webhook URL. Treat the URL as a secret: anyone who has it can post to the channel.
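The script is meant to run from cron. Here is a minimal sketch of scheduling it, assuming the script is saved at the hypothetical path /usr/local/bin/check_disk.sh and should run every 15 minutes. The hypothetical `add_cron_entry` helper appends the job only if it is not already present, so re-running your setup does not create duplicate jobs (and duplicate alerts).

```shell
#!/bin/bash
# Hypothetical path and schedule: every 15 minutes, output discarded.
CRON_ENTRY='*/15 * * * * /usr/local/bin/check_disk.sh >/dev/null 2>&1'

# Read a crontab on stdin and print it back, appending ENTRY ($1)
# only when an identical line is not already present.
add_cron_entry() {
  local entry="$1" existing
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -x: whole-line match, -F: treat the entry as a fixed string, not a regex
  if ! grep -qxF -- "$entry" <<<"$existing"; then
    printf '%s\n' "$entry"
  fi
}
```

After making the script executable (`chmod +x /usr/local/bin/check_disk.sh`), install the job with `crontab -l 2>/dev/null | add_cron_entry "$CRON_ENTRY" | crontab -`. When the user has no crontab yet, `crontab -l` prints nothing on stdout, which the helper treats as an empty file.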

Pros and Cons of This Approach

Pros:

Cost-Effective:
- There is no need for additional AWS services like CloudWatch, which can incur costs for custom metrics. The only cost involved is the EC2 instance itself and the minimal bandwidth for sending Slack notifications.
Simple and Lightweight:
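- The solution uses standard tools (Bash, curl, df, awk, cron) available on most Linux distributions, so there’s no need to install a monitoring agent. One exception: Amazon Linux 2023 does not install cron by default, so you need to install and enable the cronie package there.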
Immediate Notifications:
- Alerts arrive in Slack as soon as a scheduled check detects the problem, so your team doesn’t need to log in to AWS or set up more complex monitoring dashboards to see them.
Customizable:
- You can easily customize the script to check different thresholds, monitor additional directories, or modify the frequency of checks.
Low Overhead:
- Running a lightweight cron job has minimal impact on the system’s performance. This is especially useful for smaller EC2 instances.

Cons:

No Centralized Monitoring:
- Since the script runs on individual EC2 instances, there’s no centralized dashboard for viewing disk usage across all instances. You are notified only once an instance crosses the threshold, with no history showing how quickly usage was growing.
Maintenance:
- If you have many EC2 instances, you’ll need to deploy and maintain this script across all of them. In contrast, centralized solutions like AWS CloudWatch provide an aggregated view with minimal per-instance configuration.
Limited to Disk Monitoring:
- This script only monitors disk usage, whereas services like CloudWatch can monitor multiple aspects of EC2 health, such as CPU, memory, and network traffic.
Requires Slack Integration:
- If your organization doesn’t use Slack, you’ll need to integrate the script with another messaging platform, or use email for notifications.
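As the Customizable point above notes, the check can cover more than the root volume. One possible sketch, assuming GNU coreutils `df` (for its `-x` filesystem-type filter): a hypothetical `over_threshold` function reads POSIX-format `df -P` output and prints every mount point at or above the threshold. It splits fields on whitespace, so mount points containing spaces would be reported incorrectly.

```shell
#!/bin/bash
# Hypothetical helper: read `df -P` output on stdin and print "<mount> <usage>%"
# for every filesystem whose usage is at or above the threshold given as $1.
# Fields are whitespace-separated, so mount points containing spaces are not supported.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)          # "90%" -> "90"
    if (use + 0 >= limit + 0)   # force a numeric comparison
      print $6, use "%"
  }'
}
```

For example, `alerts=$(df -P -x tmpfs -x devtmpfs | over_threshold "$THRESHOLD")` collects every full filesystem except in-memory ones, and the script can send a single Slack message whenever `alerts` is non-empty.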

Conclusion

This Bash script offers a simple, low-cost solution to monitor EC2 disk space and send alerts to Slack when disk usage exceeds a critical threshold. It’s an excellent fit for organizations or projects that don’t need the complexity and costs associated with AWS CloudWatch or other comprehensive monitoring tools.

However, it’s important to weigh the simplicity and cost savings against the limitations of this approach, especially when it comes to centralized management and broader system health monitoring. For small to mid-size deployments or cost-conscious environments, this can be a highly effective method to ensure your EC2 instances are not running out of disk space.

Would you like to give this solution a try or explore more comprehensive alternatives? Let me know in the comments!

