Cutting AWS Expenses: Streamlining EBS Snapshot Cleanup with the Power of AWS Lambda!

In this post, I will walk through a cost optimization project that uses AWS Lambda with the AWS SDK for Python (boto3) to automate the cleanup of stale EBS snapshots. This lets us cut storage costs while retaining only the snapshots we actually need.

Problem Statement:
🔹Accumulating EBS snapshots over time can lead to increased storage costs and unnecessary resource utilization.

Solution Overview:
🔹An AWS Lambda function that automates the cleanup of stale EBS snapshots for us in a single click.

Workflow:
🔹 Created a Lambda function that identifies EBS snapshots that are no longer associated with any active EC2 instance and deletes them to save on storage costs.
🔹 The Lambda function fetches all EBS snapshots owned by the same account ('self') and also retrieves a list of active EC2 instances (running and stopped).
🔹 For each snapshot, it checks whether the associated volume (if it exists) is attached to any active instance. If it is not, the snapshot is stale and the function deletes it, which saves on storage costs.
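The decision at the heart of this workflow can be sketched as a small pure function. This is only an illustration, not the script used later in the post: the function name `find_stale_snapshots` and the sample IDs are made up, and the input dicts are assumed to have the same shape as the entries boto3 returns from `describe_snapshots` and `describe_volumes`.

```python
def find_stale_snapshots(snapshots, volumes, active_instance_ids):
    """Return the IDs of snapshots whose source volume is not in use.

    A volume counts as "in use" when at least one of its attachments
    points at an instance in `active_instance_ids`.
    """
    in_use_volume_ids = {
        vol["VolumeId"]
        for vol in volumes
        if any(
            att.get("InstanceId") in active_instance_ids
            for att in vol.get("Attachments", [])
        )
    }
    # A snapshot is stale when its volume is deleted or unattached.
    return [
        snap["SnapshotId"]
        for snap in snapshots
        if snap.get("VolumeId") not in in_use_volume_ids
    ]


# Hypothetical sample data, shaped like boto3 response entries.
snapshots = [
    {"SnapshotId": "snap-1", "VolumeId": "vol-deleted"},
    {"SnapshotId": "snap-2", "VolumeId": "vol-active"},
]
volumes = [
    {"VolumeId": "vol-active", "Attachments": [{"InstanceId": "i-123"}]},
]

print(find_stale_snapshots(snapshots, volumes, {"i-123"}))  # prints ['snap-1']
```

Keeping the decision separate from the AWS calls makes it easy to try out on sample data before anything is actually deleted.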

Let's dive deep into this simple project.

We want to list the instances, list the volumes, and delete the snapshots that are not associated with any volume. This situation is easy to end up in: when you have hundreds of snapshots, you may delete an instance and forget to delete its snapshots. The volume gets deleted automatically along with the instance, but the snapshots stay behind.

In such cases, we will use a Lambda function to do this cleanup for us.

I'll keep the instance as it is. First we will write the Lambda function, and then I'll show that it does not delete a snapshot that belongs to a volume attached to an EC2 instance.

I have an EC2 instance, CloudWatch-Demo, with its default volume attached.

I will also work with two snapshots: one associated with the current EC2 instance, and one that is stale, i.e. it has been lying around for a few days.

So, snap1 is stale as it is not associated with any volume.

Also, let's create a snapshot that is associated with the active instance and volume with the following details.

Using my active volume, I have created a snapshot named snap2.

Now, let's create an AWS Lambda function.

Click on Create a function.

Description:

Author from scratch

Function name: cost-optimization-ebs-snapshot

Runtime: Python 3.11 (or whichever version is available)

Click on Create function.

The function cost-optimization-ebs-snapshot is created.

Now, scroll down to the Code section and paste in the boto3 script provided. If you know a little Python, the code is largely self-explanatory.

By default, the editor contains a placeholder handler function.

I am pasting the script here, which was created by Abhishek Veeramalla:

ebs_stale_snapshosts.py. You can use it for demo purposes.
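For readers who want to see the general shape of such a function, here is a minimal sketch under my own assumptions. It is not the referenced script: the helper names, the `dry_run` flag and the `dry_run` event key are all hypothetical. It uses boto3 paginators so that accounts with many snapshots are fully covered, and it only reports what it would delete unless `dry_run` is switched off.

```python
def collect(ec2, operation, result_key, **kwargs):
    """Gather every item of one describe_* call across all result pages."""
    items = []
    for page in ec2.get_paginator(operation).paginate(**kwargs):
        items.extend(page.get(result_key, []))
    return items


def cleanup_stale_snapshots(ec2, dry_run=True):
    """Delete (or just report) snapshots whose volume is gone or unattached."""
    snapshots = collect(ec2, "describe_snapshots", "Snapshots", OwnerIds=["self"])
    volumes = collect(ec2, "describe_volumes", "Volumes")
    attached_volume_ids = {v["VolumeId"] for v in volumes if v.get("Attachments")}

    stale_ids = []
    for snap in snapshots:
        if snap.get("VolumeId") in attached_volume_ids:
            continue  # the source volume is still attached, keep the snapshot
        if dry_run:
            print(f"Would delete EBS snapshot {snap['SnapshotId']}")
        else:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
            print(f"Deleted EBS snapshot {snap['SnapshotId']}")
        stale_ids.append(snap["SnapshotId"])
    return stale_ids


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; importing it here keeps
    # the helpers above usable without it.
    import boto3

    ec2 = boto3.client("ec2")
    # Hypothetical event key: send {"dry_run": false} to really delete.
    dry_run = event.get("dry_run", True)
    return {"stale_snapshots": cleanup_stale_snapshots(ec2, dry_run=dry_run)}
```

Defaulting to a dry run is a deliberate choice in this sketch, since a deleted snapshot cannot be recovered.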

Once you have pasted the script, click on the Deploy button (you can also save it with Ctrl+S). Now, click on Test. The following window will pop up to create a test event.

We could do the same thing with a CloudWatch scheduled event, in which case we would not need to create a test because CloudWatch triggers the function for us. For now, we will trigger the test manually.
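If you later want the scheduled variant, the wiring could look roughly like the sketch below. It assumes a daily schedule and a made-up rule name, and it takes the boto3 `events` and `lambda` clients as parameters. The three calls used (`put_rule`, `add_permission`, `put_targets`) are standard boto3 operations, but the schedule, names and IDs are my own choices.

```python
def schedule_daily_cleanup(events, lambda_client, function_name, function_arn,
                           rule_name="ebs-snapshot-cleanup-daily"):
    """Create a daily schedule rule and point it at the Lambda function.

    `events` is a boto3 "events" client and `lambda_client` a boto3
    "lambda" client. The default rule name is a hypothetical example.
    """
    # 1. Create (or update) the schedule rule.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 day)",
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow the rule to invoke the function. This call fails if a
    #    statement with the same StatementId already exists.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Register the function as the rule's target.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "cleanup-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

You would call it once, for example from your laptop, with `boto3.client("events")` and `boto3.client("lambda")` plus the name and ARN of the function created above.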

Provide the following details to create the test.

Now, if you click on Test, the script tries to execute, but it fails with the following errors because the function's role does not yet have the permissions the script needs.

Now, click on the Configuration section where you will get the default Role created for this function to execute.

Click on the role name, which opens the role in a new window.

Here we need to give this role the permissions required to execute the script. As of now, only one permission is attached to it by default.

Since there is no default policy that allows deleting snapshots, we need to create one that grants permission to list and delete snapshots.

Go to IAM --> Policies

Click on Create policy.

Click on EC2.

In the search bar, type snapshot and tick the checkboxes for DescribeSnapshots and DeleteSnapshot.

DescribeSnapshots is required to list all the available snapshots.

DeleteSnapshot is required to delete a snapshot.

Now, search for volume in the search bar and tick DescribeVolumes.

DescribeVolumes is required to list all the available volumes.

Now, search for instances in the search bar and tick DescribeInstances.

DescribeInstances is required to list all the available instances.

So, in total, I have provided 4 permissions: 3 List permissions and 1 Write permission.
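For reference, the four permissions selected above correspond to a policy document along these lines. This is a sketch of what the visual editor should produce, not a copy of it. The function name is made up, and `"Resource": "*"` is my assumption, since the Describe actions are not scoped to individual resources.

```python
import json


def build_snapshot_cleanup_policy():
    """Return an IAM policy document with the four EC2 actions used here."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeSnapshots",  # list snapshots
                    "ec2:DescribeVolumes",    # list volumes
                    "ec2:DescribeInstances",  # list instances
                    "ec2:DeleteSnapshot",     # the only write action
                ],
                "Resource": "*",
            }
        ],
    }


print(json.dumps(build_snapshot_cleanup_policy(), indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor instead of ticking the checkboxes one by one.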

Click on Next.

Provide a suitable policy name. I have used cost-optimization-ebs-snapshot-policy.

Review and click on Create Policy.

Now, go back to your Lambda function's default role.

Click on Add Permissions --> Attach policies

In the search bar, search for the name of the policy that you just created and click on Add Permissions.
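The same create-and-attach steps can also be scripted. The sketch below assumes you pass in a boto3 `iam` client, the role name shown in the console, and a policy document as a Python dict. The function name is hypothetical, while `create_policy` and `attach_role_policy` are regular boto3 IAM calls.

```python
import json


def create_and_attach_policy(iam, role_name, policy_name, policy_document):
    """Create a customer managed policy and attach it to a role.

    `iam` is a boto3 "iam" client and `policy_document` is a dict in
    IAM policy format. Returns the ARN of the new policy.
    """
    created = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    policy_arn = created["Policy"]["Arn"]
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn
```

Note that `create_policy` fails if a policy with the same name already exists, so this is meant as a one-time setup step rather than something to run repeatedly.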

You can review the attached policies.

Now, if you run the test once again, it can still fail because the function times out.

So, one more setting we need to change is the timeout, so that our test can finish even when it takes longer. By default, it is 3 seconds.

Go to Configuration --> Edit --> Timeout.
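The timeout can also be changed from code. Here is a small sketch, assuming a boto3 `lambda` client is passed in. The 10-second default is just an example value I picked, and the helper name is made up. Lambda accepts timeouts between 1 and 900 seconds.

```python
def set_function_timeout(lambda_client, function_name, timeout_seconds=10):
    """Update a Lambda function's timeout and return the value now in effect.

    `lambda_client` is a boto3 "lambda" client. The default of 10 seconds
    is only an illustrative choice.
    """
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("Lambda timeouts must be between 1 and 900 seconds")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    return response["Timeout"]
```

For example, `set_function_timeout(boto3.client("lambda"), "cost-optimization-ebs-snapshot")` would raise the timeout of the function created in this post.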

Now, if you click on Test again, your boto3 script gets executed and gives the expected response, as follows.

Function Logs show:

Deleted EBS snapshot ---- as its associated volume was not found.

If you check your Snapshots dashboard, you can see that snap1, which was not associated with any volume, has been deleted, while the latest snapshot, which belongs to an active volume and instance, has not been deleted.

You can also adapt the script to your own requirements and parameters, for example by deleting any snapshot or volume that is older than 30 days.
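As an illustration of such an age rule, the sketch below picks out snapshots older than a given number of days. It assumes each snapshot dict carries a timezone-aware `StartTime`, which is what boto3 returns from `describe_snapshots`. The function name and the sample data are made up.

```python
from datetime import datetime, timedelta, timezone


def snapshots_older_than(snapshots, days=30, now=None):
    """Return the IDs of snapshots started more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Hypothetical sample data with a fixed "now" so the result is repeatable.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
snapshots = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 2, 20, tzinfo=timezone.utc)},
]

print(snapshots_older_than(snapshots, days=30, now=now))  # prints ['snap-old']
```

A filter like this could be combined with the stale check, so that only snapshots that are both unattached and old get deleted.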


In the next blog post, we will explore more advanced topics in the realm of DevOps, so stay tuned, and let me know if there are any corrections.