Images and videos often carry metadata that reveals a surprising amount about the file and the person who created it. This metadata, such as EXIF data in images, can include sensitive details like location, device information, and more. To protect user privacy and enhance security, businesses in many industries can benefit from removing this metadata from media files. In this blog post, we’ll walk you through a simple AWS Lambda script that automatically removes metadata from images and videos uploaded to S3 buckets.
Industries That Can Benefit:
- Social Media Platforms: Social media platforms handle a massive number of media uploads every day. By removing metadata from images and videos, these platforms can better protect user privacy and minimize the risk of unintentional information leaks.
- E-Commerce: E-commerce websites often display user-generated content, such as product images and reviews. Stripping metadata from these media files ensures that customers’ private information is not inadvertently exposed.
- Healthcare: The healthcare industry deals with sensitive patient information, including images and videos from medical procedures. Removing metadata from these files is essential to comply with privacy regulations and protect patient confidentiality.
- News and Media: Journalists and media organizations publish images and videos that may contain sensitive information about sources or locations. Stripping metadata can help protect this information and maintain the integrity of their reporting.
- Education: Educational institutions often host and share various media files, such as lecture videos, research images, and student presentations. Removing metadata from these files ensures that private information about students, faculty, and research subjects is protected.
Benefits of Removing Metadata:
- Enhanced Privacy: Stripping metadata from media files helps protect sensitive information about users, locations, and devices, safeguarding user privacy.
- Security: By removing metadata, you reduce the risk of accidentally leaking sensitive information, which could be exploited by malicious actors.
- Compliance: Removing metadata can help organizations comply with data protection regulations, such as GDPR or HIPAA, that require safeguarding user data.
- Simplified Management: Automating metadata removal with AWS Lambda reduces the manual work needed to process media files, streamlining media management across your organization.
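The Lambda Script:
import io
import os
import tempfile
from urllib.parse import unquote_plus

import boto3
from PIL import Image
from moviepy.editor import VideoFileClip  # import path used by moviepy 1.x


def lambda_handler(event, _):
    bucket_name = os.environ['S3_BUCKET_NAME']
    s3 = boto3.client('s3')
    # Object keys arrive URL-encoded in S3 event notifications.
    object_name = unquote_plus(event['Records'][0]['s3']['object']['key'])
    file_extension = os.path.splitext(object_name)[1].lower()
    supported_image_extensions = ['.jpg', '.jpeg', '.png', '.tiff', '.tif', '.heic', '.heif']
    supported_video_extensions = ['.mp4', '.mov', '.avi', '.mkv', '.webm']

    if file_extension in supported_image_extensions:
        image_data = s3.get_object(Bucket=bucket_name, Key=object_name)
        with io.BytesIO(image_data['Body'].read()) as image_file:
            image = Image.open(image_file)
            image_format = image.format
            with io.BytesIO() as new_image_data:
                # Re-encode the image; no exif= argument is passed to save().
                image.save(new_image_data, format=image_format)
                new_image_data.seek(0)
                s3.put_object(Bucket=bucket_name, Key=object_name, Body=new_image_data, Tagging='ExifDeleted=True')
    elif file_extension in supported_video_extensions:
        # moviepy works with file paths rather than in-memory buffers,
        # so the video is staged in a temporary directory (under /tmp on Lambda).
        with tempfile.TemporaryDirectory() as tmp_dir:
            source_path = os.path.join(tmp_dir, 'source' + file_extension)
            output_path = os.path.join(tmp_dir, 'clean' + file_extension)
            s3.download_file(bucket_name, object_name, source_path)
            video = VideoFileClip(source_path)
            try:
                # Note: libx264/aac do not fit every container listed above
                # (WebM, for example, expects VP8/VP9 video).
                video.write_videofile(output_path, codec='libx264', audio_codec='aac',
                                      temp_audiofile=os.path.join(tmp_dir, 'audio.m4a'))
            finally:
                video.close()
            with open(output_path, 'rb') as new_video_data:
                s3.put_object(Bucket=bucket_name, Key=object_name, Body=new_video_data, Tagging='ExifDeleted=True')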
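One thing to watch for: the script writes the cleaned file back to the same key, so if the bucket's trigger fires on object-created events, that write can invoke the function again. Below is a minimal sketch of a guard, assuming the tag set comes from `s3.get_object_tagging(Bucket=..., Key=...)['TagSet']`. The helper names `iter_s3_objects` and `already_processed` and the sample event are hypothetical and not part of the original script. The sketch also reads the bucket name from the event rather than from an environment variable.

```python
from urllib.parse import unquote_plus

# Same marker the script writes with Tagging='ExifDeleted=True'.
PROCESSED_TAG = ("ExifDeleted", "True")


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Keys are URL-encoded in the event payload (spaces arrive as '+').
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def already_processed(tag_set):
    """Return True if the tag set carries the ExifDeleted=True marker."""
    return any((tag["Key"], tag["Value"]) == PROCESSED_TAG for tag in tag_set)


# Hand-written sample event and tag sets; no AWS calls are made here.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "media-uploads"},
                "object": {"key": "photos/summer+trip%282023%29.jpg"}}}
    ]
}
objects = list(iter_s3_objects(sample_event))
print(objects)
print(already_processed([{"Key": "ExifDeleted", "Value": "True"}]))
print(already_processed([]))
```

In the handler, you would loop over `iter_s3_objects(event)`, fetch each object's tags, and return early when `already_processed` is true. Writing cleaned files to a separate bucket or prefix is another way to avoid the loop.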
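Please note that the PIL (Pillow) and moviepy libraries depend on shared libraries, and moviepy also on an FFmpeg binary, which may not be available in the default Lambda environment. You’ll need to create a custom Lambda layer that includes both libraries and their dependencies. You can follow the official guide to create a custom Lambda layer for FFmpeg. Stock Pillow may also be unable to open .heic and .heif files without an additional plugin such as pillow-heif, so test those extensions before relying on them.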
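Since layer contents are unpacked under /opt at runtime, a quick startup check can confirm the FFmpeg binary is actually reachable before any video is processed. This is a rough sketch only: the `find_ffmpeg` helper and the candidate directories are assumptions, and how moviepy is pointed at the binary depends on the moviepy version, so that step is left out.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=("/opt/bin", "/opt")):
    """Return the path to an ffmpeg executable, or None if none is found."""
    # First look on PATH, then fall back to the assumed layer directories.
    found = shutil.which("ffmpeg")
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "ffmpeg")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg not found; video processing would fail")
else:
    print(f"using ffmpeg at {ffmpeg_path}")
```

Running this check once at import time, outside the handler, means a missing layer shows up in the logs on the first cold start and not partway through a video upload.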
Here is the GitHub repository: https://github.com/flightlesstux/EXIF-Metadata-Remover
Conclusion
The AWS Lambda script we’ve provided makes it easy to remove metadata from images and videos uploaded to S3 buckets, enhancing privacy and security across a wide range of industries. By implementing this solution, you can protect user information, reduce potential risks, and support compliance with data protection regulations.