Files in S3 with AWS Python Lambda: how to handle them?

Did you know that Lambdas are made to process files? Reading and writing, compressing and extracting data from an S3 bucket are among their main uses, so handling files in S3 with a Python Lambda is both easy and useful. With the s3fs package, reading and writing files in S3 becomes straightforward. In this post, we’ll see how to manipulate files in memory, which is especially handy when we are in a serverless Lambda/function, like in this post where we calculate the md5 checksum.
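
As a quick taste of that in-memory approach, here is a minimal sketch computing the MD5 checksum of an S3 object without writing anything to disk (the bucket and key names are placeholders):

import hashlib

import boto3

s3 = boto3.client('s3')

# Read the object fully into memory and hash it
data = s3.get_object(Bucket="bucket_name", Key="filename")['Body'].read()
print(hashlib.md5(data).hexdigest())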

Reading a file from API Gateway

Reading a file stream is common in AWS. The API Gateway will receive a posted file via a POST request. Then, the Lambda will get it on a specific route to save the information and put the file in a bucket.

If you want to post files larger than 10 MB, forget this method, because API Gateway payloads are limited to 10 MB (see how to upload large files in S3). The event['body'] will contain the base64-encoded file content. To turn it into a file we can upload, we decode it and wrap it with the io.BytesIO class:

import base64
import io

import boto3

s3 = boto3.client('s3')

# Get the base64-encoded file content from the event object and decode it
file_data = base64.b64decode(event['body'])
# Create a file buffer from file_data
file = io.BytesIO(file_data)
# Save the file in the S3 bucket
s3.put_object(Bucket="bucket_name", Key="filename", Body=file)
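
With the Lambda proxy integration, the event also carries an isBase64Encoded flag, so a slightly more defensive version (a sketch, assuming the proxy event format) would be:

import base64

# Decode only when API Gateway actually base64-encoded the body
body = event['body']
file_data = base64.b64decode(body) if event.get('isBase64Encoded') else body.encode()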

Reading a file from an S3 event

Now, let’s try with an S3 event. The Lambda will be invoked when a file is uploaded to the bucket, and it will receive a JSON object describing that upload. The Lambda will then read the file in the bucket based on the information received; we only need the bucket name and the file name.
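
For reference, here is an abbreviated sketch of that JSON object, showing only the fields we use (real S3 events carry much more metadata):

event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "bucket_name"},
            "object": {"key": "filename"}
        }
    }]
}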

import urllib.parse

import magic
import s3fs

fs = s3fs.S3FileSystem(anon=False)

def lambda_handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    # Object keys arrive URL-encoded in S3 events
    filename = urllib.parse.unquote_plus(record['s3']['object']['key'])
    with fs.open(f'{bucket}/{filename}', 'rb') as f:
        mime = magic.from_buffer(f.read(2048), mime=True)
    return mime

The function above reads the file and extracts its MIME type, which is very helpful.

Reading and writing images from S3

In this case, we’ll read an image from S3 and create an in-memory Image from the file content.

from io import BytesIO

from PIL import Image

data = s3.get_object(Bucket="bucket_name", Key="filename.png")['Body'].read()
img = Image.open(BytesIO(data))

Now, the img variable contains the image data. We can do whatever we want with it, like processing and extracting data.
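
For example, a minimal bit of processing (the dimensions here are arbitrary):

# Shrink the image to a thumbnail in place
img.thumbnail((128, 128))
print(img.size)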

If you want to save the processed image to an S3 bucket, you need to create a buffer and save the image to it:

buffer = BytesIO()
img.save(buffer, "PNG")  # or "JPEG", "GIF", ...
buffer.seek(0)
s3.put_object(Bucket="bucket_name", Key="filename.png", Body=buffer)
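
If the object should be served with the right Content-Type (for instance when it backs a website), put_object also accepts a ContentType parameter:

# Same upload, but with an explicit MIME type on the object
s3.put_object(Bucket="bucket_name", Key="filename.png", Body=buffer,
              ContentType="image/png")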

Reading and writing PDF files

In this example, I’ll use PyPDF2. This package contains two important classes: PdfFileReader and PdfFileWriter (renamed PdfReader and PdfWriter in newer releases).

from io import BytesIO

import boto3
import PyPDF2

# Note: this snippet uses the boto3 resource API, unlike the client used above
s3 = boto3.resource('s3')

# Let's read a file from a bucket
doc = s3.Object(bucket, obj)
# Now we read the streaming Body with the read() method
content = doc.get()['Body'].read()
# Let's now pass the bytes to PdfFileReader
reader = PyPDF2.PdfFileReader(BytesIO(content))

Now that we have a PdfFileReader instance, we can manipulate it like a real PDF file read from disk. We can extract text, get document information, get the number of pages… You can check all the methods in this link.
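
A few of those calls, as a quick sketch (PyPDF2 1.x API):

# Basic introspection on the reader
num_pages = reader.getNumPages()
info = reader.getDocumentInfo()
first_page_text = reader.getPage(0).extractText()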

Now, if you want to write a PDF to the bucket using PdfFileWriter, it’s the same as with images. First, we need to create a buffer, then let the PdfFileWriter write its data to it. Finally, we have to open the destination file in write-binary mode; that’s why we specify 'wb'.

# Let's create a writer instance
writer = PyPDF2.PdfFileWriter()
# Let's grab a fresh buffer
buffer = BytesIO()
# The writer will insert its data into the buffer
writer.write(buffer)
# The buffer now contains the PDF file. With s3fs, we'll create the file in the bucket.
with fs.open(f'{bucket}/{file_name}', 'wb') as f:
    f.write(buffer.getvalue())
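
Note that the writer above holds no pages yet, so the resulting PDF would be empty. Typically you’d copy pages into it before calling writer.write(buffer), for example from the reader we built earlier (again the PyPDF2 1.x API):

# Copy every page from the reader into the writer
for page_number in range(reader.getNumPages()):
    writer.addPage(reader.getPage(page_number))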

That’s it, folks! Don’t forget to share the post and subscribe for more content from Kaliex.
