How to hash a file in an S3 bucket using AWS Lambda

Calculating the hash of a file uploaded to an S3 bucket can be surprisingly tricky, especially when the upload passes through API Gateway, which encodes the body to base64. We could rely on the ETag that AWS generates to check file integrity, but that is not a dependable solution, because AWS may change its hashing algorithm (and for multipart uploads the ETag is not a plain MD5 of the content).
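
For reference, this is roughly how the ETag can be read with boto3 (a sketch assuming default credentials; as noted above, treat it as a quick consistency check rather than a trustworthy content hash):

import boto3

def get_etag(bucket, key):
    # Fetch the object's metadata only, without downloading the body
    s3 = boto3.client('s3')
    head = s3.head_object(Bucket=bucket, Key=key)
    # The ETag is returned wrapped in double quotes, so strip them
    return head['ETag'].strip('"')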

The best solution is to use the s3fs package, which we can install with pip. It exposes S3 as a file system, so we can open the object, read it in blocks, and compute the MD5 hash (or whatever algorithm we choose).

To install s3fs:

# Using pip
pip install s3fs

# From source
git clone git@github.com:dask/s3fs
cd s3fs
python setup.py install

Here is a function that will return the MD5 digest, which you can then compare against an expected value or save in a database. Note that s3fs also ships its own checksum function.

import hashlib

import s3fs

def hash_file(bucket, key):
    fs = s3fs.S3FileSystem(anon=False)
    md5_hash = hashlib.md5()
    with fs.open(f'{bucket}/{key}', 'rb') as f:
        # Read the object in 4 KB blocks so large files never sit fully in memory
        for byte_block in iter(lambda: f.read(4096), b""):
            md5_hash.update(byte_block)
    return md5_hash.hexdigest()
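
To tie this into Lambda, a minimal handler could look like the sketch below, assuming the function is triggered by an S3 put event (the event parsing and the hash_file call are the only moving parts; where you compare or store the digest is up to you):

def lambda_handler(event, context):
    # Assumes an S3 trigger: each record carries the bucket name and object key
    results = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        results.append({'key': key, 'md5': hash_file(bucket, key)})
    return results

As for the checksum function mentioned above, S3FileSystem inherits a checksum() method from fsspec, but it is typically derived from the object's ETag/metadata rather than the full content, so verify its behaviour for your version before relying on it.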

A similar package is also available for Node.js. Install s3fs with npm (crypto is part of the Node standard library) and then:

const crypto = require('crypto')
const S3FS = require('s3fs')

const hashFile = (filename, bucket) => {
    return new Promise((resolve, reject) => {
        const hash = crypto.createHash('md5')
        const s3fs = new S3FS(bucket)
        const stream = s3fs.createReadStream(filename)
        stream.on('data', data => hash.update(data))
        stream.on('error', err => reject(err))
        // The digest is only available once the stream has ended,
        // so resolve it here instead of returning from the callback
        stream.on('end', () => resolve(hash.digest('hex'))) // <-- this is the hash
    })
}
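
Because the digest only exists after the stream has ended, the function resolves a Promise, so in an async Lambda handler you can simply await hashFile(key, bucket) and then compare or persist the result.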

Official documentation
