JPEG画像アップロードのAPIを作りました。Pillowを使って、画像を処理したらAWS s3に保存。aiobotocore
を使ってs3に保存した。
…日本語はまだ無理です。英語で続きます…
So, I built an HTTP API for uploading JPEG images.
The images are read by the API, processed with Python Pillow, and then stored in AWS s3. I use aiobotocore to asynchronously upload the raw binary data to s3.
Here's the code to async upload a file to s3:
import io

import aiobotocore
import aiohttp.web

from engine.config import config


async def upload_object(
    request: aiohttp.web.Request,
    key: str,
    bucket: str,
    data: bytes,
    public_read: bool = False,
):
    """
    Helper function to upload a single object to s3.

    Args:
        request: The incoming request; gives access to the app's loop and s3 semaphore.
        key: The path where the object will be stored in s3, e.g. data/annoy/test.py
        bucket: Name of the s3 bucket.
        data: Raw bytes of the file that's going to be uploaded.
        public_read: If True, make the object publicly readable after the upload.
    """
    loop = request.app.loop
    # The semaphore is shared through the app and held for the whole upload
    semaphore = request.app["s3_semaphore"]
    async with semaphore:
        try:
            session = aiobotocore.get_session(loop=loop)
            async with session.create_client(
                "s3",
                aws_access_key_id=config["aws"]["access_key_id"],
                aws_secret_access_key=config["aws"]["access_key_secret"],
            ) as aclient:
                await aclient.put_object(
                    Bucket=bucket,
                    Key=key,
                    Body=io.BytesIO(data),
                )
                if public_read:
                    await aclient.put_object_acl(
                        Bucket=bucket, Key=key, ACL="public-read"
                    )
        except TypeError as e:
            # HTTPException is only the base class, so raise a concrete 500 instead
            raise aiohttp.web.HTTPInternalServerError(
                text="Failed to upload file"
            ) from e
Note that the data argument is of type bytes.
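The upload function holds a semaphore taken from request.app["s3_semaphore"] for the duration of each upload. How that semaphore is created isn't shown here, so the following is only a sketch, assuming it is a plain asyncio.Semaphore. The limit of 3 and the fake_upload stand-in are hypothetical and exist just to show how the semaphore caps the number of uploads running at the same time:

```python
import asyncio

MAX_CONCURRENT_UPLOADS = 3  # hypothetical limit, not taken from the real app


async def fake_upload(semaphore: asyncio.Semaphore, stats: dict) -> None:
    """Stand-in for upload_object(): holds the semaphore while 'uploading'."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulates the network round trip to s3
        stats["active"] -= 1


async def run_uploads(n_uploads: int) -> int:
    """Start n_uploads at once and return the peak number running together."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_UPLOADS)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(
        *(fake_upload(semaphore, stats) for _ in range(n_uploads))
    )
    return stats["peak"]


peak = asyncio.run(run_uploads(10))
print(peak)  # never exceeds MAX_CONCURRENT_UPLOADS
```

In an aiohttp app, such a semaphore would presumably be created once at startup and stored under the "s3_semaphore" key so that every request handler shares it.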
The API reads the HTTP multipart request and creates a PIL.Image object. After manipulating the image, the API calls await upload_object() like this:
resp = await s3.upload_object(
    request=request,
    bucket=image_bucket,
    key=image_key,
    data=pil_image.tobytes(),
    public_read=public_read,
)
where request is the aiohttp.web.Request sent to the handler. The other arguments should be self-explanatory.
However this didn't work
The first version of the API didn't upload the image data to s3 correctly. When downloading the image from s3, the file data would be corrupted. The problem, I found out, is that image.tobytes() writes the raw bytes of the internal PIL image representation, not the JPEG binary data. In other words, you get the decoded, uncompressed pixel values with no JPEG headers and no compression, so the result is not a valid JPEG file. This behavior is documented:
In [154]: i = Image.open('/Users/halfdan/Desktop/food.jpg')

In [155]: i.tobytes?
Signature: i.tobytes(encoder_name='raw', *args)
Docstring:
Return image as a bytes object.

.. warning::

    This method returns the raw image data from the internal
    storage. For compressed image data (e.g. PNG, JPEG) use
    :meth:`~.save`, with a BytesIO parameter for in-memory
    data.

:param encoder_name: What encoder to use. The default is to
                     use the standard "raw" encoder.
:param args: Extra arguments to the encoder.
:rtype: A bytes object.
File:      ~/.pyenv/versions/3.6.5/lib/python3.6/site-packages/PIL/Image.py
Type:      method
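A cheap guard against uploading the wrong kind of bytes is to check for the JPEG Start Of Image marker (0xFF 0xD8) before sending anything to s3. This is a minimal sketch, not part of the API described here: looks_like_jpeg is a hypothetical helper, and the two payloads are hand-made stand-ins rather than real images. It is only a heuristic, since raw pixel data could begin with those two bytes by coincidence:

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker that opens every JPEG stream


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check: does the payload start with the JPEG SOI marker?"""
    return data[:2] == JPEG_SOI


# Hand-made stand-ins, not real images:
# bytes shaped like the start of an encoded JPEG file
encoded = JPEG_SOI + b"\xff\xe0" + b"\x00" * 16
# four RGB pixels laid out the way raw, uncompressed pixel data would be
raw_pixels = bytes([200, 150, 100]) * 4

print(looks_like_jpeg(encoded))
print(looks_like_jpeg(raw_pixels))
```

A check like this in the upload path would have rejected the raw pixel payload instead of silently storing a corrupted file.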
Solution
The solution was to use image.save and write the data into a memory buffer, like this:
# using pil_image.tobytes() doesn't work, so use pil_image.save instead
image_bytes = io.BytesIO()
pil_image.save(image_bytes, format="JPEG")
image_bytes = image_bytes.getvalue()
resp = await s3.upload_object(
    request=request,
    bucket=image_bucket,
    key=image_key,
    data=image_bytes,
    public_read=public_read,
)
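Note that calling image_bytes.read() right after save() returns an empty bytes object, because save() leaves the stream position at the end of the buffer. Hence you need to either rewind with image_bytes.seek(0) before reading, or call image_bytes.getvalue(), which returns the whole buffer regardless of the current position.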
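To see how read() and getvalue() behave on a freshly written io.BytesIO, here is a small sketch using only the standard library. The write() call stands in for pil_image.save(buffer, format="JPEG"), and the payload is a placeholder, not real image data:

```python
import io

buffer = io.BytesIO()
# Stand-in for pil_image.save(buffer, format="JPEG")
buffer.write(b"fake jpeg payload")

# The position is at the end of the buffer, so nothing is left to read
at_end = buffer.read()

# getvalue() returns the full contents, independent of the position
whole = buffer.getvalue()

# After rewinding to the start, read() returns everything
buffer.seek(0)
after_seek = buffer.read()

print(at_end, whole, after_seek)
```

Either approach works for the upload. getvalue() is the shorter of the two because it needs no rewind.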
勉強になりました!