
Generating a PDF file from an Excel file with LibreOffice in a Lambda function

Posted at 2023-06-14

What we'll build

A Lambda function that fetches an Excel file from S3,
converts its contents to PDF,
and saves that PDF back to S3.
LibreOffice handles the PDF conversion.
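
The flow above comes down to a small naming rule: the S3 key decides the local download path, the local PDF path, and the key the PDF is uploaded under. Below is a minimal sketch of that rule, assuming the PDF keeps the workbook's base name and everything is staged under /tmp. `plan_paths` and `ConversionPaths` are hypothetical names for illustration and do not appear in the article's code.

```python
import os
from typing import NamedTuple


class ConversionPaths(NamedTuple):
    """Hypothetical container for the three paths used in one conversion."""
    local_xlsx: str   # where the workbook is downloaded to
    local_pdf: str    # where LibreOffice is expected to write the PDF
    output_key: str   # S3 key used when uploading the PDF


def plan_paths(input_key: str, work_dir: str = "/tmp") -> ConversionPaths:
    # Use only the file name so nested keys still land directly in work_dir.
    base_name = os.path.basename(input_key)
    # splitext swaps just the final extension, even if the name contains dots.
    stem, _ext = os.path.splitext(base_name)
    pdf_name = stem + ".pdf"
    return ConversionPaths(
        local_xlsx=os.path.join(work_dir, base_name),
        local_pdf=os.path.join(work_dir, pdf_name),
        output_key=pdf_name,
    )
```

For example, `plan_paths("output.xlsx")` yields `/tmp/output.xlsx`, `/tmp/output.pdf`, and the upload key `output.pdf`.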

Prerequisites

  • The project is built with AWS SAM
  • Have the CLI installed and usable
  • Be able to run the hello world template generated by sam init

Development

Template

template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  convert_excel_into_pdf

Globals:
  Function:
    Timeout: 900 # Probably more than needed, but set high just in case
    MemorySize: 256 # LibreOffice uses a fair amount of memory, so raise this; 128 may not be enough

    Tracing: Active
  Api:
    TracingEnabled: true

Parameters:
  BucketName:
    Type: String
    Default: 'my-bucket'

Resources:
  ConvertExcelIntoPdfFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: convert_excel_into_pdf/
      Architectures:
        - x86_64
      Environment:
        Variables:
          BUCKET_NAME: !Ref BucketName
      # PackageType is Image because the container image is built by hand with Docker
      PackageType: Image
      Policies:
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./convert_excel_into_pdf
      DockerTag: python3.8-v1
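
The template sets Timeout to 900 seconds, which is the maximum Lambda allows. One way to make use of that budget is to hand the conversion subprocess a timeout slightly shorter than the time the invocation has left, so the handler can still log a failure before Lambda stops it. The sketch below assumes the standard `context.get_remaining_time_in_millis()` method of the Lambda Python runtime; `conversion_timeout_seconds` and the 10-second margin are hypothetical choices, not part of the article.

```python
def conversion_timeout_seconds(context, margin_seconds: float = 10.0) -> float:
    """Return a subprocess timeout that leaves a safety margin before Lambda's own limit."""
    # The Lambda context reports the time left for this invocation in milliseconds.
    remaining = context.get_remaining_time_in_millis() / 1000.0
    # Never return a zero or negative timeout, even when very little time is left.
    return max(1.0, remaining - margin_seconds)
```

The result can be passed as the `timeout` argument of `subprocess.run`, which raises `subprocess.TimeoutExpired` when the limit is hit.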

Dockerfile

Dockerfile
FROM public.ecr.aws/lambda/python:3.8

RUN yum -y install curl wget tar gzip zlib freetype-devel
RUN yum -y install libxslt \
    gcc \
    ghostscript \
    lcms2-devel \
    libffi-devel \
    libjpeg-devel \
    libtiff-devel \
    libwebp-devel \
    make \
    openjpeg2-devel \
    sudo \
    tcl-devel \
    tk-devel \
    tkinter \
    which \
    xorg-x11-server-Xvfb \
    zlib-devel \
    java \
    ipa-gothic-fonts ipa-mincho-fonts ipa-pgothic-fonts ipa-pmincho-fonts \
    && yum clean all

# Download LibreOffice
RUN wget http://download.documentfoundation.org/libreoffice/stable/7.5.3/rpm/x86_64/LibreOffice_7.5.3_Linux_x86-64_rpm.tar.gz
RUN tar -xvzf LibreOffice_7.5.3_Linux_x86-64_rpm.tar.gz
RUN cd LibreOffice_7.5.3.2_Linux_x86-64_rpm/RPMS; yum -y localinstall *.rpm;
RUN yum -y install cairo

COPY app.py ${LAMBDA_TASK_ROOT}

CMD [ "app.lambda_handler" ]
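
The handler later refers to the binary at /opt/libreoffice7.5/program/soffice, so the path depends on the LibreOffice version installed in the image. As a rough sketch, the path could be looked up at runtime instead of being hard-coded, assuming the RPMs install under a directory matching /opt/libreoffice*. `find_soffice` is a hypothetical helper for illustration.

```python
import glob
import os
from typing import Optional


def find_soffice(opt_root: str = "/opt") -> Optional[str]:
    """Return the path of an installed soffice binary, or None if none is found."""
    pattern = os.path.join(opt_root, "libreoffice*", "program", "soffice")
    candidates = sorted(glob.glob(pattern))
    # If several versions are installed, pick the lexicographically last directory.
    # Note that this is not necessarily the newest version.
    return candidates[-1] if candidates else None
```

Calling `find_soffice()` once at import time and failing fast when it returns None surfaces a broken image build before any file is downloaded.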

Lambda function

app.py
import os
import subprocess

import boto3

s3 = boto3.resource("s3")
output_bucket = os.getenv("BUCKET_NAME")


def lambda_handler(event, context):
    input_bucket = os.getenv("BUCKET_NAME")
    input_key = "output.xlsx"
    in_bucket = s3.Bucket(input_bucket)
    # Download the Excel file
    file_path = "/tmp/" + input_key
    in_bucket.download_file(input_key, file_path)
    # Convert to PDF with LibreOffice (argument list, so no shell quoting is needed)
    proc = subprocess.run(
        [
            "/opt/libreoffice7.5/program/soffice",
            "--headless",
            "--norestore",
            "--invisible",
            "--nodefault",
            "--nofirststartwizard",
            "--nolockcheck",
            "--nologo",
            "--convert-to",
            "pdf:writer_pdf_Export",
            "--outdir",
            "/tmp",
            file_path,
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    print("STDOUT: {}".format(proc.stdout))
    print("STDERR: {}".format(proc.stderr))

    # Derive the PDF path by swapping only the file extension
    pdf_path = os.path.splitext(file_path)[0] + ".pdf"

    # Upload the PDF file
    if not os.path.exists(pdf_path):
        print("The PDF file({}) cannot be found".format(pdf_path))
        return
    pdf_key = pdf_path.replace("/tmp/", "")
    print("PDF: {}".format(pdf_key))
    print("Size: {}".format(os.path.getsize(pdf_path)))
    out_bucket = s3.Bucket(output_bucket)
    # The context manager closes the file even if the upload raises
    with open(pdf_path, "rb") as data:
        out_bucket.put_object(Key=pdf_key, Body=data)

    return {"statusCode": 200, "message": "PDF conversion complete"}
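
In Lambda, /tmp is the only writable location, and LibreOffice writes a user profile when it starts. The following is a minimal sketch of a variation on the conversion step that points the profile at /tmp with LibreOffice's `-env:UserInstallation` bootstrap option and raises on a non-zero exit code. Whether the profile redirect is actually required for this image is an assumption; `build_convert_command`, `convert`, and the `/tmp/lo_profile` directory are hypothetical names.

```python
import subprocess
from typing import List

# Path taken from app.py above; it depends on the installed LibreOffice version.
SOFFICE = "/opt/libreoffice7.5/program/soffice"


def build_convert_command(
    src_path: str,
    out_dir: str = "/tmp",
    profile_dir: str = "/tmp/lo_profile",  # hypothetical profile location
) -> List[str]:
    """Build the soffice argument list for a headless PDF conversion."""
    return [
        SOFFICE,
        # Keep the LibreOffice user profile under /tmp (assumed to be needed here).
        "-env:UserInstallation=file://" + profile_dir,
        "--headless",
        "--norestore",
        "--convert-to",
        "pdf:writer_pdf_Export",  # same filter string as app.py
        "--outdir",
        out_dir,
        src_path,
    ]


def convert(src_path: str, timeout: float = 300.0) -> None:
    """Run the conversion and raise if soffice reports a failure."""
    proc = subprocess.run(
        build_convert_command(src_path),
        capture_output=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError("soffice failed: {}".format(proc.stderr.decode(errors="replace")))
```

soffice may still exit with code 0 when no PDF was produced, so checking that the output file exists, as app.py does, remains worthwhile alongside the return-code check.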

