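Background
I am writing an article for an Advent Calendar and have hit a technical obstacle: using a large natural language processing library on Lambda. While looking for a way around it, I found the article "How to import a Python library set up on EFS into Lambda" (EFSにセットアップしたPythonライブラリをLambdaにimportする方法). That technique looks like it meets my requirements, so I try it here.
Related articles of mine
These are articles in which I evaluated various techniques to get past the obstacle described in the background.
- Easily building a zip for a Lambda Layer with CodeBuild.
- You can really use Docker container images with Lambda? (Trying it with Python 3)
What is GiNZA?
A Python library for various kinds of natural language processing, starting with morphological analysis. It wraps spaCy (as far as I know), so spaCy's features are available. It also uses Sudachi as its morphological analysis engine.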
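Prerequisites
Resources are basically created with CloudFormation: in the AWS console, choose "Create stack" in CloudFormation and load the template. Apologies, but this article assumes you already know how to apply CloudFormation templates.
Preparing a KeyPair (if you don't have one)
It is needed later as a CloudFormation parameter, so create one in the AWS console. Also make the usual preparations for SSH login, such as placing the key in your .ssh folder. (Arguably this should be done with SSM instead...)
Preparing a VPC and subnets (if you don't have them)
Modify the CloudFormation template from the official page "AWS CloudFormation VPC template" and apply it from the AWS console.
The modifications are as follows:
- Removed the private subnets and NAT gateways to save cost
- Exported the values used by other CloudFormation templates
- Resource names and the like were also changed
VPC + Subnet CloudFormation template after modification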
# It's based on the following sample.
# https://docs.aws.amazon.com/ja_jp/codebuild/latest/userguide/cloudformation-vpc-template.html
Description: This template deploys a VPC with a pair of public subnets spread
  across two Availability Zones. It deploys an internet gateway, with a default
  route on the public subnets.
Parameters:
  EnvironmentName:
    Description: An environment name that is prefixed to resource names
    Type: String
  VpcCIDR:
    Description: Please enter the IP range (CIDR notation) for this VPC
    Type: String
    Default: 10.192.0.0/16
  PublicSubnet1CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the first Availability Zone
    Type: String
    Default: 10.192.10.0/24
  PublicSubnet2CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the second Availability Zone
    Type: String
    Default: 10.192.11.0/24
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCIDR
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Ref EnvironmentName
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Ref EnvironmentName
  InternetGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC
  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet1CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName} Public Subnet (AZ1)
  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet2CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName} Public Subnet (AZ2)
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName} Public Routes
  DefaultPublicRoute:
    Type: AWS::EC2::Route
    DependsOn: InternetGatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PublicSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet1
  PublicSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet2
  NoIngressSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: "no-ingress-sg"
      GroupDescription: "Security group with no ingress rule"
      VpcId: !Ref VPC
Outputs:
  VPC:
    Description: A reference to the created VPC
    Value: !Ref VPC
    Export:
      Name: "VPC"
  PublicSubnets:
    Description: A list of the public subnets
    Value: !Join [ ",", [ !Ref PublicSubnet1, !Ref PublicSubnet2 ]]
    Export:
      Name: "PublicSubnets"
  PublicSubnet1:
    Description: A reference to the public subnet in the 1st Availability Zone
    Value: !Ref PublicSubnet1
    Export:
      Name: "PublicSubnet1"
  PublicSubnet2:
    Description: A reference to the public subnet in the 2nd Availability Zone
    Value: !Ref PublicSubnet2
    Export:
      Name: "PublicSubnet2"
  NoIngressSecurityGroup:
    Description: Security group with no ingress rule
    Value: !Ref NoIngressSecurityGroup
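If you change the CIDR parameters of the template above, each public subnet still has to sit inside the VPC range, and the two subnets must not overlap. A minimal sketch of that check using Python's standard `ipaddress` module follows; the function name `check_subnets` is hypothetical and not part of the template.

```python
import ipaddress


def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    # Every subnet must be contained in the VPC range.
    for net in subnets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside {vpc}")
    # No two subnets may share addresses.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


# The template defaults: VpcCIDR, PublicSubnet1CIDR, PublicSubnet2CIDR.
print(check_subnets("10.192.0.0/16", ["10.192.10.0/24", "10.192.11.0/24"]))
```

Running this before creating the stack is optional; CloudFormation would also reject an invalid layout, only later and with a rollback.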
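Preparing EFS + EC2 (Auto Scaling)
Modify the CloudFormation template from the official page "Amazon Elastic File System sample template" and apply it from the AWS console. If you use an existing VPC or other existing resources, adjust the template accordingly.
The modifications are as follows:
- Removed unneeded parts such as instance types
- Specified the AMI ImageId directly (using ami-00f045aed21a55240: Amazon Linux 2 AMI 2.0.20201126.0 x86_64 HVM gp2)
- Changed to two MountTargets (one per AZ)
- Exported the values that other CloudFormation templates reference (such as the MountTarget security group and the access point ARN)
- Modified the AccessPoint path and so on
- Resource names and the like were also changed
EFS + EC2 (Auto Scaling) CloudFormation template after modification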
# https://docs.aws.amazon.com/ja_jp/AWSCloudFormation/latest/UserGuide/quickref-efs.html
AWSTemplateFormatVersion: '2010-09-09'
Description: This template creates an Amazon EFS file system and mount target and
  associates it with Amazon EC2 instances in an Auto Scaling group. **WARNING** This
  template creates Amazon EC2 instances and related resources. You will be billed
  for the AWS resources used if you create a stack from this template.
Parameters:
  InstanceType:
    Description: WebServer EC2 instance type
    Type: String
    Default: t3.small
    AllowedValues:
      - t3.nano
      - t3.micro
      - t3.small
      - t3.medium
      - t3.large
    ConstraintDescription: must be a valid EC2 instance type.
  AMIImageId:
    Type: String
    # Amazon Linux 2 AMI (HVM), SSD Volume Type
    Default: ami-00f045aed21a55240
  KeyName:
    Type: AWS::EC2::KeyPair::KeyName
    Description: Name of an existing EC2 key pair to enable SSH access to the EC2
      instances
  AsgMaxSize:
    Type: Number
    Description: Maximum size and initial desired capacity of Auto Scaling Group
    Default: '1'
  SSHLocation:
    Description: The IP address range that can be used to connect to the EC2 instances
      by using SSH
    Type: String
    MinLength: '9'
    MaxLength: '18'
    Default: 221.249.116.206/32
    AllowedPattern: "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})"
    ConstraintDescription: must be a valid IP CIDR range of the form x.x.x.x/x.
  VolumeName:
    Description: The name to be used for the EFS volume
    Type: String
    MinLength: '1'
    Default: efsvolume
  MountPoint:
    Description: The Linux mount point for the EFS volume
    Type: String
    MinLength: '1'
    Default: efsmountpoint
Mappings:
  AWSInstanceType2Arch:
    t3.nano:
      Arch: HVM64
    t3.micro:
      Arch: HVM64
    t3.small:
      Arch: HVM64
    t3.medium:
      Arch: HVM64
    t3.large:
      Arch: HVM64
  AWSRegionArch2AMI:
    ap-northeast-1:
      HVM64: ami-00f045aed21a55240
Resources:
  CloudWatchPutMetricsRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
  CloudWatchPutMetricsRolePolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: CloudWatch_PutMetricData
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: CloudWatchPutMetricData
            Effect: Allow
            Action:
              - cloudwatch:PutMetricData
            Resource:
              - "*"
      Roles:
        - Ref: CloudWatchPutMetricsRole
  CloudWatchPutMetricsInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
        - Ref: CloudWatchPutMetricsRole
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId:
        Fn::ImportValue: VPC
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp:
            Ref: SSHLocation
  MountTargetSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId:
        Fn::ImportValue: VPC
      GroupDescription: Security group for mount target
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '2049'
          ToPort: '2049'
          CidrIp: 0.0.0.0/0
  FileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      PerformanceMode: generalPurpose
      FileSystemTags:
        - Key: Name
          Value:
            Ref: VolumeName
  MountTarget1:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId:
        Ref: FileSystem
      SubnetId:
        Fn::ImportValue: PublicSubnet1
      SecurityGroups:
        - Ref: InstanceSecurityGroup
        - Ref: MountTargetSecurityGroup
  MountTarget2:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId:
        Ref: FileSystem
      SubnetId:
        Fn::ImportValue: PublicSubnet2
      SecurityGroups:
        - Ref: InstanceSecurityGroup
        - Ref: MountTargetSecurityGroup
  EFSAccessPoint:
    Type: 'AWS::EFS::AccessPoint'
    Properties:
      FileSystemId: !Ref FileSystem
      RootDirectory:
        Path: "/"
  LaunchConfiguration:
    Type: AWS::AutoScaling::LaunchConfiguration
    Metadata:
      AWS::CloudFormation::Init:
        configSets:
          MountConfig:
            - setup
            - mount
        setup:
          packages:
            yum:
              nfs-utils: []
          files:
            "/home/ec2-user/post_nfsstat":
              content: !Sub |
                #!/bin/bash
                INPUT="$(cat)"
                CW_JSON_OPEN='{ "Namespace": "EFS", "MetricData": [ '
                CW_JSON_CLOSE=' ] }'
                CW_JSON_METRIC=''
                METRIC_COUNTER=0
                for COL in 1 2 3 4 5 6; do
                  COUNTER=0
                  METRIC_FIELD=$COL
                  DATA_FIELD=$(($COL+($COL-1)))
                  while read line; do
                    if [[ COUNTER -gt 0 ]]; then
                      LINE=`echo $line | tr -s ' ' `
                      AWS_COMMAND="aws cloudwatch put-metric-data --region ${AWS::Region}"
                      MOD=$(( $COUNTER % 2))
                      if [ $MOD -eq 1 ]; then
                        METRIC_NAME=`echo $LINE | cut -d ' ' -f $METRIC_FIELD`
                      else
                        METRIC_VALUE=`echo $LINE | cut -d ' ' -f $DATA_FIELD`
                      fi
                      if [[ -n "$METRIC_NAME" && -n "$METRIC_VALUE" ]]; then
                        INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
                        CW_JSON_METRIC="$CW_JSON_METRIC { \"MetricName\": \"$METRIC_NAME\", \"Dimensions\": [{\"Name\": \"InstanceId\", \"Value\": \"$INSTANCE_ID\"} ], \"Value\": $METRIC_VALUE },"
                        unset METRIC_NAME
                        unset METRIC_VALUE
                        METRIC_COUNTER=$((METRIC_COUNTER+1))
                        if [ $METRIC_COUNTER -eq 20 ]; then
                          # 20 is max metric collection size, so we have to submit here
                          aws cloudwatch put-metric-data --region ${AWS::Region} --cli-input-json "`echo $CW_JSON_OPEN ${!CW_JSON_METRIC%?} $CW_JSON_CLOSE`"
                          # reset
                          METRIC_COUNTER=0
                          CW_JSON_METRIC=''
                        fi
                      fi
                      COUNTER=$((COUNTER+1))
                    fi
                    if [[ "$line" == "Client nfs v4:" ]]; then
                      # the next line is the good stuff
                      COUNTER=$((COUNTER+1))
                    fi
                  done <<< "$INPUT"
                done
                # submit whatever is left
                aws cloudwatch put-metric-data --region ${AWS::Region} --cli-input-json "`echo $CW_JSON_OPEN ${!CW_JSON_METRIC%?} $CW_JSON_CLOSE`"
              mode: '000755'
              owner: ec2-user
              group: ec2-user
            "/home/ec2-user/crontab":
              content: "* * * * * /usr/sbin/nfsstat | /home/ec2-user/post_nfsstat\n"
              owner: ec2-user
              group: ec2-user
          commands:
            01_createdir:
              command: !Sub "mkdir /${MountPoint}"
        mount:
          commands:
            01_mount:
              command: !Sub >
                mount -t nfs4 -o nfsvers=4.1 ${FileSystem}.efs.${AWS::Region}.amazonaws.com:/ /${MountPoint}
            02_permissions:
              command: !Sub "chown ec2-user:ec2-user /${MountPoint}"
    Properties:
      AssociatePublicIpAddress: true
      ImageId:
        Ref: AMIImageId
      InstanceType:
        Ref: InstanceType
      KeyName:
        Ref: KeyName
      SecurityGroups:
        - Ref: InstanceSecurityGroup
      IamInstanceProfile:
        Ref: CloudWatchPutMetricsInstanceProfile
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          yum install -y aws-cfn-bootstrap
          /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource LaunchConfiguration --configsets MountConfig --region ${AWS::Region}
          crontab /home/ec2-user/crontab
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      - MountTarget1
      - MountTarget2
    CreationPolicy:
      ResourceSignal:
        Timeout: PT15M
        Count:
          Ref: AsgMaxSize
    Properties:
      VPCZoneIdentifier:
        - Fn::ImportValue: PublicSubnet1
        - Fn::ImportValue: PublicSubnet2
      LaunchConfigurationName:
        Ref: LaunchConfiguration
      MinSize: '1'
      MaxSize:
        Ref: AsgMaxSize
      DesiredCapacity:
        Ref: AsgMaxSize
      Tags:
        - Key: Name
          Value: EFS FileSystem Mounted Instance
          PropagateAtLaunch: 'true'
Outputs:
  MountTargetID1:
    Description: Mount target ID
    Value:
      Ref: MountTarget1
  MountTargetID2:
    Description: Mount target ID
    Value:
      Ref: MountTarget2
  LambdaEFSArn:
    Description: File system Arn
    Value: !GetAtt FileSystem.Arn
    Export:
      Name: !Sub "LambdaEFSArn"
  LambdaEFSAccessPointArn:
    Description: File system AccessPointArn
    Value: !GetAtt EFSAccessPoint.Arn
    Export:
      Name: !Sub "LambdaEFSAccessPointArn"
  InstanceSecurityGroup:
    Description: A reference to the InstanceSecurityGroup
    Value: !Ref InstanceSecurityGroup
    Export:
      Name: "InstanceSecurityGroup"
  MountTargetSecurityGroup:
    Description: A reference to the MountTargetSecurityGroup
    Value: !Ref MountTargetSecurityGroup
    Export:
      Name: "MountTargetSecurityGroup"
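The `SSHLocation` parameter's `AllowedPattern` only checks the shape of the value, so something like `999.1.1.1/99` would pass it. A minimal sketch of a stricter check you could run before entering the parameter follows. The helper name `is_valid_ssh_location` is hypothetical, and I am assuming the pattern is applied to the whole value, which is why the sketch uses `re.fullmatch`.

```python
import ipaddress
import re

# Same pattern and length bounds as the SSHLocation parameter in the template.
ALLOWED_PATTERN = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})")
MIN_LENGTH, MAX_LENGTH = 9, 18


def is_valid_ssh_location(value):
    """Check the template constraints, then check that the CIDR is a real IPv4 range."""
    if not MIN_LENGTH <= len(value) <= MAX_LENGTH:
        return False
    if ALLOWED_PATTERN.fullmatch(value) is None:
        return False
    try:
        # strict=False accepts host bits, e.g. 10.0.0.5/24, as CloudFormation would.
        ipaddress.ip_network(value, strict=False)
    except ValueError:
        return False
    return True


print(is_valid_ssh_location("221.249.116.206/32"))
print(is_valid_ssh_location("999.1.1.1/99"))
```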
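Log in to EC2 and install the modules
This section follows the steps in the article "How to import a Python library set up on EFS into Lambda".
Login
Look up the public IP in the AWS console and connect with ssh.
ssh -i ~/.ssh/hogehoge-keypair.pem ec2-user@xx.yyy.xxx.zzz
Check the mount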
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 469M 0 469M 0% /dev
tmpfs 479M 0 479M 0% /dev/shm
tmpfs 479M 388K 479M 1% /run
tmpfs 479M 0 479M 0% /sys/fs/cgroup
/dev/nvme0n1p1 8.0G 1.6G 6.5G 20% /
xx-yyyyyyyz.efs.ap-northeast-1.amazonaws.com:/ 8.0E 0 8.0E 0% /efsmountpoint
tmpfs 96M 0 96M 0% /run/user/1000
Confirm that EFS is mounted at /efsmountpoint.
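If you want to confirm the mount from a script rather than by eye, the `df -h` output can be parsed. The sketch below is only an illustration: `find_efs_mounts` is a hypothetical helper, and the sample text is the output shown above with its placeholder file system ID.

```python
SAMPLE_DF_OUTPUT = """\
Filesystem Size Used Avail Use% Mounted on
devtmpfs 469M 0 469M 0% /dev
/dev/nvme0n1p1 8.0G 1.6G 6.5G 20% /
xx-yyyyyyyz.efs.ap-northeast-1.amazonaws.com:/ 8.0E 0 8.0E 0% /efsmountpoint
tmpfs 96M 0 96M 0% /run/user/1000
"""


def find_efs_mounts(df_output):
    """Return {mount_point: filesystem} for rows whose source looks like an EFS DNS name."""
    mounts = {}
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        filesystem, mount_point = fields[0], fields[-1]
        host = filesystem.split(":", 1)[0]  # drop the exported path after the colon
        if ".efs." in host and host.endswith(".amazonaws.com"):
            mounts[mount_point] = filesystem
    return mounts


print(find_efs_mounts(SAMPLE_DF_OUTPUT))
```

On the instance itself you could feed it live output, for example from `subprocess.run(["df", "-h"], capture_output=True, text=True).stdout`.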
Installing Python and other modules
I could not install ginza properly without becoming root, so that part differs from the referenced article.
sudo su -
cd /efsmountpoint
yum update
yum -y install gcc openssl-devel bzip2-devel libffi-devel
wget https://www.python.org/ftp/python/3.8.6/Python-3.8.6.tgz
tar xzf Python-3.8.6.tgz
cd Python-3.8.6
./configure --enable-optimizations
make altinstall
# check
python3.8 --version
pip3.8 --version
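Installing GiNZA
# Return to the EFS root so that lambda/ ends up at /efsmountpoint/lambda
cd /efsmountpoint
pip3.8 install --upgrade --target lambda/ ginza==4.0.5
# Grant full permissions just in case
chmod 777 -R lambda/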
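The reason for going through EFS is the size of the installed packages. A minimal sketch for measuring the `lambda/` directory and comparing it with the 250 MB unzipped limit that applies to zip-based deployment packages (function plus layers) follows. The helper names are hypothetical; run it with the Python 3.8 built above.

```python
import os

# Unzipped size limit for a zip deployment package including layers (250 MB).
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_zip_package(root):
    """True if the directory would fit under the unzipped package limit."""
    return directory_size_bytes(root) <= LAMBDA_UNZIPPED_LIMIT_BYTES


# Example (path from this article): fits_in_zip_package("/efsmountpoint/lambda")
```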
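Note: EC2 is no longer needed from this point on. In the AWS console, go to EC2 => Auto Scaling groups => select the target Auto Scaling group => "Edit" under group details, and set "Desired capacity", "Minimum capacity", and "Maximum capacity" all to 0 to terminate the instance. Otherwise you keep paying for an instance you do not need!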
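The same scale-down can be done from code. The sketch below takes the Auto Scaling client as an argument so it can be tried without AWS access; in real use you would pass `boto3.client("autoscaling")`. The function name `scale_to_zero` and the group name in the comment are hypothetical.

```python
def scale_to_zero(autoscaling_client, group_name):
    """Set min, max, and desired capacity to 0 so the group terminates its instances."""
    return autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=0,
        MaxSize=0,
        DesiredCapacity=0,
    )


# Real usage (requires boto3 and AWS credentials; the group name is a placeholder):
#   import boto3
#   scale_to_zero(boto3.client("autoscaling"), "your-auto-scaling-group-name")
```

The actual group name is generated by CloudFormation, so look it up in the console or in the stack's resources first.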
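Register the test Lambda (the main part)
Apply the CloudFormation template below from the AWS console. The important parts are the following line in the inline source and the FileSystemConfigs property. Because the function uses EFS, it is configured as a Lambda attached to the VPC.
sys.path.append("/mnt/efs0/lambda")
FileSystemConfigs:
  - Arn:
      Fn::ImportValue: LambdaEFSAccessPointArn
    LocalMountPath: "/mnt/efs0"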
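If the mount is missing or the packages were installed somewhere else, the plain `sys.path.append` succeeds silently and the failure only shows up later as an `ImportError`. A slightly more defensive variant might look like the sketch below; `add_efs_packages` is a hypothetical helper, not something the Lambda runtime provides.

```python
import os
import sys


def add_efs_packages(path="/mnt/efs0/lambda"):
    """Append the EFS package directory to sys.path, failing early if it is missing."""
    if not os.path.isdir(path):
        raise RuntimeError(f"EFS package directory not found: {path}")
    if path not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(path)
    return path


# In the Lambda source this would be called once, before importing spacy or ginza:
#   add_efs_packages()
```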
CloudFormation template for the test Lambda (Policy + Lambda)
AWSTemplateFormatVersion: '2010-09-09'
Description: Lambda test with EFS
Resources:
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: "LambdaRole"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: "LambdaPolicy"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "*"
              - Effect: Allow
                Action:
                  - cloudwatch:GetMetricStatistics
                Resource: "*"
              - Effect: Allow
                Action:
                  - dynamodb:GetRecords
                  - dynamodb:GetItem
                  - dynamodb:BatchGetItem
                  - dynamodb:BatchWriteItem
                  - dynamodb:DeleteItem
                  - dynamodb:Query
                  - dynamodb:Scan
                  - dynamodb:PutItem
                  - dynamodb:UpdateItem
                Resource: "*"
              - Effect: Allow
                Action:
                  - ec2:CreateNetworkInterface
                  - ec2:DescribeNetworkInterfaces
                  - ec2:DeleteNetworkInterface
                  - ec2:DescribeSecurityGroups
                  - ec2:DescribeSubnets
                  - ec2:DescribeVpcs
                Resource: "*"
              - Effect: Allow
                Action:
                  - elasticfilesystem:ClientMount
                  - elasticfilesystem:ClientWrite
                  - elasticfilesystem:DescribeMountTargets
                Resource: "*"
  LambdaEFSTest:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: efstestlambda
      Handler: index.handler
      Runtime: python3.8
      Code:
        ZipFile: |
          import sys
          # Make the packages installed on EFS importable before importing them
          sys.path.append("/mnt/efs0/lambda")
          import json
          import spacy
          import logging
          from ginza import *
          logger = logging.getLogger()
          def handler(event, context):
              logger.info(context)
              target_text = event['text']
              nlp = spacy.load('ja_ginza')
              doc = nlp(target_text)
              # Collect one dict per token across all sentences
              morpheme_list = []
              for sent_idx, sent in enumerate(doc.sents):
                  for token_idx, tk in enumerate(sent):
                      wk_morpheme = {}
                      wk_morpheme['text'] = tk.text
                      wk_morpheme['dep'] = tk.dep_
                      wk_morpheme['pos'] = tk.pos_
                      wk_morpheme['tag'] = tk.tag_
                      morpheme_list.append(wk_morpheme)
              return morpheme_list
      FileSystemConfigs:
        - Arn:
            Fn::ImportValue: LambdaEFSAccessPointArn
          LocalMountPath: "/mnt/efs0"
      Description: Lambda test with EFS.
      MemorySize: 2048
      Timeout: 15
      Role: !GetAtt LambdaRole.Arn
      VpcConfig:
        SecurityGroupIds:
          - Fn::ImportValue: InstanceSecurityGroup
          - Fn::ImportValue: MountTargetSecurityGroup
        SubnetIds:
          - Fn::ImportValue: PublicSubnet1
          - Fn::ImportValue: PublicSubnet2
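The token loop in the inline handler above does not depend on spaCy itself, only on objects that have `text`, `dep_`, `pos_`, and `tag_` attributes. Pulling it into its own function makes it possible to try the logic locally without GiNZA installed. In the sketch below, `tokens_to_morphemes` is a hypothetical helper and the `SimpleNamespace` object is a stand-in with made-up values, not real GiNZA output.

```python
from types import SimpleNamespace


def tokens_to_morphemes(sentences):
    """Flatten an iterable of sentences (iterables of tokens) into a list of dicts."""
    morphemes = []
    for sent in sentences:
        for tk in sent:
            morphemes.append(
                {"text": tk.text, "dep": tk.dep_, "pos": tk.pos_, "tag": tk.tag_}
            )
    return morphemes


# Stand-in token with illustrative values only.
fake_token = SimpleNamespace(text="foo", dep_="ROOT", pos_="NOUN", tag_="tag-a")
print(tokens_to_morphemes([[fake_token]]))
```

In the handler, the equivalent call would be `tokens_to_morphemes(doc.sents)`.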
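Testing
- Press the "Test" button
- Give the event any name
- Specify {"text":"テストしてみる"} as the test body
- Press "Create"
- You are returned to the original screen. The test event has been created, so press the "Test" button again.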
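The console steps above can also be reproduced from code. The sketch below takes the Lambda client as an argument so it can be exercised with a stub; in real use you would pass `boto3.client("lambda")`. The helper name `invoke_text` is hypothetical, and the default function name is the `FunctionName` from the template.

```python
import json


def invoke_text(lambda_client, text, function_name="efstestlambda"):
    """Invoke the function synchronously with {"text": ...} and return the decoded result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps({"text": text}).encode("utf-8"),
    )
    # The response payload is a stream; read it and decode the JSON body.
    return json.loads(response["Payload"].read())


# Real usage (requires boto3 and AWS credentials):
#   import boto3
#   print(invoke_text(boto3.client("lambda"), "テストしてみる"))
```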
Success! (This was a second or later run, so it took 622 ms. The first run took more than 4 seconds.)
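The handler calls `spacy.load` on every invocation. Lambda reuses the execution environment between warm invocations, so module-level state survives, and loading the model once and keeping it may shorten warm calls. I have not measured this; the sketch below only shows the caching pattern, with the loader passed in so it can be tried without spaCy. `get_pipeline` is a hypothetical helper.

```python
_PIPELINES = {}  # module-level cache, kept for as long as the execution environment lives


def get_pipeline(name, loader):
    """Load the pipeline `name` with `loader` on first use, then reuse the same object."""
    if name not in _PIPELINES:
        _PIPELINES[name] = loader(name)
    return _PIPELINES[name]


# In the handler, under this assumption:
#   nlp = get_pipeline("ja_ginza", spacy.load)
```

The first invocation in a new execution environment still pays the full load cost, so this would not change the cold-start time of more than 4 seconds.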
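Closing
After evaluating several options, it finally looks like I can do natural language processing serverlessly (EFS is storage, so I'll allow it).
Both Lambda container images and EFS mounts for Lambda appear to be features added this year. Had I looked into this last year, I would have given up. The pace at which AWS adds features is remarkable, which also means keeping up is a daily job. It's a lot of work, lol.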