Running mecab (+neologd) in an AWS Lambda Node.js Docker image

Posted at 2023-09-19

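I needed mecab for an experiment, but it is not compatible with PHP 8, so instead of installing it on EC2 I installed it in Lambda and call it from PHP on EC2. neologd support is a bonus. I don't know Python, so the image is built on Node. With the neologd dictionary included the image comes to a few GB, but Lambda container images deployed from ECR can be up to 10 GB, so it deployed and ran without problems. Because it is Docker, it can also be checked locally.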

File layout

Dockerfile
app.js
docker-compose.yml
package.json
deploy.sh

Dockerfile
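The base image is public.ecr.aws/lambda/nodejs:18. As of September 17, 2023, a 20 image had not been released.
The libraries needed for the build are installed first, then make is run.
ipadic-neologd is cloned over https, because cloning over the git protocol looked likely to run into key problems that would be hard to solve.
A cd in one RUN instruction does not carry over to the next, so each command is chained to its cd with &&.

sed rewrites /usr/local/etc/mecabrc to change the default dictionary from ipadic to mecab-ipadic-neologd.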

FROM public.ecr.aws/lambda/nodejs:18
COPY *.js* package*.json .env ./

RUN yum install -y gcc gcc-c++ git patch tar make which find xz file openssl
RUN rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm

RUN mkdir ./mecab-service
RUN curl -L "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE" -o mecab-0.996.tar.gz
RUN tar zxvf mecab-0.996.tar.gz
RUN cd ./mecab-0.996 && ./configure --enable-utf8-only
RUN cd ./mecab-0.996 && make
RUN cd ./mecab-0.996 && make install

RUN curl -L "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM" -o mecab-ipadic-2.7.0-20070801.tar.gz
RUN tar zxvf mecab-ipadic-2.7.0-20070801.tar.gz
RUN cd ./mecab-ipadic-2.7.0-20070801 && ./configure --with-charset=utf8
RUN cd ./mecab-ipadic-2.7.0-20070801 && make
RUN cd ./mecab-ipadic-2.7.0-20070801 && make install

RUN git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
RUN cd mecab-ipadic-neologd && ./bin/install-mecab-ipadic-neologd -n -a -y
RUN sed -i -e s/ipadic/mecab-ipadic-neologd/g /usr/local/etc/mecabrc


RUN npm install
ENV MY_ENV_VAR=value

CMD [ "app.lambdaHandler" ]
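To illustrate what the sed step does, the sketch below applies the same global substitution in JavaScript to a single line. The sample line is an assumption about what the dicdir entry of a default mecabrc looks like after make install, and switchDictionary is a hypothetical name; check the actual file in your image.

```javascript
// Same substitution as `sed -e s/ipadic/mecab-ipadic-neologd/g`, applied to one line.
// ASSUMPTION: sampleLine mirrors the dicdir entry of a default /usr/local/etc/mecabrc.
const sampleLine = 'dicdir = /usr/local/lib/mecab/dic/ipadic';

function switchDictionary(line) {
  // The g flag replaces every occurrence of "ipadic", like sed's /g.
  return line.replace(/ipadic/g, 'mecab-ipadic-neologd');
}

const rewritten = switchDictionary(sampleLine);
console.log(rewritten);
```

Because the pattern matches any "ipadic", running the substitution a second time would also rewrite the "ipadic" inside the new name, so the step should run only once per image build.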

app.js

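The mecab-client library is used. I could not find anything better; it would be nice if there were something like a de facto standard.

The handler simply parses its argument with mecab and returns the result. It uses the default dictionary, which is why the Dockerfile above rewrites mecabrc with sed.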


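// Lambda handler: parse the "p" field of the event with mecab and return the tokens.
import { MeCab } from 'mecab-client'

export const lambdaHandler = async (event, context) => {
  // Log the incoming event as JSON for debugging.
  console.log('%j', event)
  // No dictionary is passed, so mecab uses the default one set in mecabrc.
  const mecab = new MeCab()
  const result = await mecab.parse(event.p)

  return result
}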
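The handler above passes event.p straight to mecab. A minimal sketch of validating the payload first might look like the following; extractText and its error message are hypothetical and are not part of mecab-client or the original handler.

```javascript
// Hypothetical helper: pull the text to analyse out of the Lambda event.
// The event shape { p: "<text>" } follows the handler in this article.
function extractText(event) {
  const text = event && event.p;
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('event.p must be a non-empty string');
  }
  return text;
}

// A valid event yields the text unchanged.
console.log(extractText({ p: '恋ダンス' }));
```

Inside the handler this could be used as `await mecab.parse(extractText(event))`, so that a malformed request fails with a clear message before mecab is called.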

docker-compose.yml

version: '3'
services:
  mecab:
    image: mecab
    build: .
    ports:
      - 9099:8080
    tty: true
    volumes:
      - "./data/:/var/task/data"
    working_dir: /var/task/
    env_file:
      - .env

package.json
The only dependency is mecab-client.

{
    "type": "module",
    "dependencies": {
        "mecab-client": "0.0.1"
    }
}

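Build
Run this in the directory of your choice. It takes a fair amount of time: about 17 minutes in the log below, most of it in the neologd install step.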

docker-compose up --build
Building mecab
[+] Building 1027.5s (24/24) FINISHED
 => [internal] load build definition from Dockerfile                                  0.0s
 => => transferring dockerfile: 32B                                                   0.0s
 => [internal] load .dockerignore                                                     0.1s
 => => transferring context: 2B                                                       0.0s
 => [internal] load metadata for public.ecr.aws/lambda/nodejs:18                      0.0s
 => CACHED [ 1/19] FROM public.ecr.aws/lambda/nodejs:18                               0.0s
 => [internal] load build context                                                     0.0s
 => => transferring context: 355B                                                     0.0s
 => [ 2/19] COPY *.js* package*.json .env ./                                          0.1s
 => [ 3/19] RUN yum install -y gcc gcc-c++ git patch tar make which find xz file op  45.4s
 => [ 4/19] RUN rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.  1.1s
 => [ 5/19] RUN mkdir ./mecab-service                                                 0.5s
 => [ 6/19] RUN curl -L "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh  2.7s
 => [ 7/19] RUN tar zxvf mecab-0.996.tar.gz                                           0.6s
 => [ 8/19] RUN cd ./mecab-0.996 && ./configure --enable-utf8-only                    5.1s
 => [ 9/19] RUN cd ./mecab-0.996 && make                                             33.4s
 => [10/19] RUN cd ./mecab-0.996 && make install                                      0.7s
 => [11/19] RUN curl -L "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh  5.0s
 => [12/19] RUN tar zxvf mecab-ipadic-2.7.0-20070801.tar.gz                           1.0s
 => [13/19] RUN cd ./mecab-ipadic-2.7.0-20070801 && ./configure --with-charset=utf8   1.7s
 => [14/19] RUN cd ./mecab-ipadic-2.7.0-20070801 && make                              1.6s
 => [15/19] RUN cd ./mecab-ipadic-2.7.0-20070801 && make install                      0.8s
 => [16/19] RUN git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd  11.6s
 => [17/19] RUN cd mecab-ipadic-neologd && ./bin/install-mecab-ipadic-neologd -n -  903.0s
 => [18/19] RUN sed -i -e s/ipadic/mecab-ipadic-neologd/g /usr/local/etc/mecabrc      0.5s
 => [19/19] RUN npm install                                                           2.1s
 => exporting to image                                                               10.5s
 => => exporting layers                                                              10.4s
 => => writing image sha256:a0e9c6f22f1e49c3a363776b5f2d97c8f430265d25bf8b42ed660cdf  0.0s
 => => naming to docker.io/library/mecab                                              0.0s
Recreating mecab_mecab_1 ... done
Attaching to mecab_mecab_1
mecab_1  | 19 Sep 2023 02:30:26,880 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)

Local execution
https://engineering.linecorp.com/ja/blog/mecab-ipadic-neologd-new-words-and-expressions
Examining the effect of "mecab-ipadic-NEologd", which is strong on new words and named entities (LINE Engineering)

恋ダンス is recognized as a proper noun.


curl -XPOST "http://localhost:9099/2015-03-31/functions/function/invocations" -d '{"p":"彼女はペンパイナッポーアッポーペンと恋ダンスを踊った。"}'
[
    {
        "surface": "彼女",
        "lexical": "名詞",
        "compound1": "代名詞",
        "compound2": "一般",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "彼女",
        "reading": "カノジョ",
        "pronunciation": "カノジョ"
    },
    {
        "surface": "は",
        "lexical": "助詞",
        "compound1": "係助詞",
        "compound2": "*",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "は",
        "reading": "ハ",
        "pronunciation": "ワ"
    },
    {
        "surface": "ペンパイナッポーアッポーペン",
        "lexical": "名詞",
        "compound1": "固有名詞",
        "compound2": "一般",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "Pen-Pineapple-Apple-Pen",
        "reading": "ペンパイナッポーアッポーペン",
        "pronunciation": "ペンパイナッポーアッポーペン"
    },
    {
        "surface": "と",
        "lexical": "助詞",
        "compound1": "並立助詞",
        "compound2": "*",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "と",
        "reading": "ト",
        "pronunciation": "ト"
    },
    {
        "surface": "恋ダンス",
        "lexical": "名詞",
        "compound1": "固有名詞",
        "compound2": "一般",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "恋ダンス",
        "reading": "コイダンス",
        "pronunciation": "コイダンス"
    },
    {
        "surface": "を",
        "lexical": "助詞",
        "compound1": "格助詞",
        "compound2": "一般",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "を",
        "reading": "ヲ",
        "pronunciation": "ヲ"
    },
    {
        "surface": "踊っ",
        "lexical": "動詞",
        "compound1": "自立",
        "compound2": "*",
        "compound3": "*",
        "conjugation": "五段・ラ行",
        "inflection": "連用タ接続",
        "original": "踊る",
        "reading": "オドッ",
        "pronunciation": "オドッ"
    },
    {
        "surface": "た",
        "lexical": "助動詞",
        "compound1": "*",
        "compound2": "*",
        "compound3": "*",
        "conjugation": "特殊・タ",
        "inflection": "基本形",
        "original": "た",
        "reading": "タ",
        "pronunciation": "タ"
    },
    {
        "surface": "。",
        "lexical": "記号",
        "compound1": "句点",
        "compound2": "*",
        "compound3": "*",
        "conjugation": "*",
        "inflection": "*",
        "original": "。",
        "reading": "。",
        "pronunciation": "。"
    }
]
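The same local call can be made from Node.js, and the result checked in code rather than by eye. The sketch below is an illustration only: analyzeLocally and properNouns are hypothetical names, the URL is the one exposed by the docker-compose.yml in this article, and the sample tokens are trimmed copies of the output above.

```javascript
// Endpoint exposed by docker-compose.yml in this article (host port 9099).
const LOCAL_URL = 'http://localhost:9099/2015-03-31/functions/function/invocations';

// POST the text to the local container and return the token array.
// Uses the global fetch available in Node.js 18 and later.
async function analyzeLocally(text) {
  const res = await fetch(LOCAL_URL, {
    method: 'POST',
    body: JSON.stringify({ p: text }),
  });
  return res.json();
}

// Keep only tokens tagged 名詞 / 固有名詞 (noun / proper noun) and return their surfaces.
function properNouns(tokens) {
  return tokens
    .filter((t) => t.lexical === '名詞' && t.compound1 === '固有名詞')
    .map((t) => t.surface);
}

// Trimmed-down tokens copied from the output above.
const sample = [
  { surface: '彼女', lexical: '名詞', compound1: '代名詞' },
  { surface: 'ペンパイナッポーアッポーペン', lexical: '名詞', compound1: '固有名詞' },
  { surface: '恋ダンス', lexical: '名詞', compound1: '固有名詞' },
  { surface: '踊っ', lexical: '動詞', compound1: '自立' },
];
console.log(properNouns(sample));
```

With the container running, `properNouns(await analyzeLocally('...'))` would apply the same filter to a live result.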

Deploying to ECR
deploy.sh

# Build the image and tag it for the ECR repository.
docker build -t app-mecab .
docker tag app-mecab:latest 88*******.dkr.ecr.ap-northeast-1.amazonaws.com/app-mecab:latest
# Log in to ECR with the chosen profile.
aws ecr get-login-password --region ap-northeast-1 --profile (your access key profile) | docker login --username AWS --password-stdin 88********.dkr.ecr.ap-northeast-1.amazonaws.com
# Create the repository (needed on the first run only), then push.
aws ecr create-repository --repository-name app-mecab --profile (your access key profile)
docker push 88*******.dkr.ecr.ap-northeast-1.amazonaws.com/app-mecab:latest
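The tag and push commands all use the same image URI pattern. As a small illustration, it can be assembled like this; ecrImageUri is a hypothetical helper and 123456789012 is a placeholder account ID, not the one used in this article.

```javascript
// Build the image URI that docker tag / docker push expect for a private ECR repository.
function ecrImageUri(accountId, region, repository, tag = 'latest') {
  return `${accountId}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}

// ASSUMPTION: 123456789012 is a placeholder account ID.
const uri = ecrImageUri('123456789012', 'ap-northeast-1', 'app-mecab');
console.log(uri);
```

Note that docker login takes only the registry host, that is, the part of the URI before the first slash.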

Deploying to Lambda
Choose container image deployment and select the image above from ECR.
image.png

image.png

Lambda test
image.png

Result
image.png
恋ダンス is recognized as a proper noun.

Invoking from PHP
For historical reasons this invokes the function with an access key, but any credential should work.


use Aws\Lambda\LambdaClient;

    // Method excerpt from the calling class. env() is assumed to be a framework helper (for example Laravel's).
    protected function executeMecab($text)
    {
        // Request payload in the shape the Lambda handler expects: {"p": "<text>"}
        $json = json_encode(['p' => $text]);

        $lambdaFunctionName = '{Lambda function name}';
        if ($lambdaFunctionName === '') {
            return false;
        }
        $client = new LambdaClient([
            'region'  => 'ap-northeast-1',
            'version' => '2015-03-31',
            'credentials' => [
                'key' => env('AWS_ACCESS_KEY_ID'),
                'secret' => env('AWS_SECRET_ACCESS_KEY'),
            ],
        ]);
        $invoke = $client->invoke([
            'FunctionName' => $lambdaFunctionName,
            'InvocationType' => 'RequestResponse', // synchronous invocation
            'LogType' => 'Tail',
            'Payload' => $json,
        ]);
        // Payload is a stream; cast it to a string before decoding the JSON.
        $result = json_decode((string) $invoke->get('Payload'));
        return $result;
    }
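The PHP code above requests LogType Tail but reads only Payload. For reference, here is a sketch in JavaScript of how both response fields could be decoded. It assumes the Invoke response shape (Payload as bytes holding JSON, LogResult as base64 text returned when LogType is Tail) and works on a hand-made response object rather than a real SDK call; decodeInvokeResponse is a hypothetical helper.

```javascript
// Hypothetical helper: decode the fields of a Lambda Invoke response.
// ASSUMPTION: Payload is a byte array holding JSON, LogResult is base64 text.
function decodeInvokeResponse({ Payload, LogResult }) {
  const tokens = JSON.parse(Buffer.from(Payload).toString('utf8'));
  const logTail = LogResult ? Buffer.from(LogResult, 'base64').toString('utf8') : '';
  return { tokens, logTail };
}

// Hand-made response standing in for a real invocation.
const fakeResponse = {
  Payload: Buffer.from(JSON.stringify([{ surface: '恋ダンス', lexical: '名詞' }])),
  LogResult: Buffer.from('START RequestId: example\n').toString('base64'),
};
const decoded = decodeInvokeResponse(fakeResponse);
console.log(decoded.tokens[0].surface, decoded.logTail.trim());
```

If the log tail is not needed, dropping LogType from the request keeps the response smaller.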

This approach should suit cases where you do not want to install libraries on EC2, or just want an easy way to try mecab.
