Registering proper nouns in a MeCab dictionary


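As a follow-up to the project mentioned above, there was a request to add proper nouns, so here are my notes.
When only proper nouns need to be added, you add them to a CSV file and recompile the dictionary.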

Place this file directly under the project directory (the Docker build context, next to the Dockerfile).
user_dic.csv

桜富士,,,1,名詞,固有名詞,人名,一般,*,*,サクラフジ,タケルフジ,サクラフジ,タケルフジ,サクラフジ,タケルフジ,固,*,*,*,*
紫こうじ,,,1,名詞,固有名詞,一般,*,*,*,ムラサキコウジ,ムラサキコウジ,紫こうじ,ムラサキコウジ,紫こうじ,ムラサキコウジ,固,*,*,*,*
アルカノイド,,,1,名詞,固有名詞,一般,*,*,*,アルカノイド,アルカノイド,アルカノイド,アルカノイド,アルカノイド,アルカノイド,固,*,*,*,*
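Each row above has 21 comma-separated fields: the surface form, the left and right context IDs (left empty here), the cost, and then the feature columns. A row with a missing or extra column is easy to overlook, so a small check before building the image may help. The following is a minimal sketch in Node.js; `checkUserDic` is a hypothetical helper name, the expected count of 21 is taken from the rows above, and the plain comma split assumes no field contains a quoted comma.

```javascript
// Hypothetical helper: reports rows of a MeCab user dictionary CSV
// whose number of comma-separated fields differs from the expected count.
// Assumption: no field contains a quoted comma, so a plain split is enough.
function checkUserDic(csvText, expectedFields = 21) {
  const problems = [];
  csvText.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    const fields = line.split(",");
    if (fields.length !== expectedFields) {
      problems.push(
        `line ${index + 1}: expected ${expectedFields} fields, got ${fields.length}`
      );
    }
  });
  return problems;
}

// The first row is copied from user_dic.csv; the second is deliberately truncated.
const sample = [
  "紫こうじ,,,1,名詞,固有名詞,一般,*,*,*,ムラサキコウジ,ムラサキコウジ,紫こうじ,ムラサキコウジ,紫こうじ,ムラサキコウジ,固,*,*,*,*",
  "アルカノイド,,,1,名詞,固有名詞,一般,*,*,*",
].join("\n");

// Prints an array with one message about the truncated second row.
console.log(checkUserDic(sample));
```

An empty result means every non-blank row has the expected number of fields. It says nothing about whether the feature values themselves are correct.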

Dockerfile
The dictionary is recompiled in the part below the `#user dictionary` comment.

FROM public.ecr.aws/lambda/nodejs:18
COPY *.js* package*.json .env ./


RUN yum install -y gcc gcc-c++ git patch tar make which findutils xz file openssl unzip sudo
RUN rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm

RUN mkdir ./mecab-service
RUN curl -L "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE" -o mecab-0.996.tar.gz
RUN tar zxvf mecab-0.996.tar.gz
RUN cd ./mecab-0.996 && ./configure --enable-utf8-only
RUN cd ./mecab-0.996 && make
RUN cd ./mecab-0.996 && make install

RUN curl -L "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM" -o mecab-ipadic-2.7.0-20070801.tar.gz
RUN tar zxvf mecab-ipadic-2.7.0-20070801.tar.gz
RUN cd ./mecab-ipadic-2.7.0-20070801 && ./configure --with-charset=utf8

# The mecab-ipadic source files are distributed in EUC-JP, so convert from EUC-JP to UTF-8 here.
RUN cd ./mecab-ipadic-2.7.0-20070801 && /usr/local/libexec/mecab/mecab-dict-index -f euc-jp -t utf-8
RUN cd ./mecab-ipadic-2.7.0-20070801 && make
RUN cd ./mecab-ipadic-2.7.0-20070801 && make install
RUN git clone --depth 1 https://github.com/neologd/mecab-unidic-neologd.git
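# The neologd install script downloads the UniDic source zip only when it is missing from build/
# (see the excerpt from the script below), so the zip is obtained beforehand and copied in.
#if [ ! -e ${BASEDIR}/../build/${ORG_DIC_NAME}.zip ]; then
#    curl --insecure -L "http://osdn.jp/frs/redir.php?m=jaist&f=%2Funidic%2F58338%2F${ORG_DIC_NAME}.zip" -o "${ORG_DIC_NAME}.zip"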
COPY unidic-mecab-2.1.2_src.zip mecab-unidic-neologd/build/
RUN cd mecab-unidic-neologd && ./bin/install-mecab-unidic-neologd -n -y
RUN sed -i -e s/ipadic/mecab-unidic-neologd/g /usr/local/etc/mecabrc 

#user dictionary
COPY user_dic.csv mecab-unidic-neologd/build/unidic-mecab-2.1.2_src-neologd-20200910
RUN cd mecab-unidic-neologd/build/unidic-mecab-2.1.2_src-neologd-20200910 && /usr/local/libexec/mecab/mecab-dict-index -f UTF8 -t UTF8
# Install the recompiled dictionary files over the ones installed by the neologd script.
RUN cd mecab-unidic-neologd/build/unidic-mecab-2.1.2_src-neologd-20200910 \
 && /usr/bin/install -c -m 644 dicrc char.bin unk.dic sys.dic matrix.bin /usr/local/lib/mecab/dic/mecab-unidic-neologd/



RUN npm install
ENV MY_ENV_VAR=value

CMD [ "app.lambdaHandler" ]
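To confirm that the added entries made it into the installed dictionary, one option is to run `mecab` from Node.js inside the container and check that a registered word comes back as a single token. The following is a rough sketch under that assumption. `parseSurfaces` and `tokenize` are hypothetical names that do not come from the original handler, and the parsing relies only on MeCab printing one token per line, with the surface form before the first tab and a final `EOS` line.

```javascript
const { execFileSync } = require("node:child_process");

// Extracts the surface form (the text before the first tab) from each
// output line, skipping blank lines and the EOS marker.
function parseSurfaces(mecabOutput) {
  return mecabOutput
    .split("\n")
    .filter((line) => line !== "" && line !== "EOS")
    .map((line) => line.split("\t")[0]);
}

// Runs the mecab binary with the text on stdin and returns the surface forms.
// Assumes mecab is on PATH and mecabrc points at the recompiled dictionary.
function tokenize(text) {
  const output = execFileSync("mecab", [], {
    input: text + "\n",
    encoding: "utf8",
  });
  return parseSurfaces(output);
}
```

If the recompiled dictionary is in use, `tokenize("アルカノイド")` should return an array with one element. If the word is split into several tokens, the user entries were probably not compiled into `sys.dic`.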
