
How to use the cache on Bitbucket Pipelines

The original Japanese article is here: https://qiita.com/keisuke-nakata/items/cb3d436519dab2868c39

TL;DR

  • The only thing the cache does is extract the stored directories at the beginning of a pipeline run.
  • So to benefit from the cache, you must write a script that does "use the cache if it exists, otherwise download (and build) it".
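The pattern in the second bullet fits in a small bash function. This is a minimal sketch, not part of the original setup: the function name use_cache_or_build is hypothetical, and it assumes that the build command creates the directory being checked.

```shell
#!/bin/bash
# Hypothetical helper for the "use the cache if it exists, otherwise
# download (and build) it" pattern.
#
# use_cache_or_build DIR COMMAND [ARGS...]
#   DIR      a directory that Pipelines restores from the cache
#   COMMAND  run only when DIR does not exist yet; it must create DIR
use_cache_or_build() {
  local dir=$1
  shift
  if [ -d "${dir}" ]; then
    # Restored by the cache: skip the expensive work entirely.
    echo "cache hit: ${dir}"
  else
    # First run (or expired cache): do the expensive work once.
    echo "cache miss: ${dir}"
    "$@"
  fi
}
```

The check is on the directory alone, because restoring directories is all the cache does; everything else is up to the script.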

What I wanted to do

Bitbucket Pipelines has a "cache" feature.
What I wanted was to use the cache to speed up building a library that requires make && make install. However, I found the official documentation, in particular the part about custom caches, hard to follow and struggled with it.

Use a Docker image if possible

In Bitbucket Pipelines you can use any Docker image as the build base image.
Therefore, if you can do either of the following, build your environment into a Docker image and rely on Docker's build caching instead:

  • make the development environment's Docker image public (e.g. on Docker Hub), or
  • pull images from a private registry (e.g. your own Docker registry).

The rest of this article is for people who are not permitted to publish Docker images, or who want to cache a workflow that is not suitable for Docker.

How to use Pipelines cache

In this article, I tried to cache the build of MeCab + IPAdic.[1]

My final repository looks like this:

your_repo/
  bitbucket-pipelines.yml
  bitbucket-pipelines-mecab-download.sh
  bitbucket-pipelines-ipadic-download.sh

I will describe each file in turn.

bitbucket-pipelines.yml

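image: python:3.6.4

pipelines:
  default:
    - step:
        caches:
          - pip           # predefined cache for pip downloads
          - mecab         # custom caches, defined under definitions: below
          - mecab-ipadic
        script:
          # Build MeCab (or reuse the cached build), then install it
          - bash bitbucket-pipelines-mecab-download.sh
          - (cd ~/mecab/mecab/mecab-0.996 && make install && ldconfig)
          # Same for the IPAdic dictionary
          - bash bitbucket-pipelines-ipadic-download.sh
          - (cd ~/mecab/mecab-ipadic/mecab-ipadic-2.7.0-20070801 && make install)
          - pip install -r requirements.txt
          - python setup.py test  # your test here

definitions:
  caches:
    # cache name: directory saved after a successful run and restored in later runs
    mecab: ~/mecab/mecab
    mecab-ipadic: ~/mecab/mecab-ipadic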

In the script section, bash bitbucket-pipelines-mecab-download.sh checks for a cached, already-built MeCab and, if there is none, downloads and builds it.
The following line runs make install for MeCab (followed by ldconfig, which refreshes the shared-library cache).

bash bitbucket-pipelines-ipadic-download.sh and the line after it do almost the same for IPAdic.

After these make install steps, I run the tests (python setup.py test), which are the actual purpose of this pipeline.

bitbucket-pipelines-mecab-download.sh

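#!/bin/bash
set -euo pipefail  # stop at the first failing command

CACHE_DIR=${HOME}/mecab/mecab  # must match the "mecab" cache path in bitbucket-pipelines.yml
mkdir -p "${CACHE_DIR}"

if [ -d "${CACHE_DIR}/mecab-0.996" ]; then
  # Cache restored: the source tree is already configured and built.
  echo "found mecab cache"
else
  # No cache yet: download, extract into the cache directory, configure and build.
  wget "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7cENtOXlicTFaRUE" -O mecab-0.996.tar.gz
  tar zxfv mecab-0.996.tar.gz -C "${CACHE_DIR}/"
  cd "${CACHE_DIR}/mecab-0.996"
  ./configure --with-charset=utf8
  make
fi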

As described above, this script checks for a cached, already-built MeCab (the if condition) and, if there is none, downloads and builds it.

The results of ./configure and make are cached as well, since they do not change unless I change the base image.
(The remaining make install is a light operation, since it mostly just copies files.)[2]

The "cache" consists of directories that were saved once, when a pipeline finished successfully; they are extracted at the beginning of each later run.
Here, the step: caches: and definitions: caches: sections in bitbucket-pipelines.yml specify which directories are cached.
(There is no definition for pip because Pipelines predefines caches for commonly used package managers.)
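Because the cache is nothing more than directories saved once and extracted at the start of later runs, you can roughly mimic it on your own machine to exercise both branches of the download scripts (cache miss and cache hit) before pushing. The sketch below is an approximation based on that description, not Bitbucket's implementation; restore_cache, save_cache and LOCAL_CACHE_STORE are hypothetical names.

```shell
#!/bin/bash
# Rough local model of the Pipelines cache (an assumption, not Bitbucket's
# actual implementation): a directory is archived once after a successful
# run and extracted again at the start of later runs.
#
# LOCAL_CACHE_STORE is a hypothetical directory standing in for
# Bitbucket's cache storage.
LOCAL_CACHE_STORE=${LOCAL_CACHE_STORE:-/tmp/pipelines-cache-store}

# restore_cache NAME DIR: extract the archive for NAME into DIR, if one exists.
restore_cache() {
  local archive="${LOCAL_CACHE_STORE}/$1.tar.gz"
  if [ -f "${archive}" ]; then
    mkdir -p "$2" && tar xzf "${archive}" -C "$2"
  fi
}

# save_cache NAME DIR: archive DIR under NAME, but only if no archive exists yet.
save_cache() {
  local archive="${LOCAL_CACHE_STORE}/$1.tar.gz"
  mkdir -p "${LOCAL_CACHE_STORE}"
  if [ ! -f "${archive}" ] && [ -d "$2" ]; then
    tar czf "${archive}" -C "$2" .
  fi
}
```

A local "run" could then be: delete ~/mecab/mecab to simulate a fresh container, call restore_cache mecab ~/mecab/mecab, and finish with bash bitbucket-pipelines-mecab-download.sh && save_cache mecab ~/mecab/mecab. The first run should build MeCab; the second should print "found mecab cache".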

bitbucket-pipelines-ipadic-download.sh

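#!/bin/bash
set -euo pipefail  # stop at the first failing command

CACHE_DIR=${HOME}/mecab/mecab-ipadic  # must match the "mecab-ipadic" cache path in bitbucket-pipelines.yml
mkdir -p "${CACHE_DIR}"

if [ -d "${CACHE_DIR}/mecab-ipadic-2.7.0-20070801" ]; then
  # Cache restored: the dictionary is already configured and built.
  echo "found ipadic cache"
else
  # No cache yet: download, extract into the cache directory, configure and build.
  wget "https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7MWVlSDBCSXZMTXM" -O mecab-ipadic-2.7.0-20070801.tar.gz
  tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz -C "${CACHE_DIR}/"
  cd "${CACHE_DIR}/mecab-ipadic-2.7.0-20070801"
  ./configure --with-charset=utf8
  make
fi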

It does the same thing for IPAdic.
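Since the two download scripts differ only in the package name, URL and cache directory, they could be merged into one function. This is a sketch under that assumption; the file name bitbucket-pipelines-fetch.sh and the function fetch_and_build are hypothetical. Unlike the scripts above, it also removes a half-extracted tree when the build fails, so that a failed attempt is never mistaken for a cache hit.

```shell
#!/bin/bash
# Hypothetical bitbucket-pipelines-fetch.sh: one function replaces the two
# nearly identical download scripts. Source this file, then call it.
#
# fetch_and_build NAME URL CACHE_DIR
#   NAME       top-level directory inside the tarball, e.g. mecab-0.996
#   URL        where to download the tarball from
#   CACHE_DIR  a directory listed under definitions: caches: in the YAML
fetch_and_build() {
  local name=$1 url=$2 cache_dir=$3
  mkdir -p "${cache_dir}" || return 1

  if [ -d "${cache_dir}/${name}" ]; then
    # Cache hit: the configured and built tree was restored by Pipelines.
    echo "found ${name} cache"
    return 0
  fi

  # Cache miss: download to a temp file (kept out of the cache),
  # extract into the cache directory, then configure and build.
  local tarball status=0
  tarball=$(mktemp) || return 1
  wget "${url}" -O "${tarball}" \
    && tar zxf "${tarball}" -C "${cache_dir}/" \
    && (cd "${cache_dir}/${name}" && ./configure --with-charset=utf8 && make) \
    || status=$?
  rm -f "${tarball}"

  if [ "${status}" -ne 0 ]; then
    # Do not leave a partial tree that a later check would treat as a cache hit.
    rm -rf "${cache_dir:?}/${name:?}"
  fi
  return "${status}"
}
```

In bitbucket-pipelines.yml, a script line such as source bitbucket-pipelines-fetch.sh && fetch_and_build mecab-0.996 "<the MeCab URL above>" ~/mecab/mecab could then replace the first download script, and likewise for mecab-ipadic-2.7.0-20070801 with ~/mecab/mecab-ipadic.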

Speed up


In my case, the final run reduced the build time to 1/3.
(pip install took 26 sec, almost half of the total build time of 51 sec. Caching the downloads does not help much when the installation step itself is heavy.
As described above, building a Docker image, if possible, is the better solution in my opinion.)

References


  1. MeCab is a Japanese morphological analyzer (tokenizer). IPAdic is its main dictionary.

  2. Because the whole cache directories are restored, including the results of the last make, running make on every build would probably add little time anyway: make skips targets that are already up to date.