Writing a CWL workflow step by step

Overview

The referenced material includes the following part.

Here we actually build that part.

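Procedure

The starting point is the following command line:

$ cat inputs.txt | head -n5 | sort -nr > output.txt # a nice piece of processing using pipes
  • There are three tools ( cat , head , sort ).
  • They are connected to each other with pipes.
  • The inputs are the input file and the number of lines passed to head.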

Prepare the input file

First, prepare the file that will serve as input.
Any content will do, so I used the following.

inputs.txt
1st line.
4 sort
This is 1st line
A line
For sort
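
As a baseline for later comparison, the sample file and the original pipeline can be reproduced directly in the shell. This is only a sketch: baseline.txt is a name assumed here so that the result does not collide with the output.txt the workflow writes later.

```shell
# Create the sample input with exactly the five lines shown above.
cat > inputs.txt <<'EOF'
1st line.
4 sort
This is 1st line
A line
For sort
EOF

# Run the original pipeline without CWL and keep the result for reference.
cat inputs.txt | head -n5 | sort -nr > baseline.txt
cat baseline.txt
```

Because sort -nr compares leading numbers, "4 sort" comes first and "1st line." second; the order of the remaining lines, which have no leading number, depends on the locale.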

Prepare a tool that runs cat on its own

The cat part was written based on
雑に始める CWL! - Qiita.

cat.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [cat]
arguments: [$(inputs.input_file)]
inputs:
  - id: input_file
    type: File
outputs:
  - id: out
    type: stdout

validate

Test run
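
The commands for these two checks are not shown here. Assuming cwltool is installed and cat.cwl and inputs.txt are in the current directory, they would presumably mirror the invocations used for the finished workflow at the end of the article. The helper name check_tool is hypothetical, and the --input_file option is derived from the input id in cat.cwl.

```shell
# Hypothetical helper: validate a tool description, then run it with the given options.
check_tool() {
  tool=$1
  shift
  cwltool --validate "$tool" && cwltool "$tool" "$@"
}

# Only attempt the run when cwltool and both files are actually present.
if command -v cwltool >/dev/null 2>&1 && [ -f cat.cwl ] && [ -f inputs.txt ]; then
  check_tool cat.cwl --input_file inputs.txt
fi
```

The same helper would cover the other tools, for example check_tool head.cwl --source inputs.txt --nlines 2, again with option names taken from the input ids.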

head

A definition for head is already given in
雑に始める CWL! - Qiita,
so it is used as is:

head.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [head]
arguments: [-n$(inputs.nlines), $(inputs.source)]
inputs:
  - id: source
    type: File
  - id: nlines
    type: int
outputs:
  - id: out
    type: stdout

validate

Test run

sort

The sort part was written based on
雑に始める CWL! - Qiita.

sort.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sort,-nr]
arguments: [$(inputs.input_file)]
inputs:
  - id: input_file
    type: File
outputs:
  - id: out
    type: stdout
stdout: output.txt

validate

Test run

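Enable pipes

We want the runner to use pipes between the tools where it can.

As described in the referenced page, add streamable: true to each File input. In the CWL specification this flag indicates that the file is read sequentially without seeking, so a runner may stream it, for example through a named pipe; it does not oblige the runner to do so.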
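
To make the idea behind streamable concrete, here is a plain-shell sketch of passing data through a named pipe. It only illustrates sequential reading without seeking; it is not a claim about how cwltool executes the steps, and the directory and pipe names are made up for the example.

```shell
# Illustration only: pass data between two commands through a named pipe.
workdir=$(mktemp -d)
mkfifo "$workdir/stream"

# The writer runs in the background; the reader consumes the data in order, without seeking.
printf '1st line.\n4 sort\n' > "$workdir/stream" &
sorted=$(sort -nr "$workdir/stream")
wait

echo "$sorted"
rm -r "$workdir"
```

A tool that needed to seek back and forth in its input could not be fed this way, which is why the flag has to be declared per file.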

cat

cat.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [cat]
arguments: [$(inputs.input_file)]
inputs:
  - id: input_file
    type: File
    streamable: true
outputs:
  - id: out
    type: stdout

head

head.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [head]
arguments: [-n$(inputs.nlines), $(inputs.source)]
inputs:
  - id: source
    type: File
    streamable: true
  - id: nlines
    type: int
outputs:
  - id: out
    type: stdout

sort

sort.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sort,-nr]
arguments: [$(inputs.input_file)]
inputs:
  - id: input_file
    type: File
    streamable: true
outputs:
  - id: out
    type: stdout
stdout: output.txt

Build the workflow

The file is named catheadsort.cwl.

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs: []
steps: []
outputs: []

Write the id of every step

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs: []
steps:
  - id: cat
  - id: head
  - id: sort
outputs: []

For each step, set run to the CWL file of the CommandLineTool to execute

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs: []
steps:
  - id: cat
    run: cat.cwl
  - id: head
    run: head.cwl
  - id: sort
    run: sort.cwl
outputs: []

Looking at the CommandLineTool definition given in run, list all of each tool's outputs in the step's out

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs: []
steps:
  - id: cat
    run: cat.cwl
    out: [out]
  - id: head
    run: head.cwl
    out: [out]
  - id: sort
    run: sort.cwl
    out: [out]
outputs: []

Among the items listed in out, wire up those that become the in of another tool (resolving the dependencies between tools)

The out of cat is an in of head,

so the workflow becomes:

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs: []
steps:
  - id: cat
    run: cat.cwl
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out    
    out: [out]
  - id: sort
    run: sort.cwl
    out: [out]
outputs: []

The out of head is an in of sort,

so the workflow becomes:

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs: []
steps:
  - id: cat
    run: cat.cwl
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out    
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out    
    out: [out]
outputs: []
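
The wiring declared so far can be listed as "producer -> consumer" pairs. The following is a rough sketch that relies on the exact indentation used in this article (it is not a YAML parser); the temporary file and the variable names are assumptions made for the example.

```shell
# Copy of the steps section as written above, stored in a temporary file.
wf=$(mktemp)
cat > "$wf" <<'EOF'
steps:
  - id: cat
    run: cat.cwl
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out
    out: [out]
EOF

# Remember the current step and input id, then print one edge per source line.
edges=$(awk '
  /^  - id:/         { step = $3 }   # step entry (2-space indent)
  /^      - id:/     { port = $3 }   # input of that step (6-space indent)
  /^        source:/ { print $2 " -> " step "/" port }
' "$wf")
echo "$edges"
rm "$wf"
```

Each printed line reads as "this output feeds that input", which is the dependency information a runner uses to order the steps.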

Declare the parameters supplied at run time in inputs

The input to cat is passed at run time,
so declare it in inputs:

- id: input_file
  type: File
catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs:
  - id: input_file
    type: File
steps:
  - id: cat
    run: cat.cwl
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out    
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out
    out: [out]
outputs: []

The number of lines for head is also specified at run time,
so declare it in inputs as well:

- id: nlines
  type: int
catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs:
  - id: input_file
    type: File
  - id: nlines
    type: int
steps:
  - id: cat
    run: cat.cwl
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out    
    out: [out]
outputs: []

Connect each item declared in inputs to the in of the step that uses it

Write the input file for cat in the in of cat

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs:
  - id: input_file
    type: File
  - id: nlines
    type: int
steps:
  - id: cat
    run: cat.cwl
    in:
      - id: input_file
        source: input_file
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out    
    out: [out]
outputs: []

Write the number of lines for head in the in of head

catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs:
  - id: input_file
    type: File
  - id: nlines
    type: int
steps:
  - id: cat
    run: cat.cwl
    in:
      - id: input_file
        source: input_file
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out
      - id: nlines
        source: nlines
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out    
    out: [out]
outputs: []

Among the items listed in out, put those that should remain on hand after the workflow finishes in outputs

Here we want to keep the out of sort, the final step,
so outputs is written as follows:

outputs:
  - id: outputs
    type: File
    outputSource: sort/out
catheadsort.cwl
cwlVersion: v1.0
class: Workflow
doc: the workflow to analyze hogehoge
inputs:
  - id: input_file
    type: File
  - id: nlines
    type: int
steps:
  - id: cat
    run: cat.cwl
    in:
      - id: input_file
        source: input_file
    out: [out]
  - id: head
    run: head.cwl
    in:
      - id: source
        source: cat/out
      - id: nlines
        source: nlines
    out: [out]
  - id: sort
    run: sort.cwl
    in:
      - id: input_file
        source: head/out    
    out: [out]
outputs:
  - id: outputs
    type: File
    outputSource: sort/out

Validate

$ cwltool --validate catheadsort.cwl
INFO /usr/local/bin/cwltool 1.0.20190906054215
INFO Resolved 'catheadsort.cwl' to 'file:///workspace/catheadsort.cwl'
catheadsort.cwl is valid CWL.

Check the input parameters with --help

$ cwltool  catheadsort.cwl --help
INFO /usr/local/bin/cwltool 1.0.20190906054215
INFO Resolved 'catheadsort.cwl' to 'file:///workspace/catheadsort.cwl'
usage: catheadsort.cwl [-h] --input_file INPUT_FILE --nlines NLINES
                       [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE
  --nlines NLINES
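
The job_order argument in the usage text means the inputs can also be supplied in a file instead of as command-line options. A minimal sketch, assuming the hypothetical file name job.yml (cwltool accepts YAML as well as JSON for this file):

```shell
# Write the two workflow inputs to a job file (job.yml is an assumed name).
cat > job.yml <<'EOF'
input_file:
  class: File
  path: inputs.txt
nlines: 2
EOF

# Run the workflow with the job file when everything it needs is present.
if command -v cwltool >/dev/null 2>&1 && [ -f catheadsort.cwl ] && [ -f inputs.txt ]; then
  cwltool catheadsort.cwl job.yml
fi
```

The keys in the job file are the ids declared under inputs in catheadsort.cwl, so this form should be equivalent to passing --input_file and --nlines on the command line.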

Run

  • The input file is inputs.txt
  • The number of lines is 2
$ cwltool  catheadsort.cwl --input_file inputs.txt --nlines 2
INFO /usr/local/bin/cwltool 1.0.20190906054215
INFO Resolved 'catheadsort.cwl' to 'file:///workspace/catheadsort.cwl'
INFO [workflow ] start
INFO [workflow ] starting step cat
INFO [step cat] start
INFO [job cat] /tmp/llvqcm4w$ cat \
    /tmp/tmp987f2ums/stg9598e8a9-3dee-4d5e-b31a-0bc458a9a05c/inputs.txt > /tmp/llvqcm4w/83d47473ce9ef8676b4836073cf60719b30d36e4
INFO [job cat] completed success
INFO [step cat] completed success
INFO [workflow ] starting step head
INFO [step head] start
INFO [job head] /tmp/x12dzh44$ head \
    -n2 \
    /tmp/tmpw498v0jn/stg030ab5b0-935a-4864-99b0-25067bd7d302/83d47473ce9ef8676b4836073cf60719b30d36e4 > /tmp/x12dzh44/48e7ecb9aff5982db2c82fd66f2a7b8a5d679b5c
INFO [job head] completed success
INFO [step head] completed success
INFO [workflow ] starting step sort
INFO [step sort] start
INFO [job sort] /tmp/yky_c3oe$ sort \
    -nr \
    /tmp/tmpx70gna2z/stg922f4549-ab39-42ca-9e83-0e7d931c11e5/48e7ecb9aff5982db2c82fd66f2a7b8a5d679b5c > /tmp/yky_c3oe/output.txt
INFO [job sort] completed success
INFO [step sort] completed success
INFO [workflow ] completed success
{
    "outputs": {
        "location": "file:///workspace/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$2112fd70c7e641e08ee46765565254a158dbb43d",
        "size": 17,
        "path": "/workspace/output.txt"
    }
}
INFO Final process status is success

Check the result

$ cat output.txt
4 sort
1st line.
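
One way to confirm that the workflow reproduces the original command line is to rerun the plain pipeline and diff it against the workflow output. The helper name check_against_pipeline is hypothetical; the sketch assumes inputs.txt and output.txt are in the current directory.

```shell
# Hypothetical helper: rerun the plain shell pipeline and diff it against a result file.
check_against_pipeline() {
  input_file=$1
  nlines=$2
  result=$3
  cat "$input_file" | head -n "$nlines" | sort -nr | diff - "$result"
}

# Compare with the workflow output when both files are present.
if [ -f inputs.txt ] && [ -f output.txt ]; then
  check_against_pipeline inputs.txt 2 output.txt && echo "workflow output matches the pipeline"
fi
```

diff prints nothing and returns success when the two results are identical, so any output from the helper points at a difference between the workflow and the pipeline.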