Environment
Microsoft Windows [Version 10.0.19042.1237]
Windows Subsystem for Linux 2
Ubuntu 20.04.3 LTS
perl5 (revision 5 version 26 subversion 2)
Example
ls -1v | perl -lne 'if(/_R1/){print "$`"}'
(base) usr@LIN:~/test$ ls -1v
SampleName1_S1_L001_R1_001.fastq.gz
SampleName1_S1_L001_R2_001.fastq.gz
SampleName2_S2_L001_R1_001.fastq.gz
SampleName2_S2_L001_R2_001.fastq.gz
(base) usr@LIN:~/test$ ls -1v | perl -lne 'if(/_R1/){print "$`"}'
SampleName1_S1_L001
SampleName2_S2_L001
Explanation
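Perl special variables [1]
- The current input line (the string being matched): $_ (can be omitted)
- The matched string: $&
- The string before the match: $`
- The string after the match: $'
grep can print whole lines before and after a matching line, but extracting the text before or after a match within a single line is difficult.
This article uses the third variable, $`.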
Perl conditionals and pattern matching
'if(/_R1/){print "$`"}'
This means: "if the input line contains _R1, print the string that precedes _R1."
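Other notes
- bash
  ls -1: lists directory entries one per line
  ls -v: sorts entries by the numbers embedded in their names (natural sort), so 100 comes after 2
- perl
  -n: wraps the code in a loop that reads the input one line at a time until the last line
  -l: strips the trailing newline from each input line and appends a newline on output
  -e: takes the next argument as the program text (the one-liner); when options are bundled as in -lne, e must come last so that the code follows it immediately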
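To see what -n and -l contribute, the sketch below compares the short form with a hand-written loop that does roughly the same thing. This is an approximation for illustration (Perl's real -n loop reads from `<>` rather than `<STDIN>`), and the input lines are hypothetical.

```shell
input='SampleName1_S1_L001_R1_001.fastq.gz
SampleName1_S1_L001_R2_001.fastq.gz'

# Short form: -n supplies the read loop, -l handles the newlines.
short=$(printf '%s\n' "$input" | perl -lne 'print $` if /_R1/')

# Roughly the same program with the loop, chomp, and newline written out.
long=$(printf '%s\n' "$input" | perl -e 'while (<STDIN>) { chomp; print "$`\n" if /_R1/ }')

echo "$short"
echo "$long"
```

Both commands should print the same prefix, which is why the short form is usually preferred on the command line.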
A practical example using regular expressions
ls -1v | perl -lne 'if(/_.*_.*_R1/){print "$`"}'
(base) usr@LIN:~/test$ ls -1v | perl -lne 'if(/_.*_.*_R1/){print "$`"}'
SampleName1
SampleName2
Perl regular expressions [2]
. : any single character except a newline
* : zero or more repetitions of the preceding element
Remarks
MiSeq output file names (format) (*quoted from the Illumina website, 2021/10/09)
FASTQ files are named with the sample name and the sample number, which is a numeric assignment based on the order that the sample is listed in the sample sheet. For example: Data\Intensities\BaseCalls\SampleName_S1_L001_R1_001.fastq.gz
▶ SampleName —The sample name provided in the sample sheet. If a sample name is not provided, the file name includes the sample ID, which is a required field in the sample sheet and must be unique.
▶ S1 —The sample number based on the order that samples are listed in the sample sheet starting with 1. In this example, S1 indicates that this sample is the first sample listed in the sample sheet.
NOTE: Reads that cannot be assigned to any sample are written to a FASTQ file for sample number 0, and excluded from downstream analysis.
▶ L001 —The lane number.
▶ R1 —The read. In this example, R1 means Read 1. For a paired-end run, there is at least one file with R2 in the file name for Read 2. When generated, index reads are I1 or I2.
▶ 001 —The last segment is always 001.
FASTQ files that do not follow this naming convention cannot be imported into BaseSpace.