ffq: a tool for searching SRA / ENA / GEO from the command line

Posted at 2021-06-01, last updated at 2021-06-01

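Introduction

I had long thought a tool like this would be handy, but building one myself seemed like a hassle, and I felt I didn't quite have the skills for it. Then a wonderful tool came along.

It is a very simple tool that hardly needs explaining, but here is a brief overview.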

Installation

ffq is a Python tool, so install it with pip.

pip install ffq

Help

ffq -h

There are no particularly difficult options.

ffq 0.0.2: Fetch run information from the European Nucleotide Archive (ENA).

positional arguments:
  IDs         Can be a SRA / ENA Run Accessions or Study Accessions, GEO Study
              Accessions, DOIs or paper titles.

optional arguments:
  -h, --help  Show this help message and exit
  -o OUT      Path to JSON file to write run information. If `--split` is
              used, path to directory in which to place JSON files. (default:
              standard out)
  -t TYPE     The type of term used to query data. Can be one of SRR, ERR,
              DRR, SRP, ERP, DRP, GSE, DOI (default: SRR)
  --split     Split runs into their own files.
  --verbose   Print debugging information
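
The options above can also be assembled from a script. The following is a minimal sketch, not part of ffq itself: `build_ffq_command` is a hypothetical helper, and it uses only the flags listed in the ffq 0.0.2 help text above, so other versions may differ. The accession `SRP056282` is the study accession that appears in the example output later in this article; actually running the command requires ffq on the PATH and network access.

```python
import subprocess


def build_ffq_command(ids, term_type=None, out=None, split=False, verbose=False):
    """Build an argv list for ffq from the options in the 0.0.2 help text.

    This is a hypothetical helper for illustration, not an ffq API.
    """
    cmd = ["ffq"]
    if term_type is not None:
        cmd += ["-t", term_type]   # e.g. SRR, SRP, GSE, DOI
    if out is not None:
        cmd += ["-o", out]         # JSON file, or a directory when --split is used
    if split:
        cmd.append("--split")      # one JSON file per run
    if verbose:
        cmd.append("--verbose")
    cmd += list(ids)               # positional IDs go last
    return cmd


if __name__ == "__main__":
    cmd = build_ffq_command(["SRP056282"], term_type="SRP", out="runs", split=True)
    print(" ".join(cmd))
    # Uncomment to actually run ffq (needs ffq installed and network access):
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when an ID is a paper title containing spaces.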

Searching for an SRR

ffq SRR1000000

The output is in JSON format.

{
    "SRR1000000": {
        "accession": "SRR1000000",
        "experiment": {
            "accession": "SRX357886",
            "title": "Illumina HiSeq 2000 paired end sequencing",
            "platform": "ILLUMINA",
            "instrument": "Illumina HiSeq 2000"
        },
        "study": {
            "accession": "SRP056282",
            "title": "Allelic Spectrum in Common Disease:Sequence from participants in the FUSION study",
            "abstract": "This study is part of a re-sequencing project to identify variants associated with metabolic syndrome traits in a Finnish cohort. Metabolic syndrome (MetS) increases the risk of cardiovascular disease and diabetes, and prevalence is estimated to be as high as 25% in the United States. MetS is characterized via measure of triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting plasma glucose (FG), body mass index (BMI) and waist-to-hip ratio (WHR). Closely related traits include low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), fasting plasma insulin (FI) and height (HT).  Seventeen loci associated with TG, HDL-C, LDL-C, TC, FG, and FI (Kathiresan et al. 2008, Willer et al. 2008, Sabatti et al. 2009, Dupuis et al. 2010, Teslovich et al. 2010) were prioritized for sequencing. At each locus, protein-coding regions and 5'' and 3'' untranslated regions of genes... (for more see dbGaP study page.)"
        },
        "sample": {
            "accession": "SRS485766",
            "title": "DNA sample from a human female participant in the dbGaP study \"Sequence Data From Participants in the FUSION Study\"",
            "organism": "Homo sapiens",
            "attributes": {
                "gap_accession": "phs000702",
                "gap_parent_phs": "phs000867",
                "submitter handle": "NIDDK",
                "biospecimen repository": "NIDDK",
                "study name": "Sequence Data From Participants in the FUSION Study",
                "study design": "Case-Control",
                "biospecimen repository sample id": "FU04357",
                "submitted sample id": "FU04357",
                "submitted subject id": "656230",
                "gap_sample_id": "894937",
                "gap_subject_id": "237778",
                "sex": "female",
                "analyte type": "DNA",
                "gap_consent_code": "1",
                "gap_consent_short_name": "GRU-IRB",
                "ENA-FIRST-PUBLIC": "2013-10-01",
                "ENA-LAST-UPDATE": "2018-04-12"
            }
        },
        "title": "Illumina HiSeq 2000 paired end sequencing",
        "files": []
    }
}

If the raw JSON contains too much information to read comfortably, it helps to filter it with the jq command.

ffq SRR1000000 | jq '.SRR1000000.study.title'
# "Allelic Spectrum in Common Disease:Sequence from participants in the FUSION study"
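
If jq is not available, the same kind of filtering can be done with Python's standard `json` module. The sketch below assumes the JSON layout shown in the SRR1000000 example above (ffq 0.0.2); `summarize_runs` is a hypothetical helper name, and `SAMPLE_OUTPUT` is an abbreviated copy of that output with only a few fields kept.

```python
import json

# Abbreviated copy of the ffq output shown above (only a few fields kept).
SAMPLE_OUTPUT = """
{
    "SRR1000000": {
        "accession": "SRR1000000",
        "experiment": {
            "accession": "SRX357886",
            "platform": "ILLUMINA",
            "instrument": "Illumina HiSeq 2000"
        },
        "study": {
            "accession": "SRP056282",
            "title": "Allelic Spectrum in Common Disease:Sequence from participants in the FUSION study"
        },
        "sample": {
            "accession": "SRS485766",
            "organism": "Homo sapiens"
        }
    }
}
"""


def summarize_runs(ffq_json):
    """Return one flat summary dict per run found in ffq's JSON output."""
    runs = json.loads(ffq_json)
    summaries = []
    for run_id, run in runs.items():
        summaries.append({
            "run": run_id,
            "study": run["study"]["accession"],
            "study_title": run["study"]["title"],
            "organism": run["sample"]["organism"],
            "instrument": run["experiment"]["instrument"],
        })
    return summaries


if __name__ == "__main__":
    for summary in summarize_runs(SAMPLE_OUTPUT):
        print(summary)
```

To use it on real output, one option is to save the result with `ffq -o out.json SRR1000000` and pass the file contents to `summarize_runs`.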

That's all for this article.

If you found this useful, please introduce your own favorite tool in a Qiita article.
