0. Displaying a tab-delimited file with tabulator

Posted at 2017-12-19

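Trying out tabulator, a JavaScript tool for building tables

tabulator looks like a good fit for showing a grid in the browser, so I am going to try it.

The plan is to read JSON data and display it.
The source data for the table is LRG_RefSeqGene from RefSeq.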

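Converting tab-delimited data to JSON

Loading the data into tabulator requires JSON, so LRG_RefSeqGene has to be converted to JSON first.

I could not find a suitable tool, so I do the conversion in Python.

  • convert_csv_to_json(filename=file to read, delimiter=delimiter character, maximum_lines=number of lines to convert, out_file=output file name)
    • Read the tab-delimited data with DictReader from the csv module
      • The "#" at the start of the header line ends up in the first column name, so edit DictReader.fieldnames[0] directly
    • Make the number of lines to convert configurable: maximum_lines
    • Add a unique id to each JSON record
    • Allow a delimiter other than tab to be specified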
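The "#" issue in the list above is easy to see on a small in-memory sample. The following is only a sketch: `SAMPLE` and `read_rows` are names made up for this illustration, and the tab positions in the sample line are inferred from the JSON output shown later in this article (the LRG, t and p columns are empty). It assumes Python 3.

```python
import csv
import io

# Header and first data row, with tab positions inferred from the article's output
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tRSG\tLRG\tRNA\tt\tProtein\tp\tCategory\n"
    "9606\t29974\tA1CF\tNG_029916.1\t\tNM_001198819.1\t\tNP_001185748.1\t\treference standard\n"
)


def read_rows(text, strip_hash):
    """Parse tab-delimited text; optionally drop a leading '#' from the first column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Accessing fieldnames reads the header line; the list can then be edited in place
    if strip_hash and reader.fieldnames[0].startswith("#"):
        reader.fieldnames[0] = reader.fieldnames[0][1:]
    return list(reader)


raw = read_rows(SAMPLE, strip_hash=False)
fixed = read_rows(SAMPLE, strip_hash=True)
print(list(raw[0])[0])    # "#tax_id": the '#' leaks into the key
print(list(fixed[0])[0])  # "tax_id"
print(repr(fixed[0]["LRG"]))  # '': consecutive tabs give an empty string
```

Editing `fieldnames` before iterating means every row dictionary already carries the cleaned key, so no per-row renaming is needed.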
create_json.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import json
import csv
__author__ = 'percipere'
__date__ = '2017/12/13'
__date_of_last_modification__ = ''
# Read a delimited data file and write its rows out as a JSON array
def convert_csv_to_json(filename='./LRG_RefSeqGene', delimiter='\t', maximum_lines=100, out_file="./output.json"):
    """
    Read csv file and convert it to json file.
    The first line of the csv should be delimiter separated column names.
    :param filename: file name of the CSV
    :param delimiter: column separator (tab by default)
    :param maximum_lines: maximum number of rows to convert; 0 means all lines
    :param out_file: file name of the JSON output
    """
    data_list = []
    with open(filename, 'r') as file_handle:
        reader = csv.DictReader(file_handle, delimiter=delimiter)
        # The header line starts with '#', which would otherwise end up in the first column name
        if reader.fieldnames[0].startswith('#'):
            reader.fieldnames[0] = reader.fieldnames[0][1:]
        for row_id, row in enumerate(reader, start=1):
            # Add a unique, 1-based id to each row
            row.update({"id": row_id})
            data_list.append(row)
            if maximum_lines and row_id >= maximum_lines:
                break
    with open(out_file, 'w') as write_handle:
        json.dump(data_list, write_handle)

if __name__ == "__main__":
    convert_csv_to_json()

Each row dictionary gets an id added and is appended to data_list; at the end, json.dump writes the list out.

$ python3 ./create_json.py 

The header line and the first 4 data rows of the input file

LRG_RefSeqGene
#tax_id GeneID  Symbol  RSG LRG RNA t   Protein p   Category
9606    29974   A1CF    NG_029916.1     NM_001198819.1      NP_001185748.1      reference standard
9606    29974   A1CF    NG_029916.1     NM_014576.3     NP_055391.2     aligned: Selected
9606    29974   A1CF    NG_029916.1     NM_138932.2     NP_620310.1     aligned: Selected
9606    29974   A1CF    NG_029916.1     NM_138933.2     NP_620311.1     aligned: Selected

The first 4 items of the output file (truncated here, with a closing ] appended for completeness)

output.json
[{"tax_id": "9606", "GeneID": "29974", "Symbol": "A1CF", "RSG": "NG_029916.1", "LRG": "", "RNA": "NM_001198819.1", "t": "", "Protein": "NP_001185748.1", "p": "", "Category": "reference standard", "id": 1},
 {"tax_id": "9606", "GeneID": "29974", "Symbol": "A1CF", "RSG": "NG_029916.1", "LRG": "", "RNA": "NM_014576.3", "t": "", "Protein": "NP_055391.2", "p": "", "Category": "aligned: Selected", "id": 2},
 {"tax_id": "9606", "GeneID": "29974", "Symbol": "A1CF", "RSG": "NG_029916.1", "LRG": "", "RNA": "NM_138932.2", "t": "", "Protein": "NP_620310.1", "p": "", "Category": "aligned: Selected", "id": 3},
 {"tax_id": "9606", "GeneID": "29974", "Symbol": "A1CF", "RSG": "NG_029916.1", "LRG": "", "RNA": "NM_138933.2", "t": "", "Protein": "NP_620311.1", "p": "", "Category": "aligned: Selected", "id": 4}]

The source data is now ready.
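Before handing the file to tabulator, it may be worth a quick sanity check that it parses as JSON and that the ids really are unique. The following is a minimal sketch, not part of the original script: `check_records` is a hypothetical helper, and it assumes the default output path `./output.json` used above.

```python
import json
import os


def check_records(path="./output.json"):
    """Load the converted file and return (record count, column names of the first record)."""
    with open(path) as handle:
        records = json.load(handle)
    ids = [record["id"] for record in records]
    # ids were assigned with enumerate(start=1), so they should be 1..N with no repeats
    if ids != list(range(1, len(records) + 1)):
        raise ValueError("ids are not sequential starting from 1")
    columns = list(records[0]) if records else []
    return len(records), columns


if __name__ == "__main__":
    # Only run the check when the converter has already produced its output
    if os.path.exists("./output.json"):
        print(check_records())
```

With the default maximum_lines=100, the record count should be at most 100, and the column list should end with "id".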

That's all for this time :smiley:
