Help us understand the problem. What is going on with this article?

ヒト変異のHGVS表記の変換(vep api)

変異のHGVS表記の相互変換

これまでに、ヒトゲノムの変異のHGVSの変換を行うために、変異のHGVS表記の相互変換(Ensembl variant_recoder)変異のHGVS表記の変換(Python hgvs)等を試してみた

しかし、お手軽さの点ではvep apiを利用するのにかなわないようだ

vep restful api のvariant recoderを利用する

変異のHGVS表記の相互変換(Ensembl variant_recoder)のオプション指定と統一するため、下記のようなコードをpythonで作成

Pythonのrequests モジュールを予めインストールしておいてください

variant_recoder_rest.py
import argparse
import json
import requests

server_grch37 = "https://grch37.rest.ensembl.org"
server_grch38 = "https://rest.ensembl.org"
variant_recoder_human = "/variant_recoder/human"


def variant_recoder(input_data, grch37, pretty):
    server = server_grch38 if not grch37 else server_grch37
    api_url = server + variant_recoder_human + '/' + input_data
    result = requests.get(api_url, headers={"Content-Type": "application/json"})
    if result.status_code != requests.codes.ok:
        header = result.headers
        ratelimit_remaining = int(header['X-RateLimit-Remaining'])
        if ratelimit_remaining == 0:
            print("You are not allowed to submit any request.")
            raise ValueError(
                "You've reached the rate limit of api. Please wait {} seconds.".format(header['X-RateLimit-Reset']))
        raise ValueError(result.text)

    if pretty:
        print(json.dumps(result.json(), indent=4))
    else:
        print(json.dumps(result.json()))


def parse_args():
    parser = argparse.ArgumentParser(description='vep recoder with ensembl vep API.')
    parser.add_argument('--input_data', nargs=1, required=True,
                        help='hgvs string to convert')
    parser.add_argument('--grch37', action='store_true',
                        help='Set genome version to GRCh37. (Default version is GRCh38)')
    parser.add_argument('--pretty', action='store_true', help="Prettify JSON result")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    input_data = args.input_data[0]
    grch37 = args.grch37
    pretty = args.pretty
    variant_recoder(input_data=input_data, grch37=grch37, pretty=pretty)

Bug Fix(2019/12/18)

前回同様指定可能なオプションは下記の通り

オプション 説明 コメント
--input_data 変換元のhgvsで表記された文字列 ダブルクオーテーション又はシングルクオーテーションで区切る
--grch37 ヒトゲノムのバージョン指定 無指定であればGRCh38(hg38)が利用される
--pretty JSON表記を見やすく整形表示する

GRCh37で取得

変異のhgvs表記'NM_000603.4:c.894T>G'から変換結果を取得してみる

$ python variant_recoder_rest.py \
    --input_data 'NM_000603.4:c.894T>G' \
    --grch37 \
    --pretty

結果は下記の通り

[
    {
        "hgvsp": [
            "ENSP00000297494.3:p.Asp298Glu",
            "ENSP00000417143.1:p.Asp92Glu",
            "ENSP00000420551.1:p.Asp298Glu",
            "ENSP00000420215.1:p.Asp298Glu",
            "NP_000594.2:p.Asp298Glu",
            "NP_001153581.1:p.Asp298Glu",
            "NP_001153582.1:p.Asp298Glu",
            "NP_001153583.1:p.Asp298Glu"
        ],
        "id": [
            "rs1799983",
            "CM981388"
        ],
        "hgvsg": [
            "NC_000007.13:g.150696111T>G"
        ],
        "hgvsc": [
            "ENST00000297494.3:c.894T>G",
            "ENST00000461406.1:c.276T>G",
            "ENST00000467517.1:c.894T>G",
            "ENST00000484524.1:c.894T>G",
            "NM_000603.4:c.894T>G",
            "NM_001160109.1:c.894T>G",
            "NM_001160110.1:c.894T>G",
            "NM_001160111.1:c.894T>G"
        ],
        "input": "NM_000603.4:c.894T>G"
    }
]

GRCh38で取得

--grch37 オプションを省略した場合にはGRCh38上での変換結果が出力される

$ python variant_recoder_rest.py \
    --input_data 'NM_000603.4:c.894T>G' \
    --pretty
[
    {
        "hgvsc": [
            "ENST00000297494.7:c.894T>G", 
            "ENST00000461406.5:c.276T>G", 
            "ENST00000467517.1:c.894T>G", 
            "ENST00000484524.5:c.894T>G", 
            "NM_000603.4:c.894T>G", 
            "NM_001160109.1:c.894T>G", 
            "NM_001160110.1:c.894T>G", 
            "NM_001160111.1:c.894T>G"
        ], 
        "input": "NM_000603.4:c.894T>G", 
        "hgvsp": [
            "ENSP00000297494.3:p.Asp298Glu", 
            "ENSP00000417143.1:p.Asp92Glu", 
            "ENSP00000420551.1:p.Asp298Glu", 
            "ENSP00000420215.1:p.Asp298Glu", 
            "cds46555.1:p.Asp298Glu", 
            "cds46558.1:p.Asp298Glu", 
            "cds46557.1:p.Asp298Glu", 
            "cds46556.1:p.Asp298Glu"
        ], 
        "hgvsg": [
            "NC_000007.14:g.150999023T>G"
        ], 
        "id": [
            "rs1799983", 
            "CM981388"
        ]
    }
]

hgvsg の結果をみると 

  • GRCh37
    • "NC_000007.13:g.150696111T>G"
  • GRCh38
    • "NC_000007.14:g.150999023T>G"

となっていて、指定したゲノム上で変換が行われている事が確認できる

単位時間あたりのリクエストの制限

EnsemblのAPIの利用には単位時間あたりの利用制限が設けられている
通常、上限が3600秒間に55000回となっているので、リミットに達する事は無いと思えるけれど、もしリミットに達してしまった場合には何秒間待てば再開できるかという情報を表示してエラーとなる

今回はここまで:smile:

Why not register and get more from Qiita?
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
Comments
No comments
Sign up for free and join this conversation.
If you already have a Qiita account
Why do not you register as a user and use Qiita more conveniently?
You need to log in to use this function. Qiita can be used more conveniently after logging in.
You seem to be reading articles frequently this month. Qiita can be used more conveniently after logging in.
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
ユーザーは見つかりませんでした