
Extracting Diet members' names from the National Diet websites

Posted on 2017-10-08

I wanted a list of the names of Diet members, so I wrote a script that extracts the members' names from the member list page on the House of Councillors website.

scrape_councillors_name.py

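from lxml import html
import requests

# Member list page of the House of Councillors
page = requests.get('http://www.sangiin.go.jp/japanese/joho1/kousei/giin/194/giin.htm')

tree = html.fromstring(page.content)
# Each member's name is the text of a link whose href contains "profile"
rep_names = tree.xpath('//a[contains(@href, "profile")]/text()')

for name in rep_names:
    # Remove the full-width (zenkaku, U+3000) space and any ordinary space in the name
    name_without_zenkaku_space = name.replace(u"\u3000", "").replace(u" ", "")

    # Skip link texts that start with '['; compare by value, not with "is not"
    if not name_without_zenkaku_space.startswith('['):
        print(name_without_zenkaku_space)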
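
The cleanup step in the loop above can be tried without any network access. The following is a minimal sketch, assuming the names on the page are separated by a full-width space (U+3000). `clean_names` is a hypothetical helper name, and the sample strings are placeholders, not data taken from the page.

```python
def clean_names(raw_names):
    """Return names with spaces removed, skipping entries that start with '['."""
    cleaned = []
    for raw in raw_names:
        # Remove full-width (U+3000) and ordinary spaces inside the name
        name = raw.replace("\u3000", "").replace(" ", "")
        # Skip empty strings and bracketed entries
        if name and not name.startswith("["):
            cleaned.append(name)
    return cleaned


# Placeholder input, not real page content
sample = ["山田\u3000太郎", "[山田\u3000太郎]", "鈴木 花子"]
print(clean_names(sample))
```

Keeping the cleanup in its own function makes it possible to check the filtering rule separately from the HTTP request and the XPath query.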

For members of the House of Representatives, use the following.

scrape_rep_name.py
# -*- coding: utf-8 -*-
from lxml import html
import requests

rep_names = []

# The member list is split across the pages 1giin.htm through 10giin.htm
for i in range(1, 11):
    page = requests.get('http://www.shugiin.go.jp/internet/itdb_annai.nsf/html/statics/syu/' + str(i) + 'giin.htm')
    tree = html.fromstring(page.content)
    # names_districts holds member names and electoral districts
    names_districts = tree.xpath('//tr[@valign="top"]/td[@class="sh1td5"]/tt[@class="sh1tt1"]/text()')

    # Take only the member names out of names_districts and add them to rep_names.
    # Names and districts alternate, so every second item starting at index 0 is a name.
    for raw_name in names_districts[0::2]:
        rep_names.append(raw_name.replace(u"\u3000", "").replace(u" ", "").replace("\n", ""))

print(rep_names)
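
The script above discards the electoral districts. If they are wanted as well, the alternating list could be paired up roughly as follows. This is only a sketch, assuming the XPath result really is ordered name, district, name, district, and so on. `pair_names_with_districts` is a hypothetical helper name, and the sample values are placeholders, not data taken from the page.

```python
def pair_names_with_districts(names_districts):
    """Pair up an alternating [name, district, name, district, ...] list."""
    names = names_districts[0::2]
    districts = names_districts[1::2]
    # zip stops at the shorter list, so a trailing unpaired name is dropped
    return [
        (name.replace("\u3000", "").replace(" ", "").replace("\n", ""), district.strip())
        for name, district in zip(names, districts)
    ]


# Placeholder input in the assumed alternating order, not real page content
sample = ["山田\u3000太郎\n", "District A", "鈴木\u3000花子\n", " District B\n"]
print(pair_names_with_districts(sample))
```

Slicing with a step of 2 avoids modifying the list while reading it, and an odd number of items does not raise an `IndexError`.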