Extracting Diet members' names from the National Diet websites

Python

Last updated at 2017-11-05, posted at 2017-10-08

I wanted a list of the names of Diet members, so I wrote a script that extracts the members' names from the member list page on the House of Councillors website.

scrape_councillors_name.py


from lxml import html
import requests

page = requests.get('http://www.sangiin.go.jp/japanese/joho1/kousei/giin/194/giin.htm')

tree = html.fromstring(page.content)
rep_names = tree.xpath('//a[contains(@href, "profile")]/text()')

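for name in rep_names:
    # Remove full-width (zenkaku) spaces from the name
    name_without_zenkaku_space = name.replace(u"　", "")

    # Skip empty strings and entries that start with '['.
    # startswith() compares values; `is not` compares object identity and is unreliable for strings.
    if name_without_zenkaku_space and not name_without_zenkaku_space.startswith('['):
        print(name_without_zenkaku_space)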
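
The filtering step can be tried without any network access. The following is a minimal sketch, not part of the original script: the helper name `clean_names` and the sample strings are made up for illustration, and the input is assumed to have the same shape as the list of text nodes returned by `tree.xpath(...)`.

```python
FULL_WIDTH_SPACE = "\u3000"  # the zenkaku space removed by the script


def clean_names(raw_names):
    """Remove full-width spaces and drop empty or '['-prefixed entries.

    `clean_names` is a hypothetical helper; it mirrors the loop in
    scrape_councillors_name.py but returns a list instead of printing.
    """
    cleaned = []
    for raw in raw_names:
        name = raw.replace(FULL_WIDTH_SPACE, "")
        # Compare by value with startswith(); also guard against empty strings
        if name and not name.startswith("["):
            cleaned.append(name)
    return cleaned


# Made-up sample input, not taken from the actual page
sample = ["山田\u3000太郎", "[山田\u3000太郎]", "", "鈴木\u3000花子"]
print(clean_names(sample))
```

Returning a list rather than printing makes it easier to reuse the names later, for example to write them to a file.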

For members of the House of Representatives, use the following.

scrape_rep_name.py

# -*- coding: utf-8 -*-
from lxml import html
import requests

rep_names = []

for i in range(1, 11):
    page = requests.get('http://www.shugiin.go.jp/internet/itdb_annai.nsf/html/statics/syu/' + str(i) + 'giin.htm')
    tree = html.fromstring(page.content)
    # names_districts holds member names and electoral districts
    names_districts = tree.xpath('//tr[@valign="top"]/td[@class="sh1td5"]/tt[@class="sh1tt1"]/text()')

    # Take only the member names from names_districts and add them to rep_names.
    # Names and districts alternate, so every other entry (starting at index 0) is a name.
    for name in names_districts[::2]:
        rep_names.append(name.replace(u"　", "").replace("君\n", ""))

print(rep_names)
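
The name/district separation above can be checked on its own. This is a rough sketch under assumptions: the helper name `extract_names` is hypothetical, the sample strings are invented, and the input is assumed to alternate between a name (ending in `君` plus a newline) and a district, as the script's handling of the XPath result implies.

```python
def extract_names(names_districts):
    """Return only the names from an alternating [name, district, ...] list.

    `extract_names` is a hypothetical helper; it applies the same string
    clean-up as scrape_rep_name.py to every even-indexed entry.
    """
    names = []
    for entry in names_districts[::2]:  # even indices are assumed to be names
        # Drop the full-width space and the trailing honorific plus newline
        names.append(entry.replace("\u3000", "").replace("君\n", ""))
    return names


# Made-up sample input, not taken from the actual page
sample = ["山田\u3000太郎君\n", "東京1\n", "鈴木\u3000花子君\n", "大阪2\n"]
print(extract_names(sample))
```

Slicing with `[::2]` avoids mutating the list and does not raise an error when the list has an odd number of entries, which repeated `pop(0)` calls would.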
