
(Beginner) Memo: Scraping and Writing the Results to CSV

Last updated at 2019-12-02. Posted at 2019-12-01.

Preparation

Finish the scraping code, using this article as a reference.

Source code

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import csv

l_cap_name = []

url = "https://scrapethissite.com/pages/simple/"
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
           }
request = urllib.request.Request(url=url, headers=headers)

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    print('URLError: {}'.format(e.reason))
else:
    for term in BeautifulSoup(response, 'lxml').find_all('span', class_='country-capital'):
        l_cap_name.append([term.string])

print(l_cap_name)


with open("l_cap_name.csv", "w", encoding="utf-8") as f:
    # With encoding="shift_jis", a UnicodeEncodeError was raised at 'Brasília'
    writer = csv.writer(f, lineterminator="\n")  # create the writer object; each row ends with a newline character
    writer.writerows(l_cap_name)
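
The encoding note in the code above can be reproduced without touching the network or the file system. The following is a minimal sketch, assuming the failure comes from the accented "í" in "Brasília" (the variable names here are illustrative, not part of the original script). It shows that Shift_JIS cannot encode that character, that UTF-8 can, and what `errors="replace"` would produce if a Shift_JIS file were really required.

```python
capital = "Brasília"  # contains 'í' (U+00ED), which Shift_JIS cannot represent

# Encoding to Shift_JIS fails; open(..., encoding="shift_jis") hits the same error on write
try:
    capital.encode("shift_jis")
    shift_jis_ok = True
except UnicodeEncodeError:
    shift_jis_ok = False

# UTF-8 can represent every Unicode character, so this always succeeds
utf8_bytes = capital.encode("utf-8")

# errors="replace" swaps each unencodable character for "?" instead of raising
replaced = capital.encode("shift_jis", errors="replace").decode("shift_jis")

print(shift_jis_ok)
print(replaced)
```

Passing `errors="replace"` to `open()` works the same way, but it silently loses the original character, so UTF-8 is usually the safer choice here.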

The resulting CSV file

Andorra la Vella
Abu Dhabi
Kabul
St. John's
The Valley
Tirana
Yerevan
Luanda
None
Buenos Aires
Pago Pago
Vienna

--- (omitted) ---

Mata-Utu
Apia
Pristina
Sanaa
Mamoudzou
Pretoria
Lusaka
Harare

Incidentally, to read the CSV file back in:

with open('l_cap_name.csv', "r", encoding="utf-8") as r:
    readers = csv.reader(r)
    for row in readers:  # each row is a list of strings, e.g. ['Kabul']
        print(row)
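
Because the script writes one capital per row, `csv.reader` returns each row as a one-element list. The sketch below flattens those rows back into a plain list of names. To stay self-contained it reads from an `io.StringIO` stand-in holding three rows taken from the output above, rather than from `l_cap_name.csv` itself. That substitution is an assumption made for illustration.

```python
import csv
import io

# Stand-in for the contents of l_cap_name.csv (three rows from the output above)
sample = "Andorra la Vella\nAbu Dhabi\nSt. John's\n"

with io.StringIO(sample) as r:
    rows = list(csv.reader(r))  # a list of one-element lists

# Take the first column of each non-empty row to get a flat list of names
capitals = [row[0] for row in rows if row]

print(rows)
print(capitals)
```

With the real file, replacing `io.StringIO(sample)` with `open('l_cap_name.csv', "r", encoding="utf-8")` should give the same structure.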
