

(Beginner) (Memo) Scraping and outputting to CSV

Posted at 2019-12-01

Preparation

Complete the scraping code by referring to this article.

Source code

import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import csv

l_cap_name = []

url = "https://scrapethissite.com/pages/simple/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"}
request = urllib.request.Request(url=url, headers=headers)

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    print('URLError: {}'.format(e.reason))
else:
    for term in BeautifulSoup(response, 'lxml').find_all('span', class_='country-capital'):
        l_cap_name.append([term.string])

print(l_cap_name)


with open("l_cap_name.csv", "w", encoding="utf-8") as f:
    # With encoding="Shift-jis", a UnicodeEncodeError was raised at the 'Brajilia' entry
    writer = csv.writer(f, lineterminator="\n")  # create a writer object; rows are separated by newline characters

    writer.writerows(l_cap_name)
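The `find_all('span', class_='country-capital')` call above is what pulls each capital name out of the page. Here is a minimal offline sketch of the same idea against a small hand-written HTML fragment (using the stdlib-backed `html.parser` so it has no lxml dependency; the fragment only mimics the real page's structure):

```python
from bs4 import BeautifulSoup

# A tiny HTML fragment shaped like the scraped page: each country block
# holds its capital in a <span class="country-capital"> element
html = """
<div class="country"><span class="country-capital">Andorra la Vella</span></div>
<div class="country"><span class="country-capital">Abu Dhabi</span></div>
"""

# find_all(..., class_=...) matches elements by CSS class; .string gives the text
capitals = [str(span.string)
            for span in BeautifulSoup(html, "html.parser").find_all("span", class_="country-capital")]
print(capitals)  # → ['Andorra la Vella', 'Abu Dhabi']
```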

The generated CSV file

Andorra la Vella
Abu Dhabi
Kabul
St. John's
The Valley
Tirana
Yerevan
Luanda
None
Buenos Aires
Pago Pago
Vienna

--- (middle omitted) ---

Mata-Utu
Apia
Pristina
Sanaa
Mamoudzou
Pretoria
Lusaka
Harare
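The `None` row in the output appears because some country entries on the page have an empty capital span, so `term.string` returns `None`. If blank capitals are not wanted (an assumption; keeping them may also be fine), the list can be filtered before writing:

```python
# Illustrative data standing in for the real scrape result:
# one entry has no capital, so its value is None
l_cap_name = [["Kabul"], [None], ["Vienna"]]

# Keep only rows whose capital is not None
filtered = [row for row in l_cap_name if row[0] is not None]
print(filtered)  # → [['Kabul'], ['Vienna']]
```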

By the way, to read back the generated CSV file:

with open('l_cap_name.csv', "r", encoding="utf-8") as r:
    readers = csv.reader(r)
    for row in readers:  # iterate while the file is still open
        print(row)
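Incidentally, the `csv` module documentation recommends opening CSV files with `newline=""` instead of tuning `lineterminator`, so that the module controls line endings itself. A minimal write-then-read round trip (using illustrative data and a throwaway file name, `capitals_demo.csv`, so the real output file is untouched):

```python
import csv

rows = [["Kabul"], ["Vienna"]]  # illustrative data

# newline="" lets the csv module handle line endings portably
with open("capitals_demo.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

with open("capitals_demo.csv", "r", encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back)  # → [['Kabul'], ['Vienna']]
```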