Web Scraping with Python 3

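When I first started playing with this I got nothing but errors and couldn't get anything done, so I'm writing down the fixes here as a memo for myself.
First, why did the errors occur?
It was an SSL certificate problem: the HTTPS request failed certificate verification. The workaround I found needs the ssl module to be imported.
Sorry, I have forgotten the URL of the website where I found it.
The following part is quoted from that kind soul. Note that it turns off certificate verification for every HTTPS request made through urllib.
import ssl
ssl._create_default_https_context = ssl._create_unverified_context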
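
The quoted line replaces the default HTTPS context for the whole process. As a narrower alternative, here is a minimal sketch that builds an unverified context with the public ssl API and passes it only to the one request that needs it. This is not what the site I copied from did; the helper names make_unverified_context and fetch are made up for illustration.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Build an SSL context that skips certificate checks (hypothetical helper)."""
    ctx = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url):
    """Fetch url as bytes, skipping certificate verification for this call only."""
    with urllib.request.urlopen(url, context=make_unverified_context()) as res:
        return res.read()
```

Calling fetch("https://...") should behave like the global workaround for that single request, while other urlopen calls keep verifying certificates.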

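Code explanation
The imports are, from top to bottom: urllib.request to send the HTTP request, BeautifulSoup to parse and tidy up the HTML, and ssl because I got an error without it.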

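Set the variable url to whatever URL you like.
Send the request and store the whole HTML document in the variable html.
Pass html to BeautifulSoup and store the result in soup.
Pick out and tidy up the parts you need.

That's the gist.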
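
The parsing steps above can be tried without any network access. A minimal sketch, assuming a made-up SAMPLE_HTML string stands in for the bytes that urlopen(url).read() would return, and a hypothetical helper page_title:

```python
from bs4 import BeautifulSoup

# Stand-in for the bytes that urllib.request.urlopen(url).read() would return.
SAMPLE_HTML = (
    b"<html><head><title>Sample page</title></head>"
    b"<body><p>Hello</p></body></html>"
)


def page_title(html):
    """Parse raw HTML and return the text of its <title>, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text()


title = page_title(SAMPLE_HTML)
print(title)
```

Returning None when there is no title tag avoids an AttributeError on pages that lack one.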


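import urllib.request
from bs4 import BeautifulSoup

import ssl

# Workaround for the SSL certificate error: disables certificate
# verification for every HTTPS request made through urllib
ssl._create_default_https_context = ssl._create_unverified_context

# URL of the page to scrape, read from standard input
url = input()

# Fetch the page and keep the whole HTML document as bytes
html = urllib.request.urlopen(url).read()

# Parse the HTML with the parser that ships with Python
soup = BeautifulSoup(html, 'html.parser')

# Collect every element whose class is "multicol"
# (pass a tag name such as 'div' as the first argument to narrow it down)
tables = soup.find_all(class_='multicol')

# Show the page's <title> tag
print(soup.title)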
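
To see what the find_all call with class_ returns, here is a small sketch on made-up HTML. The markup and the variable names are assumptions for illustration only; "multicol" is the class name from the code above.

```python
from bs4 import BeautifulSoup

# Made-up markup: two elements carry the "multicol" class, one does not.
SAMPLE_HTML = """
<div class="multicol"><table><tr><td>A</td></tr></table></div>
<div class="other"><table><tr><td>B</td></tr></table></div>
<section class="multicol wide"><p>C</p></section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Any tag whose class list contains "multicol".
any_tag = soup.find_all(class_="multicol")
# Only <div> tags with that class.
only_divs = soup.find_all("div", class_="multicol")

names = [tag.name for tag in any_tag]
cells = [tag.get_text(strip=True) for tag in only_divs]
print(names, cells)
```

Leaving out the tag name matches any element with the class, which is handy when you don't yet know which tag the site uses.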

Link to a list of BeautifulSoup4 commands

I relied on it heavily for the BeautifulSoup4 commands. Many thanks.

Addendum

It also worked like this without any of the SSL stuff, so I'm adding it here.


# Load the modules as below

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Open the page with html = urlopen("<URL>"). Example:
html = urlopen("http://www.pythonscraping.com/pages/page1.html")

# Load the content into soup
# soup = BeautifulSoup(<variable from above, e.g. html>, "lxml")
# (the "lxml" parser requires the lxml package to be installed)
soup = BeautifulSoup(html, "lxml")

# Print it
print(soup)

# Get every matching tag (here "a")
soup.find_all("a")

# Get only the first matching tag
soup.find("a")
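
The difference between find and find_all is easier to see on a tiny example. A minimal sketch on made-up HTML, using "html.parser" instead of "lxml" so that nothing extra has to be installed. The markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: three <a> tags, the last one without an href.
SAMPLE_HTML = (
    '<p><a href="/first">First</a> '
    '<a href="/second">Second</a> '
    '<a>No link</a></p>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first = soup.find("a")          # a single Tag, or None when nothing matches
all_links = soup.find_all("a")  # a list-like result, possibly empty

# Tag.get returns None for a missing attribute instead of raising KeyError.
hrefs = [a.get("href") for a in all_links]
missing = soup.find("table")

print(first.get_text(), hrefs, missing)
```

Since find returns None when nothing matches, it is worth checking the result before calling methods on it.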
