Scraping with Python

Last updated at 2019-08-23, posted at 2019-08-21

Web scraping using Python

These are my notes on scraping with Python.

Environment

macOS Mojave
Python 3.6.5

When an HTML element has no class or id to select on

Assume the element has a name attribute instead, as in the following tag:

<input type="text" name="hoge" value="fuga" autocomplete="off">

Its value can be retrieved with Python code like the following.
HTTPS handling and similar concerns need to be addressed separately.

import requests
from bs4 import BeautifulSoup

# URL to access
url = 'http://hoge.com'
# Fetch the response
res = requests.get(url)
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(res.text, "html.parser")
# Get the value of the input tag whose name attribute is "hoge"
name_hoge_val = soup.find('input', {'name': 'hoge'})['value']
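The same lookup can be tried without any network access by parsing an inline HTML string. The following is only a minimal sketch: the helper name `get_input_value` and the sample HTML are assumptions made for illustration and are not part of the original notes. It returns None instead of raising when the tag or its value attribute is missing.

```python
from bs4 import BeautifulSoup

# Sample HTML (illustrative) containing the tag discussed above
SAMPLE_HTML = '''
<form>
  <input type="text" name="hoge" value="fuga" autocomplete="off">
  <input type="text" name="piyo">
</form>
'''


def get_input_value(html, name):
    """Return the value attribute of the first <input> with the given name, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The attrs dict is used because find()'s own "name" argument means the tag name
    tag = soup.find("input", {"name": name})
    if tag is None:
        return None
    # Tag.get returns None when the attribute is absent, instead of raising KeyError
    return tag.get("value")


print(get_input_value(SAMPLE_HTML, "hoge"))  # fuga
print(get_input_value(SAMPLE_HTML, "piyo"))  # None (no value attribute)
print(get_input_value(SAMPLE_HTML, "nope"))  # None (no such tag)
```

Indexing with `['value']` is shorter, but it raises as soon as the page differs from what was expected, so `get` may be the more forgiving choice for pages you do not control.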

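Python scraping: fetching SEO elements

There is no error handling yet, so the script raises an error whenever an element cannot be found, but here it is for now. I plan to fix this later.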

The script uses input() to accept a URL from the terminal.
Once a URL is entered, the script scrapes that URL.

It retrieves the SEO elements (Title, Description, Keywords) of the target site.
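Because the prompt accepts arbitrary text, it may be worth checking that the input looks like a URL before passing it to requests. The following is a rough sketch using only the standard library; the helper name `is_http_url` is hypothetical and this check is not part of the original script.

```python
from urllib.parse import urlparse


def is_http_url(text):
    """Return True if text looks like an absolute http or https URL."""
    parsed = urlparse(text.strip())
    # Require both an http(s) scheme and a host part
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_http_url("http://hoge.com"))   # True
print(is_http_url("hoge.com"))          # False (no scheme)
print(is_http_url("python scraping"))   # False (not a URL)
```

This only checks the shape of the string; it does not confirm that the host exists or that the request will succeed.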

import requests
from bs4 import BeautifulSoup

# Accept a URL from the terminal
input_url = input("Enter the URL to scrape: ")
res = requests.get(input_url)
soup = BeautifulSoup(res.text, 'html.parser')

def get_seo_content(input_url, soup):
    # Get the contents of the head tag
    head_info = soup.find('head')

    # Get the title
    title = head_info.find('title').get_text()

    # Get the description
    meta_description = head_info.find('meta', {'name': 'description'})
    description = meta_description['content']

    # Get the keywords
    meta_keywords = head_info.find('meta', {'name': 'keywords'})
    keywords = meta_keywords['content']

    # Print the result
    print(
        '''
        ###############################
        Title: {0}
        Description: {1}
        Keywords: {2}
        ###############################
        '''
        .format(title, description, keywords)
    )


get_seo_content(input_url, soup)
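As noted, the script above fails when a page has no head, title, or matching meta tag (many sites omit the keywords meta tag, for example). One possible way to add the missing error handling is sketched below. The function name `extract_seo` and the sample HTML are assumptions for illustration; this is not the author's planned fix.

```python
from bs4 import BeautifulSoup


def extract_seo(html):
    """Return a dict with title, description and keywords; missing items are None."""
    soup = BeautifulSoup(html, "html.parser")
    # Fall back to searching the whole document when there is no <head>
    head = soup.find("head")
    if head is None:
        head = soup

    title_tag = head.find("title")
    title = title_tag.get_text(strip=True) if title_tag is not None else None

    def meta_content(name):
        # Return the content attribute of <meta name="...">, or None if absent
        tag = head.find("meta", {"name": name})
        return tag.get("content") if tag is not None else None

    return {
        "title": title,
        "description": meta_content("description"),
        "keywords": meta_content("keywords"),
    }


# Sample page (illustrative) with no keywords meta tag
SAMPLE = """
<html><head>
<title> Example page </title>
<meta name="description" content="A sample description">
</head><body></body></html>
"""

print(extract_seo(SAMPLE))
```

With a live page, the same function could be called as `extract_seo(requests.get(input_url).text)`. Network failures would still need their own handling in that case.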
