Scraping with Django

Posted at 2024-07-21

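Introduction

Hello, I'm keitaMax, an engineer.

Last time, I set up a Django environment with Docker and created a custom command and a scheduled job.

This time, I'd like to use a custom command to do some scraping.

Previous article

Creating the command

I created a new command called scraping.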

scraping.py

from django.core.management.base import BaseCommand

class Command(BaseCommand):

    def handle(self, *args, **options):
        print("scraping")

See the previous article for how to create a custom command.

Building the scraper

Installing the libraries

I'll use the BeautifulSoup library to parse the HTML, with requests to fetch the page.

Install both with the following command.

pip install requests beautifulsoup4

While I'm at it, I also add them to requirements.txt.

requirements.txt

Django==5.0

apscheduler
mysqlclient
requests
beautifulsoup4

Updating the command

Modify the scraping.py file created earlier as follows.

scraping.py

from django.core.management.base import BaseCommand
from bs4 import BeautifulSoup
import requests

class Command(BaseCommand):
    def handle(self, *args, **options):
        res = requests.get('https://tankomayan.qboad.com/')
        soup = BeautifulSoup(res.text, 'html.parser')
        title_text = soup.find('title').get_text()
        print(title_text)

https://tankomayan.qboad.com/ is a site I run myself.

This code fetches the page and extracts the text of its title tag.

The site's title is たんこまゃん-TOP, so the command works if it prints that string.

Running it

Run the custom command with the following command.

python3 manage.py scraping

The result was as follows.

root@aa175d699bf9:/code# python3 manage.py scraping
ããã¾ãã-TOP
root@aa175d699bf9:/code#

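The Japanese part of the title came out garbled, as ããã¾ãã-TOP. This typically happens when the response's Content-Type header does not declare a charset: requests then assumes ISO-8859-1 when building res.text, even though the page itself is UTF-8.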
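
The garbling can be reproduced without any network access. The sketch below assumes that the page is served as UTF-8 and that the text was decoded as ISO-8859-1; both are assumptions about this site rather than something verified here.

```python
# Assumption: the server sends UTF-8 bytes, but they get decoded as ISO-8859-1.
title = "たんこまゃん-TOP"

raw_bytes = title.encode("utf-8")         # what would travel over the wire
garbled = raw_bytes.decode("iso-8859-1")  # wrong codec: each byte becomes one character
restored = raw_bytes.decode("utf-8")      # right codec: the original text comes back

print(repr(garbled))
print(restored)
```

The ASCII part (-TOP) survives because both encodings agree on those bytes; only the multi-byte Japanese characters break.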

Fixing the garbled text

I added one line to scraping.py, as shown below.

scraping.py

from django.core.management.base import BaseCommand
from bs4 import BeautifulSoup
import requests

class Command(BaseCommand):

    def handle(self, *args, **options):
        res = requests.get('https://tankomayan.qboad.com/')
        soup = BeautifulSoup(res.text, 'html.parser')
        soup = BeautifulSoup(res.content.decode("utf-8", "ignore"), "html.parser")  # added: decode the raw bytes as UTF-8 (replaces the soup built above)
        title_text = soup.find('title').get_text()
        print(title_text)

Now I run the command again.

root@aa175d699bf9:/code# python3 manage.py scraping
たんこまゃん-TOP
root@aa175d699bf9:/code#

This time the title たんこまゃん-TOP was retrieved correctly.
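
As an alternative that has not been tested against this site, BeautifulSoup also accepts raw bytes and works out the encoding itself (for example from a meta charset tag), so res.content could be passed in directly. The sketch below uses a hard-coded HTML snippet instead of a real request, and extract_title is a hypothetical helper name. It also returns None instead of raising when a page has no title.

```python
from bs4 import BeautifulSoup


def extract_title(html_bytes):
    """Return the page title, or None if the page has no <title> tag.

    Hypothetical helper: pass res.content (bytes), not res.text.
    """
    # Given bytes, BeautifulSoup detects the encoding itself.
    soup = BeautifulSoup(html_bytes, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


# Stand-in for res.content, so the example runs without network access.
sample = (
    '<html><head><meta charset="utf-8">'
    "<title>たんこまゃん-TOP</title></head><body></body></html>"
).encode("utf-8")

print(extract_title(sample))
print(extract_title(b"<html><body>no title here</body></html>"))
```

Keeping the parsing in a function that takes bytes also makes it possible to check the logic without hitting the real site.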

Conclusion

Scraping turned out to be easy to set up.

If you have questions about this article, spot a mistake, or know a better approach, I'd appreciate hearing about it.

Thank you for reading to the end!
