

Extracting the src attribute of img tags with web scraping

AdventCalendar2023

Posted at 2023-12-15

This article is the day-15 entry of the SLP KBIT Advent Calendar 2023.

Introduction

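Hello, I'm harry. This time I wanted to use scraping to extract the images on a web page and display them, but I couldn't get that working, so instead I extracted the src attribute inside img tags. Scraping itself was explained in the previous day's post, so I'll skip it here.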

What is the src attribute?

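The src attribute specifies the location of an external resource that a web page loads. For an img tag, it holds the URL of the image file to display, either absolute or relative to the page.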
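As a small illustration, the snippet below parses a hypothetical HTML fragment (made up for this example, not taken from the site used later) and reads the src attribute of each img tag. In BeautifulSoup, attributes can be read like dictionary keys: `img["src"]` raises `KeyError` when the attribute is missing, while `img.get("src")` returns `None` instead.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, for illustration only
html = """
<p>
  <img src="images/photo1.png" alt="photo 1">
  <img src="https://example.com/photo2.jpg" alt="photo 2">
  <img alt="no src here">
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("img") returns every <img> tag in document order;
# .get("src") gives None for the tag that has no src attribute
srcs = [img.get("src") for img in soup.find_all("img")]
print(srcs)  # ['images/photo1.png', 'https://example.com/photo2.jpg', None]
```

The first value is a relative URL and the second an absolute one; both are valid src values.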

Running it

For this example I use a site that shows the phase of the moon for each day, and extract the src of the image of today's moon.
Site URL: https://eto-calendar.com/astronomy/moon/?date=today

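import requests
from bs4 import BeautifulSoup

url = "https://eto-calendar.com/astronomy/moon/?date=today"

# Fetch the page; the timeout keeps the script from hanging indefinitely
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors such as 404 or 500

# Parse the HTML and collect the src of every <img> inside an element with class "moonimage"
soup = BeautifulSoup(response.content, "html.parser")
data = []
for img in soup.select(".moonimage img"):
    data.append(dict(src=img["src"]))
print(data)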
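The selector `.moonimage img` matches img tags that are descendants of an element whose class is `moonimage`. Because the live page can change, the sketch below runs the same selector against a hypothetical, simplified copy of the markup (an assumption, not the site's actual HTML) so it can be tried offline. It also skips img tags without a src, which would otherwise make `img["src"]` raise `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page's structure may differ
html = """
<div class="moonimage">
  <img src="moonage.php?day=15&amp;month=12&amp;year=2023" alt="moon">
</div>
<div class="other">
  <img src="banner.png" alt="banner">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

data = []
# CSS descendant selector: only <img> tags inside class="moonimage" match,
# so banner.png in the other div is ignored
for img in soup.select(".moonimage img"):
    src = img.get("src")
    if src is not None:  # skip <img> tags without a src attribute
        data.append(dict(src=src))
print(data)  # [{'src': 'moonage.php?day=15&month=12&year=2023'}]
```

The parser decodes `&amp;` in the attribute to a plain `&`, which is why the printed value matches the format of the real output.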

Output (the query string reflects the date the script was run)

[{'src': 'moonage.php?day=15&month=12&year=2023'}]
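The extracted value is a relative URL, so it cannot be requested on its own. One possible next step toward displaying the image, sketched here as an assumption and not tested against the site, is to resolve it against the page URL with `urllib.parse.urljoin` and read the date from its query string with `parse_qs`.

```python
from urllib.parse import urljoin, urlparse, parse_qs

page_url = "https://eto-calendar.com/astronomy/moon/?date=today"
src = "moonage.php?day=15&month=12&year=2023"  # value from the output above

# Resolve the relative src against the directory of the page URL
image_url = urljoin(page_url, src)
print(image_url)
# https://eto-calendar.com/astronomy/moon/moonage.php?day=15&month=12&year=2023

# Split the query string into a dict of lists
params = parse_qs(urlparse(image_url).query)
print(params)  # {'day': ['15'], 'month': ['12'], 'year': ['2023']}
```

The resulting `image_url` could then be fetched with `requests.get(image_url, timeout=10).content` and written to a file to view the image.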

Closing thoughts

There was a lot I didn't understand and things didn't go well at all, but I learned a great deal along the way. I plan to spend more time studying this properly.

