Scraping the Amazon Ranking with BeautifulSoup and Tweeting Amazon Associates-Tagged Links with Tweepy

Posted at 2020-11-30

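# Purpose

For the first time in a very long while, I wanted to work with Google App Engine (GAE/P) again.
I used to write my Twitter posting scripts in PHP, so this time I did it in Python.
There used to be a GAE Launcher app, but I can no longer find it; deployment now seems to be done from the command line.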

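# Steps

1. Fetch the Amazon ranking page and parse it with BeautifulSoup.
2. Extract the product information and add the Associates tag to each product URL.
3. Tweet it with Tweepy.
4. Run the whole thing periodically as a scheduled (cron) job.

Steps 1 to 3 look like this.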

 main.py
from flask import Flask
import requests
from bs4 import BeautifulSoup
import tweepy
import datetime
import time

app = Flask(__name__)

# Twitter API credentials (fill in your own values)
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

@app.route('/aibooksranking')
def aibooksranking():
    url = "https://www.amazon.co.jp/gp/bestsellers/books/720370/ref=pd_zg_hrsr_books"
    htmltext = requests.get(url).text
    soup = BeautifulSoup(htmltext, 'html.parser')

    # Authenticate once instead of once per item
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    for el in soup.find_all("li", class_="zg-item-immersion"):  # zg_itemRow
        rank = el.find("span", "zg-badge-text").get_text()
        name = el.find("div", "p13n-sc-truncate").get_text().lstrip()
        item_urls = "https://www.amazon.co.jp" + el.find("a").attrs['href']

        # Take the product ID between '/dp/' and the query string
        poss = item_urls.find('/dp/')
        pose = item_urls.find('?', poss+4)
        if pose == -1:
            pose = len(item_urls)  # no query string: take the rest of the URL
        jan = item_urls[poss+4:pose]
        # Product URL with the Amazon Associates tag appended
        aurl = "http://www.amazon.co.jp/dp/"+jan+"/ref=nosim?tag=xxxxxxx"
        # Current time in JST (UTC+9)
        dt_now = datetime.datetime.now(datetime.timezone(datetime.timedelta(hours=9)))
        today = dt_now.strftime('%Y年%m月%d日%H時')
        # Tweet text (in Japanese): "<date>: No. <rank> in AI book sales is <name> <url>"
        message = today + "の人工知能本売上"+rank +"位は\n\n" + name + "\n" + aurl

        # Post the tweet
        api.update_status(message)

        # Wait one minute between tweets
        time.sleep(60*1)

    # A Flask view must return a response
    return 'OK'

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)
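
Step 2 (extracting the product ID and adding the Associates tag) can also be pulled out into a small helper that is easy to test on its own. The following is a minimal sketch, not the code used above: the function name `build_associate_url`, the regular expression, and the sample `href` are assumptions for illustration, and it assumes the product ID is the path segment immediately after `/dp/`.

```python
import re
from typing import Optional

# Hypothetical helper: not part of the original main.py.
# Assumes the product ID is the path segment right after "/dp/".
_DP_PATTERN = re.compile(r"/dp/([^/?#]+)")


def build_associate_url(href: str, tag: str) -> Optional[str]:
    """Return an Associates-tagged product URL, or None if no ID is found."""
    match = _DP_PATTERN.search(href)
    if match is None:
        return None
    product_id = match.group(1)
    return f"https://www.amazon.co.jp/dp/{product_id}/ref=nosim?tag={tag}"


if __name__ == "__main__":
    # Made-up href in the general shape of a ranking-page link.
    sample = "/Example-Title/dp/4000000000/ref=zg_bs_1?_encoding=UTF8"
    print(build_associate_url(sample, "xxxxxxx"))
```

Returning `None` when no `/dp/` segment is present lets the caller skip that item instead of tweeting a broken link.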

Deploy the app from the command line:

gcloud app deploy

Scheduled execution on Google App Engine is configured by editing cron.yaml.

cron:
- description: tweet_post
  url: /aibooksranking
  schedule: every day 12:00
  timezone: Asia/Tokyo

Deploy cron.yaml.

gcloud app deploy cron.yaml

With this in place, the product information taken from the ranking page is tweeted every day at 12:00.
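
Because `/aibooksranking` is an ordinary URL, anyone who requests it would also trigger the tweets. App Engine adds an `X-Appengine-Cron: true` header to requests issued by its cron service and strips that header from external requests, so the handler can check for it. A minimal sketch, assuming a hypothetical helper `is_cron_request` that receives the request headers as a mapping:

```python
from typing import Mapping


def is_cron_request(headers: Mapping[str, str]) -> bool:
    """Return True if the headers carry App Engine's cron marker."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get("x-appengine-cron", "").lower() == "true"


if __name__ == "__main__":
    print(is_cron_request({"X-Appengine-Cron": "true"}))  # cron-style headers
    print(is_cron_request({"User-Agent": "curl/8.0"}))    # ordinary request
```

In the Flask handler this could be called as `is_cron_request(request.headers)` (after `from flask import request`), returning a 403 response when it is false.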

The weakness of scraping is that the code has to be fixed whenever the structure of the web page changes.
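
One way to soften this weakness is to have the parser skip items it cannot read and report when nothing matched, so that a layout change shows up in the logs rather than as an exception. A minimal sketch, assuming the same class names as main.py (which may no longer match the live page); `parse_ranking` and the sample HTML are hypothetical:

```python
import logging

from bs4 import BeautifulSoup


def parse_ranking(html: str) -> list:
    """Return (rank, name, href) tuples for every item that parses cleanly."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for el in soup.find_all("li", class_="zg-item-immersion"):
        rank = el.find("span", "zg-badge-text")
        name = el.find("div", "p13n-sc-truncate")
        link = el.find("a", href=True)
        if rank is None or name is None or link is None:
            # An expected element is missing: skip the item rather than crash.
            logging.warning("Skipping an item with an unexpected structure")
            continue
        items.append((rank.get_text().strip(), name.get_text().strip(), link["href"]))
    if not items:
        logging.error("No ranking items found; the page layout may have changed")
    return items


if __name__ == "__main__":
    # Made-up HTML: one complete item and one item missing its name and link.
    sample_html = """
    <ol>
      <li class="zg-item-immersion">
        <span class="zg-badge-text">#1</span>
        <a href="/Example/dp/4000000000?psc=1"><div class="p13n-sc-truncate"> Example Book </div></a>
      </li>
      <li class="zg-item-immersion"><span class="zg-badge-text">#2</span></li>
    </ol>
    """
    print(parse_ranking(sample_html))
```

This does not remove the need to update the selectors, but an empty result with an error in the logs is easier to notice than a failed request.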
