
Registering books borrowed from the library on a Booklog bookshelf with Python3, BeautifulSoup, and requests

Last updated at 2018-08-28, posted at 2018-08-25

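Introduction

This script registers the books I have borrowed from the library on Booklog.
It combines the code from the two posts below: the part that works with the library's system and the part that works with Booklog.
While I was at it, I reorganized everything into classes to tidy it up.

Registering books on a Booklog bookshelf with Python3, BeautifulSoup, and requests - Qiita
Scraping the books borrowed from the library with Python3 and BeautifulSoup - Qiita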

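Addendum (2018/08/29)

Moved the passwords into password.py and import them from there
Reworked the structure of the classes and functions
The Booklog site had been updated, so I changed the class names used to find elements
Books that are already registered are no longer added again
Book information is now stored in dictionaries
Simplified some of the for loops
Explicitly specified the parser that BeautifulSoup uses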
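
The script reads its credentials from password.py, which the post does not show. A minimal sketch of that file, assuming it contains nothing but the four names the script uses; the values are placeholders, not real accounts:

```python
# password.py (sketch): keep this file out of version control.
# The four names are the ones the script imports; the values are placeholders.
lib_username = 'your-library-login-id'
lib_password = 'your-library-password'
bl_username = 'your-booklog-account'
bl_password = 'your-booklog-password'
```

With this file next to the script, the names become available through the import at the top of the code.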

Code

Here is what it looks like.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
# password.py must define these four names
from password import lib_username, lib_password, bl_username, bl_password

class Library:
    '''library'''
    payload = {
        'utf8': '✓',
        'user[login]': lib_username,
        'user[passwd]': lib_password,
        'act_login': 'ログイン',
        'nextAction': 'mypage_display',
        'prevAction': '',
        'prevParams': '',
        'kobeid': '',
        'pvolid': '',
        'type': '',
        'shozo_kyk': ''
    }
    
    def __init__(self):
        self.s = requests.Session()
        self.login()

    def login(self):
        # fetch the login page first to obtain the authenticity token
        r = self.s.get('https://www.lib.city.kobe.jp/opac/opacs/mypage_display')
        r.raise_for_status()
        soup = BeautifulSoup(r.text, 'html5lib')
        auth_token = soup.find(attrs={'name': 'authenticity_token'}).get('value')
        # send the token back together with the credentials
        self.payload['authenticity_token'] = auth_token
        res = self.s.post('https://www.lib.city.kobe.jp/opac/opacs/login', data=self.payload)
        res.raise_for_status()

    def get_my_borrowing_list(self):
        '''return the books I have borrowed and not yet returned
        as a list of dictionaries
        '''
        r = self.s.get('https://www.lib.city.kobe.jp/opac/opacs/lending_display')
        r.raise_for_status()
        soup = BeautifulSoup(r.text, 'html5lib')
        # skip the 1st row of the lending table
        book_rows = soup.find('div', class_='table_wrapper lending').find_all('tr')[1:]
        print(len(book_rows))

        # the 3rd cell holds slash-separated fields; '1' and '2' are
        # placeholder keys for fields this script does not use
        book_keys = 'title/1/2/author/publisher'.split('/')
        book_infos = [book_row.find_all('td')[2].get_text() for book_row in book_rows]
        print(book_infos)
        book_list = [dict(zip(book_keys, book_info.split('/'))) for book_info in book_infos]

        print(book_list)
        return book_list

class Booklog:
    '''Booklog'''
    payload = {
        'service':'booklog',
        'ref':'',
        'account':bl_username,
        'password':bl_password
    }

    def __init__(self):
        self.s = requests.Session()
        self.login()

    def login(self):
        res = self.s.post('https://booklog.jp/login/login', data=self.payload)
        res.raise_for_status()
        # print(res.text)

    def search_book_ids(self, keyword):
        '''search Booklog by keyword
        return 'exist' if a hit is already registered,
        otherwise the list of add-links of the found books (may be empty)
        '''
        payload = {
            'keyword': keyword,
            'service_id': '1',
            'index': 'Books',
        }

        res = self.s.get('https://booklog.jp/search', params=payload)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, 'html5lib')
        registered_book = soup.find_all('a', class_='item-registered-btn')
        print(registered_book)
        print('Registered books: ' + str(len(registered_book)))

        if registered_book:
            return 'exist'

        book_id_atags = soup.find_all('a', class_='add-item-btn')
        print('Number of the found books: ' + str(len(book_id_atags)))

        # each href is a site-relative path that add_book_by_id posts to
        book_ids = [atag.get('href') for atag in book_id_atags]
        print(book_ids)
        return book_ids
        
    def add_book_by_id(self, book_id):
        payload = {
            '_method': 'add'
        }
        res = self.s.post('https://booklog.jp' + book_id, data=payload)
        res.raise_for_status()
        # print(res.text)

    def add_book_by_keyword(self, keyword):
        '''search by keyword and add the 1st hit to the bookshelf'''
        book_ids = self.search_book_ids(keyword)
        if book_ids == 'exist':
            print(keyword + ' already exists')
        elif not book_ids:
            # search_book_ids returns an empty list when nothing matches
            print('Cannot find ' + keyword)
        else:
            # add 1st result
            self.add_book_by_id(book_ids[0])

if __name__ == '__main__':
    lib = Library()
    book_list = lib.get_my_borrowing_list()

    bl = Booklog()
    for book_info in book_list:
        book_keyword = book_info['title']+' '+book_info['publisher']
        print(book_keyword)
        bl.add_book_by_keyword(book_keyword)
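
get_my_borrowing_list turns each table cell into a dictionary by splitting the text on '/' and zipping the parts with a list of keys. Here is a small sketch of that step on its own. The helper name parse_book_info, the strip() call, and the sample cell text are made up for illustration, and the real page may format the cell differently:

```python
BOOK_KEYS = 'title/1/2/author/publisher'.split('/')

def parse_book_info(cell_text):
    """Split a slash-separated cell into a dict keyed by BOOK_KEYS.

    zip() stops at the shorter input, so a cell with fewer fields
    yields a dict with fewer keys.
    """
    # strip() is an addition for this sketch; the script keeps the raw parts
    parts = [part.strip() for part in cell_text.split('/')]
    return dict(zip(BOOK_KEYS, parts))

# Hypothetical cell text, not taken from the real library page.
sample = 'Sample Title / vol. 1 / 2018 / Sample Author / Sample Press'
book = parse_book_info(sample)
print(book['title'], book['publisher'])
```

Because a short cell drops keys, a lookup such as book_info.get('publisher', '') would be more forgiving than book_info['publisher'] when building the search keyword.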

About the BeautifulSoup warning

Originally the soup was created without naming a parser:

soup = BeautifulSoup(r.text)

That line produced the following warning:

lib2bkl.py:92: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 92 of the file lib2bkl.py. To get rid of this warning, pass the additional argument 'features="html5lib"' to the BeautifulSoup constructor.

To get rid of the warning, I changed the call to

soup = BeautifulSoup(r.text, 'html5lib')

and it went away.

I used this site as a reference.
Specify the parser explicitly in Beautiful Soup 4.x - AWS / PHP / Python ちょいメモ
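
The effect of naming the parser can be checked in isolation. This is a rough sketch, assuming bs4 is installed. It uses 'html.parser' (bundled with Python) instead of 'html5lib' so that it runs without an extra package, and the HTML snippet is made up:

```python
import warnings
from bs4 import BeautifulSoup

# Made-up login form containing a hidden token field.
html = '<form><input type="hidden" name="authenticity_token" value="abc123"></form>'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Naming the parser keeps the choice fixed across environments.
    soup = BeautifulSoup(html, 'html.parser')

# Same lookup the script uses to read the token from the login page.
token = soup.find(attrs={'name': 'authenticity_token'}).get('value')
parser_warnings = [w for w in caught
                   if 'No parser was explicitly specified' in str(w.message)]
print(token, len(parser_warnings))
```

Different parsers can build slightly different trees from malformed HTML, so the script should keep using whichever parser it was tested with.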

Conclusion

I'm happy that I can now do what I set out to do.
I feel it was also good practice in organizing code into classes.
