Bulk-downloading Coursera course materials with Selenium

Posted at 2018-10-19

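Overview

(This article is intended for people who are already enrolled in a Coursera course.)
Coursera is a very convenient learning service, and the videos and slides used in a course can be downloaded, but only from a link on each individual video page, so downloading everything by hand is tedious. I therefore automated the task with Selenium.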

Environment

Windows 10 Pro (64bit)
Chrome 69.0.3497.100
ChromeDriver 2.42.591088
Python 3.7
selenium 3.14.1

Environment setup is already well covered by other authors, so I will skip it here.
・Setting up Selenium and ChromeDriver
A walkthrough of automating Chrome with Python + Selenium, by @memakura https://qiita.com/memakura/items/20a02161fa7e18d8a693

・Setting up Python
Installing a Python environment with Anaconda, by @t2y https://qiita.com/t2y/items/2a3eb58103e85d8064b6

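What the script does

Log in from the login page. (If you normally log in via Facebook, first disconnect it in the settings screen and set a password.)
Open the first video of the course.
Look for an mp4 file and a pdf file on the page. Download them if they exist; otherwise move on.
Once everything has been downloaded and the browser is back on the home screen, the script ends.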
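
The steps above amount to a simple loop. The following is a minimal sketch of that control flow only, with the browser actions passed in as plain callables so that it runs without Selenium. `crawl_course`, `download_current`, `is_finished`, `go_next` and `max_pages` are hypothetical names introduced for this illustration; they are not part of the script in this article.

```python
from typing import Callable


def crawl_course(
    download_current: Callable[[], None],
    is_finished: Callable[[], bool],
    go_next: Callable[[], None],
    max_pages: int = 1000,
) -> int:
    """Visit pages until is_finished() is true; return the number of pages visited."""
    visited = 0
    while visited < max_pages:
        download_current()  # save the mp4/pdf if the current page has them
        visited += 1
        if is_finished():   # e.g. the browser is back on the home screen
            break
        go_next()           # move on to the next lesson
    return visited


if __name__ == "__main__":
    # A fake three-page course, used only to exercise the loop.
    pages = ["lecture 1", "quiz", "lecture 2"]
    state = {"index": 0}
    seen = []

    count = crawl_course(
        download_current=lambda: seen.append(pages[state["index"]]),
        is_finished=lambda: state["index"] == len(pages) - 1,
        go_next=lambda: state.update(index=state["index"] + 1),
    )
    print(count, seen)
```

The `max_pages` cap is an addition for safety: if the end-of-course check never succeeds, the loop still stops.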

Settings

Fill in your own values at the following places in the code shown later.

# Set the path to the Chrome executable.
options.binary_location = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"

# Set the path to chromedriver.
driver = webdriver.Chrome(chrome_options=options, executable_path="C:\\Users\\Daichi\\chromedriver.exe")
# Set your email and password.
email = ""
password = ""
# Set the URL of the first lesson page of the course you want to download.
first_page_url = ""
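
Instead of writing credentials into the source file, one option is to read them from environment variables. This is only a sketch under assumptions: the variable names `COURSERA_EMAIL`, `COURSERA_PASSWORD` and `COURSERA_FIRST_PAGE_URL`, the `Settings` tuple and the helper `load_settings` are all made up for this example and are not used by the script in this article.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    email: str
    password: str
    first_page_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing early if one is missing."""
    # Hypothetical variable names chosen for this example.
    names = ("COURSERA_EMAIL", "COURSERA_PASSWORD", "COURSERA_FIRST_PAGE_URL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return Settings(*(env[name] for name in names))
```

Calling `load_settings()` with no argument reads the real environment; passing a plain dictionary makes the function easy to try out without touching it.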

Code

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

import urllib.request

import os
import time

def login(login_url, user_password, user_email):
    # Open the top page.
    driver.get(login_url)
    email = driver.find_element_by_id("emailInput_5-input")
    password = driver.find_element_by_id("passwordInput_6-input")

    email.send_keys(user_email)
    password.send_keys(user_password)

    driver.find_element_by_xpath('//*[@id="authentication-box-content"]/div/div[2]/div/div[1]/form/div[1]/button/span').click()

def download_all(driver):
    try:
        # Get the lesson title.
        title = driver.find_element_by_class_name('headline-2-text').text
        try:
            # Download the video.
            video_title = "mp4/" + (str(title) + ".mp4").replace(" ", "_").replace("/", "_")
            video = driver.find_element_by_xpath('//*[@id="rendered-content"]/div/div/div[1]/div/div[2]/div/div[2]/div/div/div/div/div[2]/div[2]/ul/span/li[1]/a')
            video_url = video.get_property("href")
            urllib.request.urlretrieve(video_url, video_title)
            print("Downloaded the video for " + str(title))
        except Exception:
            print("No video found for " + str(title))

        try:
            # Download the slides.
            slide_title = "pdf/" + (str(title) + ".pdf").replace(" ", "_").replace("/", "_")
            slide = driver.find_element_by_xpath('//*[@id="rendered-content"]/div/div/div[1]/div/div[2]/div/div[2]/div/div/div/div/div[2]/div[2]/ul/li/a')
            slide_url = slide.get_property("href")
            urllib.request.urlretrieve(slide_url, slide_title)
            print("Downloaded the slides for " + str(title))
        except Exception:
            print("No slides found for " + str(title))
    except Exception:
        # No lesson title on the page, so this is not a video lesson.
        print("This page is not a video lesson.")

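def go_next_page(driver):
    try:
        # Try the "next" link in the lesson navigation bar first.
        driver.find_element_by_xpath('//*[@id="rendered-content"]/div/div/div[1]/div/div[2]/nav/div/a[2]').click()
    except Exception:
        try:
            # Fall back to the link labelled '次へ' ("Next") in the Japanese UI.
            driver.find_element_by_link_text('次へ').click()
        except Exception:
            try:
                # Last resort: any element whose text is '続ける' ("Continue").
                driver.find_element_by_xpath('//*[text()="続ける"]').click()
            except Exception:
                print("A problem occurred.")
    finally:
        time.sleep(3)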

if __name__ == "__main__":
    options = Options()
    # Set the path to the Chrome executable.
    options.binary_location = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
    options.add_experimental_option("prefs", {
        "directory_upgrade": True,
        "safebrowsing.enabled": True,
        "profile.default_content_setting_values.automatic_downloads": 2
    })
    #options.add_argument("--headless")
    # Set the path to chromedriver.
    driver = webdriver.Chrome(chrome_options=options, executable_path="C:\\Users\\Daichi\\chromedriver.exe")
    driver.set_window_size(1920, 1080)

    # Set your email and password.
    email = ""
    password = ""
    
    os.makedirs("mp4", exist_ok=True)
    os.makedirs("pdf", exist_ok=True)
    
    login_url = "https://www.coursera.org/?authMode=login"
    login(login_url, password , email)

    # Set the URL of the first lesson page of the course you want to download.
    first_page_url = ""
    driver.get(first_page_url)

    while True:
        time.sleep(3)
        download_all(driver)
        time.sleep(3)

        try:
            # Back on the home screen a "ホーム" (Home) link is present: stop there.
            check = driver.find_element_by_link_text("ホーム")
            if check:
                print("Finished.")
                break
        except Exception:
            pass
        go_next_page(driver)
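
The script builds file names by replacing spaces and `/` in the lesson title. On Windows, other characters such as `:` or `?` are also not allowed in file names, so a slightly more defensive helper may be useful. The function below is a sketch of that idea; `safe_filename` is a hypothetical name that the script above does not define, and unlike the original it collapses runs of unsafe characters into a single underscore.

```python
import re

# Characters that Windows does not allow in file names, plus whitespace.
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def safe_filename(title: str, extension: str) -> str:
    """Turn a lesson title into a file name that is safe to create on Windows."""
    stem = _UNSAFE.sub("_", title).strip("_")
    # Fall back to a fixed name if nothing usable is left of the title.
    return (stem or "untitled") + "." + extension


print(safe_filename("Week 1: Intro / Overview", "mp4"))  # Week_1_Intro_Overview.mp4
```

With such a helper, the video path could be built as `"mp4/" + safe_filename(title, "mp4")`.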

Note

This script will stop working if Coursera changes its site from how it was on 2018-10-19.
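
Because the XPaths are tied to the page layout, one way to soften this kind of breakage is to try several locators in order and use the first one that works, which is what `go_next_page` already does with nested `try` blocks. The sketch below shows the same idea as a flat loop. `first_success` is a hypothetical helper, and the commented usage assumes the Selenium 3 `find_element_by_*` methods used in this article.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_success(finders: Iterable[Callable[[], T]]) -> Optional[T]:
    """Call each finder in order and return the first result that does not raise."""
    for finder in finders:
        try:
            return finder()
        except Exception:
            continue  # this locator did not match; try the next one
    return None


# Hypothetical usage with the driver from the script above:
# button = first_success([
#     lambda: driver.find_element_by_link_text('次へ'),
#     lambda: driver.find_element_by_xpath('//*[text()="続ける"]'),
# ])
# if button is not None:
#     button.click()
```

Adding a new locator then means adding one line to the list rather than another level of nesting. It does not remove the need to update the locators when the site changes.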
