
Extracting links that contain a given string from an HTML file with BeautifulSoup

More than 5 years have passed since last update.

・Extract the links that contain "mp4" from an HTML file.
・Use BeautifulSoup.

# Requires the beautifulsoup4 package (pip install beautifulsoup4)
from bs4 import BeautifulSoup

open_name = input('Open html file: ')
save_name = input('Save file name: ')

# Read the whole HTML file into memory
with open(open_name) as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')

with open(save_name, 'w') as f2:
    # href=True skips <a> tags that have no href attribute
    for link in soup.find_all('a', href=True):
        if 'mp4' in link['href']:  # keep only links that contain "mp4"
            f2.write(link['href'] + '\n')
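Note that a plain substring check matches "mp4" anywhere in the link, so a URL such as `/camp4/index.html` would also be extracted. If only links to actual .mp4 files are wanted, a stricter filter could look roughly like the sketch below. The helper name `is_mp4_link` and the sample URLs are made up for illustration and are not part of the script above.

```python
from urllib.parse import urlparse


def is_mp4_link(href):
    """Return True if the path part of the URL ends with .mp4 (case-insensitive)."""
    # urlparse separates the path from the query string and fragment,
    # so "clip.mp4?token=abc" and "movie.mp4#t=10" are still recognized.
    path = urlparse(href).path
    return path.lower().endswith(".mp4")


# Hypothetical sample links, for illustration only
hrefs = [
    "https://example.com/video/clip.mp4",
    "https://example.com/video/clip.MP4?token=abc",
    "https://example.com/camp4/index.html",
    "/local/movie.mp4#t=10",
]

mp4_links = [h for h in hrefs if is_mp4_link(h)]
print(mp4_links)
```

In the loop above, the condition `'mp4' in link['href']` could be swapped for a call to such a helper. Whether that is preferable depends on what the links in the target page look like.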

I based this on the Stack Overflow answer linked below.
If you know a better way, please let me know.
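As one possible alternative, the same extraction can be done without BeautifulSoup by using `html.parser` from the Python standard library. This is only a minimal sketch: the class name `LinkCollector` and the sample HTML are invented for illustration, and it swaps BeautifulSoup for the standard-library parser rather than reproducing the approach from the Stack Overflow answer.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that contain a given substring."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None when the attribute has no value, e.g. <a href>
            if name == "href" and value and self.keyword in value:
                self.links.append(value)


# Hypothetical sample HTML, for illustration only
sample_html = """
<a href="movie1.mp4">one</a>
<a name="anchor">no href</a>
<a href="page.html">page</a>
<a href="dir/movie2.mp4">two</a>
"""

collector = LinkCollector("mp4")
collector.feed(sample_html)
print(collector.links)
```

BeautifulSoup is generally more forgiving and more convenient for messy HTML, so this variant is mainly useful when installing an extra package is not an option.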

Reference:
python - how can I get href links from html code - Stack Overflow
https://stackoverflow.com/questions/3075550/how-can-i-get-href-links-from-html-code/3075568#3075568

nkkt