I wrote a Python script that extracts the links from a web page

Posted at 2015-12-13

BeautifulSoup4 is wonderful.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# python 2.x
#
# usage: python this_script.py "http://qiita.com/"
#
# Install beautifulsoup4 beforehand:
# pip install beautifulsoup4

import sys
import urllib2

from bs4 import BeautifulSoup

url = sys.argv[1]
html = urllib2.urlopen(url).read().decode('utf-8', 'ignore')
soup = BeautifulSoup(html, "html.parser")
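# href=True skips <a> tags that have no href attribute (otherwise None is printed)
links = [a.get("href") for a in soup.find_all("a", href=True)]

for link in links:
    print link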
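
The script above targets Python 2.x, and `urllib2` does not exist in Python 3. As a rough sketch of how the same idea might look on Python 3, the version below swaps in `urllib.request`. The function names `extract_links` and `fetch_links` are hypothetical, introduced only for this illustration. It also resolves relative links against the page URL with `urljoin`, which the original script does not do, so treat that part as an optional addition rather than the author's behavior.

```python
# Python 3 sketch; requires: pip install beautifulsoup4
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(html, base_url=""):
    """Return the href of every <a> tag in html, resolved against base_url.

    Hypothetical helper: the name and the base_url parameter are assumptions.
    """
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def fetch_links(url):
    """Download url and return the links found in it (hypothetical helper)."""
    with urlopen(url) as response:
        # Same decoding choice as the original script: UTF-8, ignoring bad bytes.
        html = response.read().decode("utf-8", "ignore")
    return extract_links(html, base_url=url)
```

Calling `fetch_links("http://qiita.com/")` and printing each item would mirror the original usage. Keeping the parsing in `extract_links` separate from the download makes it possible to try the extraction on an HTML string without network access.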