
I wrote a Python script that extracts the links from a web page

Posted on 2015-12-13

BeautifulSoup4 is wonderful.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# python 2.x
#
# usage: python this_script.py "http://qiita.com/"
#
# Install beautifulsoup4 beforehand:
# pip install beautifulsoup4

import urllib2
import sys
from bs4 import BeautifulSoup

url = sys.argv[1]
html = urllib2.urlopen(url).read().decode('utf-8', 'ignore')
soup = BeautifulSoup(html, "html.parser")
links = [a.get("href") for a in soup.find_all("a")]

# Print one href per line (anchors without an href print as None)
for link in links:
    print link
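The script above targets Python 2, where `urllib2` and the `print` statement exist. On Python 3 it might be ported roughly as follows. This is a sketch, not the author's code: the helper names `extract_links` and `fetch` are hypothetical, and it assumes beautifulsoup4 is installed. It also passes `href=True` to `find_all`, so `<a>` tags without an `href` are skipped instead of being printed as `None`.

```python
#!/usr/bin/env python3
# Python 3 sketch of the link extractor (assumes: pip install beautifulsoup4).
# usage: python3 this_script.py "http://qiita.com/"
import sys
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag (hypothetical helper name)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch(url):
    """Download a page and decode it as UTF-8, dropping undecodable bytes."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", "ignore")


if __name__ == "__main__":
    for link in extract_links(fetch(sys.argv[1])):
        print(link)
```

Keeping the parsing in its own function makes it easy to try on an HTML string without touching the network. Like the original, it prints each `href` exactly as written in the page, so relative links such as `/about` are not resolved to absolute URLs.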