
I wrote a Python script that extracts the links from a web page

Python

Posted at 2015-12-13, last updated at 2015-12-13

BeautifulSoup4 is wonderful.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# python 2.x
#
# usage: python this_script.py "http://qiita.com/"
#
# Install beautifulsoup4 beforehand:
# pip install beautifulsoup4

import sys
import urllib2
from bs4 import BeautifulSoup

# Target URL is taken from the first command-line argument
url = sys.argv[1]

# Fetch the page and decode it as UTF-8, dropping undecodable bytes
html = urllib2.urlopen(url).read().decode('utf-8', 'ignore')
soup = BeautifulSoup(html, "html.parser")

# href=True skips <a> tags without an href (these would otherwise print "None")
links = [a.get("href") for a in soup.find_all("a", href=True)]

for link in links:
    print link
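
The script above targets Python 2.x. As a rough sketch only, the same idea could be written for Python 3 with nothing but the standard library, swapping BeautifulSoup for `html.parser.HTMLParser` and `urllib2` for `urllib.request`. The names `LinkCollector` and `extract_links` are hypothetical, introduced here for illustration, and resolving relative links with `urljoin` is an addition the original script does not do.

```python
#!/usr/bin/env python3
# Python 3 sketch using only the standard library (no BeautifulSoup).
# LinkCollector and extract_links are illustrative names, not part of any library.
#
# usage: python3 this_script.py "http://qiita.com/"

import sys
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has a non-empty one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html, base_url=""):
    """Return the links found in html, resolved against base_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Assumption: absolute URLs are wanted, so relative hrefs are joined to base_url
    return [urljoin(base_url, href) for href in parser.links]


if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]
    with urllib.request.urlopen(url) as response:
        # Same decoding policy as the original: UTF-8, ignoring bad bytes
        html = response.read().decode("utf-8", "ignore")
    for link in extract_links(html, url):
        print(link)
```

BeautifulSoup remains the more forgiving choice for messy real-world HTML; this variant may be handy when installing an extra package is not an option.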
