
More than 5 years have passed since last update.

Extracting links from a web page with Scrapy

Posted at 2018-07-11

I tried doing with Scrapy the same thing I had done with Beautifulsoup in an earlier article:
Extracting links from a web page with Beautifulsoup

Program

scrapy01.py
# -*- coding: utf-8 -*-
#
#   scrapy01.py
#
#
#                   Jul/11/2018
#
import scrapy

class FirstScrapySpider(scrapy.Spider):
    name = 'scrapy01'
    allowed_domains = ['ekzemplaro.org']
    start_urls = ['https://ekzemplaro.org']

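    def parse(self, response):
        # 'a::attr(href)' selects the href attribute of every <a> element;
        # .getall() returns them as a list of strings (.extract() is the older name)
        for unit in response.css('a::attr(href)').getall():
            print(unit)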
#
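The selector `a::attr(href)` is a Scrapy (parsel) extension to CSS: it yields the value of the `href` attribute of every `<a>` element, in document order, and skips `<a>` elements that have no `href`. To make that behaviour concrete without Scrapy or a network connection, here is a rough standard-library sketch using `html.parser`. The class `HrefCollector`, the function `extract_hrefs` and the sample HTML are hypothetical, made up for illustration; this is not how Scrapy implements selectors internally.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags in document order,
    roughly what response.css('a::attr(href)').getall() returns in Scrapy."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of (name, value)
        if tag != "a":
            return
        for name, value in attrs:
            # skip <a> elements without an href (or with a value-less href)
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = """
<html><body>
  <a href="en/">English</a>
  <a name="anchor-without-href">no link</a>
  <p><a href="https://github.com/ekzemplaro/">GitHub</a></p>
</body></html>
"""

for href in extract_hrefs(sample):
    print(href)
```

The `<a>` element without an `href` contributes nothing, so only `en/` and the GitHub URL are printed.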

Output

$ scrapy runspider --loglevel=WARN scrapy01.py
en/
ekzemplaro/
audio_books/
librivox/
./audio/
http://www.hi-ho.ne.jp/linux
./raspberry/
./storytelling/
./crowdsourcing/
https://twitter.com/ekzemplaro
https://github.com/ekzemplaro/
qiita/
./test_dir/
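Most of the printed values are relative paths (`en/`, `./audio/`), and a few point to other sites. Inside a spider, `response.urljoin(href)` resolves a relative link against the page URL. The sketch below does the same with the standard library's `urllib.parse.urljoin`, assuming the page URL is the `start_urls` value `https://ekzemplaro.org`, and then splits the links into those inside and outside `allowed_domains`. The helpers `is_allowed` and `resolve_links` are hypothetical; the "domain or subdomain" check is only similar in spirit to how Scrapy filters off-site requests.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://ekzemplaro.org"  # assumed page URL (the spider's start URL)
ALLOWED_DOMAINS = ["ekzemplaro.org"]


def is_allowed(url, domains):
    """True if the URL's host is one of the domains or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def resolve_links(base, hrefs, domains):
    """Hypothetical helper: make each href absolute, then split by domain."""
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(base, href)  # absolute URLs pass through unchanged
        (internal if is_allowed(absolute, domains) else external).append(absolute)
    return internal, external


hrefs = ["en/", "./audio/", "http://www.hi-ho.ne.jp/linux",
         "https://twitter.com/ekzemplaro"]
internal, external = resolve_links(BASE_URL, hrefs, ALLOWED_DOMAINS)
print(internal)  # ['https://ekzemplaro.org/en/', 'https://ekzemplaro.org/audio/']
print(external)  # ['http://www.hi-ho.ne.jp/linux', 'https://twitter.com/ekzemplaro']
```

Note that `./audio/` is normalised to `/audio/`, because the `.` segment is removed during resolution.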

Installing Scrapy on Arch Linux

sudo pacman -S scrapy