
Scraping with PHP: Goutte notes

Posted at 2016-03-17

# Installation
See the URL below.
The environment this time was PHP 5.3, so I downloaded the Goutte 1.x phar and used that.
https://github.com/FriendsOfPHP/Goutte

# Setup

// Load the downloaded Goutte 1.x phar and import the Client class
require_once __DIR__ . "/goutte-v1.0.7.phar";
use Goutte\Client;

# Usage examples

## Sending a request (request)

$client = new Client();
// Send a GET request; the returned Crawler wraps the response HTML
$crawler = $client->request('GET', 'http://test.com');

## Getting the status code

// HTTP status code of the last response (e.g. 200)
$client->getResponse()->getStatus();

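## Extracting data from tags (filter)

// Extract the text of the first (index 0) p tag, selected by class
$crawler->filter("p.class")->eq(0)->text();

// Extract the inner HTML of the a tag, selected by id
$crawler->filter("a#id")->html();

// Extract the href value of the first matching a tag
$crawler->filter("a.class")->attr("href");

// When there are several matching tags, process them one by one
$crawler->filter("div.class table")->each(function ($node) {
    // Only process tables that contain a td tag
    if (count($node->filter("td")) !== 0) {
        echo $node->text();
    }
});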
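As a reference for what these selectors match: Goutte's `filter()` (Symfony DomCrawler) converts the CSS selector into XPath with the CssSelector component and runs that query. The sketch below re-creates the four patterns above with PHP's built-in DOM extension only, so it runs without Goutte or a network connection. The sample HTML, the placeholder class `class` and id `id`, and the helper names `cssClassCondition` and `extractExamples` are assumptions for illustration, and the XPath expressions are hand-written equivalents rather than the exact strings CssSelector generates.

```php
<?php
// Offline sketch: the filter() patterns redone with PHP's built-in DOM
// extension (not Goutte). Sample HTML and function names are hypothetical.

// XPath condition matching a CSS class selector such as ".class"
function cssClassCondition($class)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')";
}

function extractExamples($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $result = array();

    // filter("p.class")->eq(0)->text(): text of the first matching p
    $p = $xpath->query("//p[" . cssClassCondition("class") . "]")->item(0);
    $result['firstPText'] = $p ? $p->textContent : null;

    // filter("a#id")->html(): inner HTML of the first matching a
    // (saveHTML() with a node argument needs PHP 5.3.6+)
    $a = $xpath->query("//a[@id='id']")->item(0);
    $inner = '';
    if ($a) {
        foreach ($a->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
    }
    $result['aHtml'] = $inner;

    // filter("a.class")->attr("href"): href of the first matching a
    $link = $xpath->query("//a[" . cssClassCondition("class") . "]")->item(0);
    $result['href'] = $link ? $link->getAttribute('href') : null;

    // filter("div.class table")->each(...): text of tables that contain a td
    $tableTexts = array();
    $tables = $xpath->query("//div[" . cssClassCondition("class") . "]//table");
    foreach ($tables as $table) {
        if ($xpath->query(".//td", $table)->length !== 0) {
            $tableTexts[] = trim($table->textContent);
        }
    }
    $result['tableTexts'] = $tableTexts;

    return $result;
}

$sampleHtml = '<html><body>'
    . '<p class="class">first</p><p class="class">second</p>'
    . '<a id="id" href="/x"><b>bold</b> link</a>'
    . '<a class="class" href="/next">next</a>'
    . '<div class="class">'
    . '<table><tr><th>header only</th></tr></table>'
    . '<table><tr><td>cell</td></tr></table>'
    . '</div></body></html>';

$r = extractExamples($sampleHtml);
print_r($r);
```

The descendant selector `div.class table` becomes `//div[...]//table`, and the `td` check mirrors the `count(...) !== 0` test in the `each()` callback.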

## Getting the current page's URL

$client->getHistory()->current()->getUri();

## Clicking a link

// Find the link whose text is "次へ" ("Next") and follow it
$link = $crawler->selectLink("次へ")->link();
$crawler = $client->click($link);
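A common next step is to keep following "次へ" until the last page. With Goutte this would mean repeating `selectLink("次へ")` and `$client->click()` while the selection is non-empty (checked with `count()`, since in DomCrawler calling `link()` on an empty selection throws an `InvalidArgumentException`). The sketch below shows only the loop logic, offline: a hypothetical `$pages` array (URL => HTML) stands in for HTTP requests, the link is found with PHP's built-in DOM extension using a simplified exact-text match, and `findLinkHref` / `crawlPages` are illustrative names, not Goutte API. The `$maxPages` cap and the visited check keep the loop from running forever.

```php
<?php
// Offline sketch of "keep following the 次へ link until there is none".
// $pages (URL => HTML) is a hypothetical stand-in for real HTTP requests.

// Return the href of the first <a> whose trimmed text equals $linkText, or null.
// Simplified: Goutte's selectLink() matching is looser than this exact match.
function findLinkHref($html, $linkText)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $a) {
        if (trim($a->textContent) === $linkText) {
            return $a->getAttribute('href');
        }
    }
    return null;
}

// Visit pages from $startUrl by following 次へ links, with a page cap and cycle check
function crawlPages(array $pages, $startUrl, $maxPages = 10)
{
    $visited = array();
    $url = $startUrl;
    while ($url !== null && isset($pages[$url]) && count($visited) < $maxPages) {
        if (in_array($url, $visited, true)) {
            break; // the 次へ chain loops back to a page already seen
        }
        $visited[] = $url;
        $url = findLinkHref($pages[$url], '次へ');
    }
    return $visited;
}

// The meta tag declares UTF-8; without a charset declaration, loadHTML()
// can mis-decode UTF-8 text such as 次へ.
$head = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>';
$pages = array(
    '/page1' => $head . '<p>1</p><a href="/page2">次へ</a></body></html>',
    '/page2' => $head . '<p>2</p><a href="/page3">次へ</a></body></html>',
    '/page3' => $head . '<p>3</p><a href="/page1">前へ</a></body></html>',
);

print_r(crawlPages($pages, '/page1'));
```

Against a live site, the `$pages` lookup and `findLinkHref` would be replaced by `$client->request()` and `$client->click()` as in the snippet above.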
