More than 5 years have passed since last update.

Running Web Scraping Asynchronously in PHP


Install the required packages

composer require guzzlehttp/guzzle symfony/dom-crawler

Source

run.php

<?php
require_once('vendor/autoload.php');

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Symfony\Component\DomCrawler\Crawler;

// List of URLs to fetch
$urls = [
	'http://localhost/01.html',
	'http://localhost/02.html',
	'http://localhost/03.html',
	'http://localhost/04.html',
	'http://localhost/05.html',
];

// HTTP client shared by all requests
$client = new Client([
	'headers' => [
		'User-Agent' => 'PHP Crawler',
	],
]);

// Build the requests lazily with a generator
$requests = function() use($urls) {
	foreach ($urls as $url) {
		yield new Request('GET', $url);
	}
};

$pool = new Pool($client, $requests(), [
	// Maximum number of requests in flight at once
	'concurrency' => 3,

	// Called each time a request succeeds
	'fulfilled' => function($response, $index) {
		// Get the HTML
		$body = $response->getBody()->getContents();
		// Parse the HTML and print the text of every link
		$crawler = new Crawler($body);
		$crawler->filter('a')->each(function($element) {
			echo sprintf("%s\n", $element->text());
		});
	},

	// Called each time a request fails
	'rejected' => function($reason, $index) {
		// $reason : a Guzzle exception, e.g. GuzzleHttp\Exception\ConnectException
		// $index : integer
		echo sprintf("Error: %s\n", $reason->getMessage());
	},
]);

// Start the transfers and block until all of them have finished
$promise = $pool->promise();
$promise->wait();

// Done
echo "Completed!\n";
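
Because the requests run concurrently, responses can arrive in a different order from the URL list, and the `$index` argument is what ties a result back to its request. The following is a minimal sketch of collecting results by index instead of echoing them on arrival. It makes no HTTP calls: the `$onFulfilled` closure and the link texts are hypothetical stand-ins for the `fulfilled` callback and the crawler output, and the completion order is simulated by hand.

```php
<?php
declare(strict_types=1);

$urls = [
    'http://localhost/01.html',
    'http://localhost/02.html',
    'http://localhost/03.html',
];

// Results keyed by request index rather than by arrival order.
$results = [];

// Hypothetical stand-in for the 'fulfilled' callback: it receives the link
// texts already extracted from a page, plus the index of the request.
$onFulfilled = function (array $linkTexts, int $index) use (&$results): void {
    $results[$index] = $linkTexts;
};

// Simulated completion order (assumption): request 2 finishes first.
$onFulfilled(['Link C'], 2);
$onFulfilled(['Link A1', 'Link A2'], 0);
$onFulfilled(['Link B'], 1);

// Restore the original URL order before reporting.
ksort($results);

foreach ($results as $index => $linkTexts) {
    printf("%s: %s\n", $urls[$index], implode(', ', $linkTexts));
}
```

In run.php the same idea would apply by assigning to `$results[$index]` inside the real `fulfilled` callback and processing the array after `$promise->wait()` returns.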