
Fetching OGP data with PHP

Posted at 2018-11-02

If OGP is all you need, there appear to be libraries for it, and using one of those may be the better choice.

Building an OGP checker the easy way - Qiita

I use this often, so here is a note to myself.
curl example

$title = '';

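$ch = curl_init($url); // $url is the target page
curl_setopt($ch, CURLOPT_HEADER, 0); // do not include response headers in the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body from curl_exec instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // maximum number of redirects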
$html = curl_exec($ch);
$status_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Validate $html here: check for false, an empty body, binary content, and the status code.
// An empty body makes simplexml_import_dom fail with "Invalid Nodetype to import".

$dom_document = new \DOMDocument();
// Detect the source encoding in strict mode; fall back to SJIS when detection fails
$from_encoding = mb_detect_encoding($html, ['ASCII', 'ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS'], true);
if ( ! $from_encoding)
{
    $from_encoding = 'SJIS';
}
// @ suppresses the warnings libxml raises for malformed HTML
@$dom_document->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', $from_encoding));
$xml_object = simplexml_import_dom($dom_document);

$og_title_xpath = $xml_object->xpath('//meta[@property="og:title"]/@content');
$title_xpath = $xml_object->xpath('//title');

if ( ! empty($og_title_xpath))
{
    $title = (string)$og_title_xpath[0];
}

// Fall back to <title> only when it exists, to avoid an undefined index
if ($title === '' && ! empty($title_xpath))
{
    $title = (string)$title_xpath[0];
}
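
The example above leaves the validation of `$html` and `$status_code` as a comment. Below is a minimal sketch of what those checks could look like. The function name `is_parsable_html`, the 2xx-only rule, and the NUL-byte test for binary content are assumptions made for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (name and rules are assumptions for illustration):
// returns true only when a cURL response looks safe to pass to DOMDocument.
function is_parsable_html($html, int $status_code): bool
{
    // curl_exec() returns false on failure, so anything that is not a string is rejected
    if ( ! is_string($html))
    {
        return false;
    }

    // Assumption: only 2xx responses are treated as usable pages
    if ($status_code < 200 || $status_code >= 300)
    {
        return false;
    }

    // An empty body would make simplexml_import_dom() fail later
    if (trim($html) === '')
    {
        return false;
    }

    // Crude binary check: the encodings handled above never contain NUL bytes.
    // This would also reject UTF-16 pages, which the original code does not handle either.
    if (strpos($html, "\0") !== false)
    {
        return false;
    }

    return true;
}
```

Calling `is_parsable_html($html, $status_code)` right after `curl_close($ch)` and skipping the DOM step when it returns false keeps `$title` at its initial empty string for unusable responses.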

Scraping with PHP
A safe way to detect encodings in PHP
