Collecting Kindle information with PHP

Posted at 2019-03-22

Getting the links

  <?php
    require_once("./phpQuery-onefile.php");

    # Send a browser-like User-Agent with the request
    $header = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36';
    $options = array('http'=> array('header' => $header));
    $ctx = stream_context_create($options);

    # Fetch the full HTML of the search results page
    $html = file_get_contents("https://www.amazon.co.jp/s/ref=sr_nr_n_0?fst=as%3Aoff&rh=n%3A2250738051%2Cp_36%3A20000-40000%2Cn%3A%212250739051%2Cn%3A2275256051%2Cn%3A2293143051%2Cn%3A2430812051&bbn=2293143051&ie=UTF8&qid=1552806492&rnid=2293143051",false,$ctx);
    if ($html === false) {
      exit("Failed to fetch the page");
    }

    # Parse the HTML into a phpQuery document
    $phpobj = phpQuery::newDocument($html);

    # Select the title link (a tag) inside each result item
    # [class='...'] compares the whole class attribute, so when there are
    # several classes, all of them must be written out or nothing matches
    $links = $phpobj["li[id^='result_'] a[class='a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal']"];

    # attr("href") returns the link
    # attr("title") returns the title
    foreach ($links as $link) {
      echo pq($link)->attr("href")."<br>";
      echo "-------------------------------<br>";
    }

  ?>

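id^='result_' is a prefix match on the id attribute: it selects elements whose id starts with result_, such as result_0, where a number follows. It behaves much like the ^ anchor in a regular expression.
The Kindle a tags carry several classes, but specifying only one of them in a[class='...'] retrieves nothing, because that form compares against the entire class attribute value. Writing out every class attached to the tag, in the same order, retrieves all of the links.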
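
The difference between the prefix match and the whole-attribute match can be tried out without hitting Amazon. The following is a minimal sketch, not the method used in this article: it swaps phpQuery for PHP's built-in DOMDocument and DOMXPath, and it runs on made-up sample markup. The function name hrefs, the sample ids, the class list, and the /dp/... paths are all assumptions chosen for illustration.

```php
<?php
// Hypothetical sample markup standing in for a search results page.
$html = <<<HTML
<ul>
  <li id="result_0"><a class="a-link-normal s-access-detail-page a-text-normal" href="/dp/AAA">Book A</a></li>
  <li id="result_1"><a class="a-link-normal s-access-detail-page a-text-normal" href="/dp/BBB">Book B</a></li>
  <li id="other_2"><a class="a-link-normal s-access-detail-page a-text-normal" href="/dp/CCC">Not a result</a></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Prefix match on id: the XPath counterpart of li[id^='result_'].
$prefix = "//li[starts-with(@id, 'result_')]//a";

// Whole-attribute comparison with a single class: the counterpart of
// a[class='s-access-detail-page'].
$exactOne = $prefix . "[@class='s-access-detail-page']";

// Whole-attribute comparison with every class written out in order.
$exactAll = $prefix . "[@class='a-link-normal s-access-detail-page a-text-normal']";

// Token match on one class: roughly what the CSS selector
// a.s-access-detail-page does.
$token = $prefix . "[contains(concat(' ', normalize-space(@class), ' '), ' s-access-detail-page ')]";

// Hypothetical helper: collect the href of every node matched by a query.
function hrefs(DOMXPath $xpath, string $query): array
{
    $result = [];
    foreach ($xpath->query($query) as $a) {
        $result[] = $a->getAttribute('href');
    }
    return $result;
}

echo count(hrefs($xpath, $exactOne)), "\n"; // 0: one class alone does not equal the whole attribute
echo count(hrefs($xpath, $exactAll)), "\n"; // 2: result_0 and result_1
echo count(hrefs($xpath, $token)), "\n";    // 2: matched on a single class token
```

The other_2 item is skipped in every case because its id does not start with result_. The last query suggests that a class selector such as a.s-access-detail-page may be enough when only one class is known, which would also keep working if Amazon added or reordered the other classes.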
