Are E-Commerce Prices Really "Equal"? Observing Country-Specific Prices over Time with Bright Data (Geo × Difference Analysis)

Posted at 2026-02-07

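Introduction: are e-commerce prices really "equal"?

Even for the same product, the following can differ depending on the country or region you access the site from:

Displayed price (including currency)
How tax is included
Shipping availability
Stock display

When you try to actually verify this, however, the foundation of the observation tends to be shaky.

With a VPN, the exit country is not stable, and connections are sometimes rejected
Once missing values are mixed in, time-series comparison no longer holds
Access may be blocked outright

When observation is unstable, a difference analysis measures noise rather than differences.
So in this article, we build "verifiable observation" on a setup that pins the geo and fetches reliably.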

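Why Bright Data helps (how it differs from a VPN)

There are two main points.

You can fetch HTML as if accessing from an IP in the specified country (geo pinning)
Once fetching is stable, both difference analysis (comparison) and time-series analysis (change) become possible

This article uses Bright Data's /request endpoint.
We pass zone, url, format, and country, and observe the same page from each country at fixed points.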

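When Bright Data is unnecessary and when it helps (guidelines for choosing)

When collecting data from the web, the first step is to check whether an official API or feed is available.
For sites such as Qiita or Wikipedia, which are open or provide an API, you can get the data you need without Bright Data (and the official route is faster, cheaper, and more stable).

In practice, the difficult areas are those where the displayed content changes with access conditions or region, or where fetching tends to be unstable.
In those cases, Bright Data works as the foundation for pinning the fetch conditions (country, region, and so on), reducing missing data, and making time-series comparison valid.

Typical examples are listed below. All of this assumes you stay within the target site's terms of service, robots.txt, applicable law, and personal data protection rules.

E-commerce (prices, stock, reviews): Amazon, Rakuten Ichiba, etc.
→ Display changes easily with region and timing, so continuous observation under identical conditions is the challenge
Job listings (high volume, fast-changing): Indeed, LinkedIn, etc.
→ With scheduled collection, missing data and drifting conditions have a large impact
Real estate and travel (region-dependent data): SUUMO, Airbnb, etc.
→ Without geo pinning, the comparison itself is hard to make valid
Social media and news media (strongly affected by display conditions): X, Instagram, etc.
→ When fetch conditions fluctuate, the time series breaks easily (for observation purposes, pinning the conditions matters)

The theme of this article, Geo × time series × difference analysis, presupposes that you can keep fetching under the same conditions (reproducibility).
To establish that premise, the setup uses /request to observe each country at fixed points.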

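How to issue an API key

First, open the Bright Data website.

On the [Web Access] screen, click the [APIを作成] (Create API) button.
Select [ウェブ解除API] (Web Unlocker API) and click the [続行] (Continue) button.
Enter a name (e.g. web_unlocker) and click the [APIを追加] (Add API) button.
Click the [API設定を開く] (Open API settings) button to finish creating the zone.

This screen shows the zone name (e.g. web_unlocker) and a curl sample.
The API key is shown on the left. The code in this article uses the zone name as-is.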

Fetching per-country snapshots (Python)

1. Preparation

Create a zone (e.g. web_unlocker)
Issue an API key

2. Folder structure

brightdata-apple-geo/
  ├─ .env
  ├─ requirements.txt
  └─ run.py

3. Create `.env`

Set the API key and zone name created in the preparation step in .env.

.env

# .env (example)
BRIGHTDATA_API_KEY="xxxxxxxxxxxxxxxxxxxx"
BRIGHTDATA_ZONE="web_unlocker"
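
As a minimal sketch of how these two values might be validated before any request is sent, the helper below reads them from a mapping such as os.environ. `BrightDataConfig` and `load_config` are hypothetical names introduced for illustration; only the variable names BRIGHTDATA_API_KEY and BRIGHTDATA_ZONE come from the .env file above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BrightDataConfig:
    """Hypothetical container for the two settings stored in .env."""
    api_key: str
    zone: str


def load_config(env: Mapping[str, str]) -> BrightDataConfig:
    """Read and validate the settings from a mapping such as os.environ."""
    # Strip whitespace and stray quotes in case the values were exported by hand.
    api_key = env.get("BRIGHTDATA_API_KEY", "").strip().strip('"')
    zone = env.get("BRIGHTDATA_ZONE", "").strip().strip('"')
    missing = [name for name, value in
               (("BRIGHTDATA_API_KEY", api_key), ("BRIGHTDATA_ZONE", zone))
               if not value]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return BrightDataConfig(api_key=api_key, zone=zone)


if __name__ == "__main__":
    # In-memory example; in a real run the mapping would be os.environ
    # after load_dotenv() has been called.
    cfg = load_config({"BRIGHTDATA_API_KEY": "xxxx", "BRIGHTDATA_ZONE": "web_unlocker"})
    print(cfg.zone)
```

Failing early with the names of the missing settings makes a misconfigured .env easier to spot than an authentication error returned later by the API.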

4. Create `requirements.txt`

Create requirements.txt with the following packages.

requirements.txt

python-dotenv
requests
beautifulsoup4
lxml

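5. Create `run.py`

To observe the same product page per country, the script is designed as follows.

Prepare one URL per country (the same product has a localized URL in each store)
Align country and Accept-Language to reduce variation
Retry on fetch failure to reduce missing values (a prerequisite for time series)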
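
The retry idea in the last point can be sketched independently of Bright Data. The helper below is an illustrative assumption, not the article's run.py: `fetch_with_retry` and its parameters are hypothetical names, and the fetch function is passed in so the logic can be tried without network access.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[], str], max_retries: int = 2,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it returns a non-empty string or retries run out.

    Returns "" when every attempt fails, so the caller can record a missing
    observation explicitly instead of silently skipping it.
    """
    for attempt in range(max_retries + 1):  # 1 initial try + max_retries retries
        try:
            body = fetch()
        except Exception as exc:  # a real script would catch narrower errors
            print(f"attempt {attempt + 1} failed: {exc}")
            body = ""
        if body.strip():
            return body
        if attempt < max_retries:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return ""


if __name__ == "__main__":
    # Simulated fetcher that returns an empty body twice, then HTML.
    responses = iter(["", "", "<html>ok</html>"])
    print(fetch_with_retry(lambda: next(responses), sleep=lambda s: None))
```

Returning an empty string rather than raising keeps one row per country per run, which is what makes gaps visible in the time series.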

Only public pages that can be viewed without logging in are targeted.

run.py

# In this article, as an example, we fetch and compare the localized URLs of the same product per country.
URLS = {
    "JP": "https://www.apple.com/jp/shop/buy-iphone/iphone-17-pro",
    "US": "https://www.apple.com/us/shop/buy-iphone/iphone-17-pro",
    "CA": "https://www.apple.com/ca/shop/buy-iphone/iphone-17-pro",
}

run.py (full listing)

run.py

import os
import csv
import json
import time
import re
import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from datetime import datetime, timezone

BD_ENDPOINT = "https://api.brightdata.com/request"

# Example: localized URLs per country (assumed to point at the same product)
URLS = {
    "JP": "https://www.apple.com/jp/shop/buy-iphone/iphone-17-pro",
    "US": "https://www.apple.com/us/shop/buy-iphone/iphone-17-pro",
    "CA": "https://www.apple.com/ca/shop/buy-iphone/iphone-17-pro",
}
COUNTRIES = ["JP", "US", "CA"]

OUT_CSV = "apple_price_snapshot.csv"
DEBUG_SAVE_HTML = True
MAX_RETRIES = 2

# Fixed exchange rates for JPY conversion (static example values, not live quotes).
USDJPY = 150.0
CADJPY = 110.0

def headers_for_country(country: str) -> dict:
    """Build request headers, choosing an Accept-Language that matches the country."""
    base = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
    }
    if country == "JP":
        base["Accept-Language"] = "ja-JP,ja;q=0.9,en;q=0.8"
    elif country == "CA":
        base["Accept-Language"] = "en-CA,en;q=0.9,fr-CA;q=0.8"
    else:
        base["Accept-Language"] = "en-US,en;q=0.9"
    return base

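def _extract_html_from_bd_json(text: str) -> str:
    """Pull an HTML string out of a format="json" response body.

    Checks the top-level keys first, then "response", then any nested object.
    Returns "" if the text is not JSON or no non-empty body/html/content string exists.
    """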
    try:
        data = json.loads(text)
    except Exception:
        return ""
    if isinstance(data, dict):
        for key in ("body", "html", "content"):
            v = data.get(key)
            if isinstance(v, str) and v.strip():
                return v
        resp = data.get("response")
        if isinstance(resp, dict):
            for key in ("body", "html", "content"):
                v = resp.get(key)
                if isinstance(v, str) and v.strip():
                    return v
    stack = [data]
    while stack:
        obj = stack.pop()
        if isinstance(obj, dict):
            for k, v in obj.items():
                if k in ("body", "html", "content") and isinstance(v, str) and v.strip():
                    return v
                if isinstance(v, (dict, list)):
                    stack.append(v)
        elif isinstance(obj, list):
            for v in obj:
                if isinstance(v, (dict, list)):
                    stack.append(v)
    return ""

def _bd_request(api_key: str, zone: str, url: str, country: str, fmt: str) -> requests.Response:
    return requests.post(
        BD_ENDPOINT,
        headers={
            "Authorization": f"Bearer {api_key}",
            **headers_for_country(country),
        },
        json={
            "zone": zone,
            "url": url,
            "format": fmt,
            "country": country,
            "method": "GET",
        },
        timeout=120,
    )

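def fetch_html(country: str, api_key: str, zone: str) -> str:
    """Fetch the country's URL as HTML, retrying on empty responses.

    Each attempt tries format="raw" first and falls back to format="json".
    Returns "" if every attempt comes back empty.
    """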
    url = URLS[country]
    for attempt in range(1, MAX_RETRIES + 2):
        r = _bd_request(api_key, zone, url, country, fmt="raw")
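        if r.status_code >= 400:
            print(f"[{country}] STATUS(raw):", r.status_code)
            print(f"[{country}] BODY(raw head):", (r.text or "")[:500])
            # Retry transient errors (429 and 5xx); fail fast on other client
            # errors such as bad credentials, where a retry cannot help.
            if r.status_code == 429 or r.status_code >= 500:
                time.sleep(2)
                continue
            r.raise_for_status()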

        html = (r.text or "").strip()
        if html:
            return html

        print(f"[{country}] WARNING: empty body (attempt {attempt}) fmt=raw url={url}")
        rj = _bd_request(api_key, zone, url, country, fmt="json")

        if rj.status_code >= 400:
            head = (rj.text or "")[:220].replace("\n", " ")
            print(f"[{country}] STATUS(json):", rj.status_code)
            if head.strip():
                print(f"[{country}] json head:", head)

        html2 = _extract_html_from_bd_json(rj.text or "")
        if html2.strip():
            return html2

        time.sleep(2)

    return ""

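def extract_price_currency(html: str, country: str):
    """Return (price, currency) from JSON-LD if present, otherwise from text patterns.

    Returns (None, None) when no price can be found.
    """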
    soup = BeautifulSoup(html, "lxml")

    for s in soup.select('script[type="application/ld+json"]'):
        raw = s.get_text(strip=True)
        if not raw:
            continue
        try:
            data = json.loads(raw)
        except Exception:
            continue

        stack = [data]
        while stack:
            obj = stack.pop()
            if isinstance(obj, dict):
                if "price" in obj and "priceCurrency" in obj:
                    try:
                        return float(str(obj["price"]).replace(",", "")), str(obj["priceCurrency"])
                    except Exception:
                        pass
                for v in obj.values():
                    if isinstance(v, (dict, list)):
                        stack.append(v)
            elif isinstance(obj, list):
                for v in obj:
                    if isinstance(v, (dict, list)):
                        stack.append(v)

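    # Fallback when no usable JSON-LD is found: search the visible text.
    # Each pattern takes the first match on the page, which may not be the main product price.
    text = soup.get_text(" ", strip=True)

    m = re.search(r'([0-9][0-9,]*)\s*円', text)
    if m:
        return float(m.group(1).replace(",", "")), "JPY"
    m = re.search(r'(¥|￥)\s*([0-9][0-9,]*)', text)
    if m:
        return float(m.group(2).replace(",", "")), "JPY"

    # The optional decimal part is matched explicitly so that a sentence-ending
    # period (e.g. "$1,099.00.") is not captured and does not break float().
    m = re.search(r'\$\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*CAD', text)
    if m:
        return float(m.group(1).replace(",", "")), "CAD"

    m = re.search(r'\$\s*([0-9][0-9,]*(?:\.[0-9]+)?)', text)
    if m:
        price = float(m.group(1).replace(",", ""))
        # A bare "$" is ambiguous, so the requested country decides the currency.
        return (price, "CAD") if country == "CA" else (price, "USD")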

    return None, None

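def append_rows_to_csv(path: str, rows: list[dict]) -> None:
    """Append rows to the snapshot CSV, writing the header only when the file is new."""
    # Column names (kept in Japanese): timestamp (UTC), country, URL, price, currency.
    fields = ["取得時刻_UTC", "国", "URL", "価格", "通貨"]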
    exists = os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        if not exists:
            writer.writeheader()
        for r in rows:
            writer.writerow({
                "取得時刻_UTC": r["timestamp_utc"],
                "国": r["country"],
                "URL": r["url"],
                "価格": r["price"],
                "通貨": r["currency"],
            })

def to_jpy(price, currency):
    if price is None:
        return None
    if currency == "JPY":
        return price
    if currency == "USD":
        return price * USDJPY
    if currency == "CAD":
        return price * CADJPY
    return None

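def save_analysis(rows):
    """Write each country's difference from the JP price (file is overwritten on every run)."""
    out = "apple_price_analysis.csv"

    # Use the JP row of this run as the baseline, converted to JPY.
    jp_jpy = None
    for r in rows:
        if r["country"] == "JP":
            jp_jpy = to_jpy(r["price"], r["currency"])
            break

    # Column names (kept in Japanese): timestamp (UTC), country, price, currency,
    # JPY-converted price, direction vs. Japan, difference in JPY, difference rate in %.
    fields = [
        "取得時刻_UTC", "国", "価格", "通貨",
        "円換算",
        "日本比_方向", "日本比_差額_円", "日本比_差率_%"
    ]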

    with open(out, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()

        for r in rows:
            pj = to_jpy(r["price"], r["currency"])
            if pj is None or jp_jpy is None or jp_jpy == 0:
                direction = ""
                diff_abs = None
                pct_abs = None
            else:
                diff = pj - jp_jpy
                # Direction labels: 高い = higher, 安い = lower, 同じ = same (versus the JP price).
                if diff > 0:
                    direction = "高い"
                elif diff < 0:
                    direction = "安い"
                else:
                    direction = "同じ"

                diff_abs = abs(diff)
                pct_abs = abs(diff / jp_jpy * 100)

            writer.writerow({
                "取得時刻_UTC": r["timestamp_utc"],
                "国": r["country"],
                "価格": r["price"],
                "通貨": r["currency"],
                "円換算": round(pj, 2) if pj is not None else None,
                "日本比_方向": direction,
                "日本比_差額_円": round(diff_abs, 2) if diff_abs is not None else None,
                "日本比_差率_%": round(pct_abs, 2) if pct_abs is not None else None,
            })

    print(f"✅ saved: {out}")

def main():
    load_dotenv()
    api_key = os.environ.get("BRIGHTDATA_API_KEY")
    zone = os.environ.get("BRIGHTDATA_ZONE", "web_unlocker")
    if not api_key:
        raise RuntimeError("BRIGHTDATA_API_KEY is not set (expected in .env)")

    now = datetime.now(timezone.utc).isoformat()
    rows = []

    print("ZONE:", zone)
    print("COUNTRIES:", COUNTRIES)
    print("URLS:", URLS)

    for country in COUNTRIES:
        url = URLS[country]
        print(f"\n--- fetching country={country} url={url} ---")

        html = fetch_html(country, api_key, zone)

        if DEBUG_SAVE_HTML:
            with open(f"debug_{country}.html", "w", encoding="utf-8") as f:
                f.write(html or "")

        if not (html or "").strip():
            print(f"[{country}] ERROR: empty HTML.")
            rows.append({
                "timestamp_utc": now,
                "country": country,
                "url": url,
                "price": None,
                "currency": None,
            })
            continue

        price, currency = extract_price_currency(html, country)
        row = {
            "timestamp_utc": now,
            "country": country,
            "url": url,
            "price": price,
            "currency": currency,
        }
        print("result:", row)
        rows.append(row)

        time.sleep(1)

    append_rows_to_csv(OUT_CSV, rows)
    print(f"\n✅ saved: {OUT_CSV} ({len(rows)} rows)")

    save_analysis(rows)

if __name__ == "__main__":
    main()

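This run targets public product pages on Apple's official store and checks how the displayed price and currency of the same product differ across the localized URLs when each is accessed from its own country.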
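
The JSON-LD lookup inside extract_price_currency can be tried in isolation on a small sample. The sketch below uses only the standard library, and the sample document is invented for illustration; real product pages may structure their JSON-LD differently or omit it. `find_offer` is a hypothetical helper name.

```python
import json
from typing import Any, Optional, Tuple


def find_offer(data: Any) -> Tuple[Optional[float], Optional[str]]:
    """Walk nested JSON-LD and return the first (price, priceCurrency) pair found."""
    stack = [data]
    while stack:
        obj = stack.pop()
        if isinstance(obj, dict):
            if "price" in obj and "priceCurrency" in obj:
                try:
                    return float(str(obj["price"]).replace(",", "")), str(obj["priceCurrency"])
                except ValueError:
                    pass  # unparsable price: keep searching
            stack.extend(v for v in obj.values() if isinstance(v, (dict, list)))
        elif isinstance(obj, list):
            stack.extend(v for v in obj if isinstance(v, (dict, list)))
    return None, None


# Invented sample shaped like a schema.org Product with a nested Offer.
SAMPLE = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Phone",
  "offers": [{"@type": "Offer", "price": "1,099.00", "priceCurrency": "USD"}]
}
""")

if __name__ == "__main__":
    print(find_offer(SAMPLE))
```

Preferring structured data over text patterns matters here because the text fallback simply takes the first price-like string on the page.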

6. How to run

bash

cd ~/brightdata-apple-geo
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python run.py

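On success, the following two files are written to the same folder.

apple_price_snapshot.csv (observation log: per-country snapshots)
apple_price_analysis.csv (difference analysis: difference and rate relative to Japan)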

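7. Results

Sample output: observation log (snapshot CSV)

apple_price_snapshot.csv is a log of when, in which country, and at what price the product was shown.
Because the goal is to follow prices over time, rows are appended on each run.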
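
Because the snapshot file is append-only, each run adds one row per country, and price changes can be found by comparing consecutive rows for the same country. The following is a minimal sketch under that assumption: `detect_changes` is a hypothetical helper, the column names match those written by run.py, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def detect_changes(csv_text: str) -> list:
    """Return (country, timestamp, old_price, new_price) for each price change.

    Rows with an empty price (failed fetches) are skipped, so a gap is not
    mistaken for a change.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["価格"]:
            series[row["国"]].append((row["取得時刻_UTC"], float(row["価格"])))

    changes = []
    for country, points in series.items():
        # ISO 8601 timestamps with the same UTC offset sort chronologically as strings.
        points.sort()
        for (_, prev), (ts, cur) in zip(points, points[1:]):
            if cur != prev:
                changes.append((country, ts, prev, cur))
    return changes


# Invented sample in the same layout as apple_price_snapshot.csv.
SAMPLE = """取得時刻_UTC,国,URL,価格,通貨
2026-02-01T00:00:00+00:00,JP,https://example.com/jp,1000.0,JPY
2026-02-01T00:00:00+00:00,US,https://example.com/us,10.0,USD
2026-02-02T00:00:00+00:00,JP,https://example.com/jp,,
2026-02-02T00:00:00+00:00,US,https://example.com/us,10.0,USD
2026-02-03T00:00:00+00:00,JP,https://example.com/jp,1100.0,JPY
2026-02-03T00:00:00+00:00,US,https://example.com/us,10.0,USD
"""

if __name__ == "__main__":
    for change in detect_changes(SAMPLE):
        print(change)
```

In the sample, the JP row with a missing price on the second day is skipped, so the change is reported between the first and third observations.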

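Difference analysis: looking at the "effective price gap" relative to Japan

apple_price_analysis.csv uses JP as the baseline and outputs the following.

Price converted to JPY
Whether it is higher or lower than in Japan
Difference (JPY)
Difference rate (%)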
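
The columns above reduce to a small calculation. The sketch below reproduces it for one row, using the fixed USDJPY = 150.0 rate from run.py; the two prices are hypothetical inputs chosen for the arithmetic, not observed values, and `compare_to_japan` is a hypothetical helper name.

```python
def compare_to_japan(price_jpy: float, japan_price_jpy: float) -> dict:
    """Return direction, absolute difference (JPY), and absolute rate (%) versus Japan."""
    diff = price_jpy - japan_price_jpy
    if diff > 0:
        direction = "higher"
    elif diff < 0:
        direction = "lower"
    else:
        direction = "same"
    return {
        "direction": direction,
        "diff_jpy": round(abs(diff), 2),
        "diff_pct": round(abs(diff) / japan_price_jpy * 100, 2),
    }


USDJPY = 150.0  # fixed rate taken from run.py, not a live quote

if __name__ == "__main__":
    japan_price = 180000.0          # hypothetical JP price in JPY
    us_price_jpy = 1100.0 * USDJPY  # hypothetical US price of 1,100 USD = 165,000 JPY
    print(compare_to_japan(us_price_jpy, japan_price))
```

With these inputs the difference is 15,000 JPY, or 8.33% of the Japanese price, in the "lower" direction. Because the rate is a constant, any movement in the real exchange rate shows up as an error in this gap rather than as a price change.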

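Caveats (guardrails for the verification)

Only public pages that anyone can view without logging in are targeted
Access is kept infrequent and limited to verification purposes to avoid excessive load
Interpretation of the results is affected by tax (included or excluded), regional taxation, shipping costs, and stock and sales conditions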
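
One way to make the tax caveat concrete is to convert every price to a tax-exclusive basis before comparing. The sketch below is illustrative only: the 10% rate, and the assumption that one price is shown tax-inclusive while the other is shown before tax, are inputs for the example and should be checked against each store's actual display rules. `net_price` is a hypothetical helper.

```python
def net_price(displayed: float, tax_rate: float, tax_included: bool) -> float:
    """Return the tax-exclusive price for a displayed price."""
    if tax_rate < 0:
        raise ValueError("tax_rate must be non-negative")
    # A tax-inclusive price is base * (1 + rate), so divide to recover the base.
    return displayed / (1 + tax_rate) if tax_included else displayed


if __name__ == "__main__":
    # Hypothetical figures: 110,000 JPY shown tax-inclusive at an assumed 10% rate,
    # and 700 USD shown before tax.
    print(round(net_price(110000.0, 0.10, tax_included=True), 2))
    print(net_price(700.0, 0.0, tax_included=False))
```

Comparing on a tax-exclusive basis separates differences in pricing from differences in how each store displays tax.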

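Summary: the point of using Bright Data is not "getting prices" but "verifiable observation"

The goal here was not simply to fetch prices, but to build a foundation for observing repeatedly under the same conditions.
Only when the observation conditions are aligned do difference analysis and time-series comparison hold up as verification.

You can pin the country for observation (geo)
Stable fetching makes time-series comparison possible
Once differences appear, you can investigate why
With continued operation, this leads to monitoring and anomaly detection

Whereas scraping tends to end as one-off fetching, setting up an observation foundation made it possible to keep accumulating reproducible data and to move from hypothesis to verification to operation.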
