#Goal
- Site
  - eBay.com
- Search keyword
  - Dragon Ball
- Search filter
  - Sold items
- Data to retrieve
  - Item name
  - Condition
  - Price
  - Shipping cost
  - First page of results only
- Output file
  - eBayScraping_(date).csv
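As a rough sketch of how the goal above maps to a request and an output file, the snippet below rebuilds the search URL from the keyword and the sold-items filter, and derives the dated CSV name. `build_search_url` and `csv_file_name` are hypothetical helper names introduced only for illustration. The query parameters are copied from the URL used in the source code; what each one means on eBay's side is an assumption.

```python
import datetime
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(keyword, sold_only=True):
    """Build the search URL (hypothetical helper).

    The parameter names are copied from the URL in the source code;
    their exact meaning on eBay's side is assumed, not verified.
    """
    params = {"_from": "R40", "_nkw": keyword.lower(), "_sacat": 0, "rt": "nc"}
    if sold_only:
        # Assumed to restrict results to sold / completed listings.
        params["LH_Sold"] = 1
        params["LH_Complete"] = 1
    # urlencode encodes spaces as "+", as in the original URL.
    return BASE_URL + "?" + urlencode(params)


def csv_file_name(day=None):
    """Return eBayScraping_(date).csv for the given day (default: today)."""
    day = day or datetime.date.today()
    return "eBayScraping_" + day.strftime("%Y%m%d") + ".csv"
```

Calling `build_search_url("Dragon Ball")` is meant to reproduce the URL that is hard-coded in the script, so the keyword can be changed in one place.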
#Source code
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
import datetime

# Selenium 4 style: the chromedriver path is passed through a Service object,
# and elements are located with find_element(By.<strategy>, value).
driver = webdriver.Chrome(service=Service("/Users/name/home/work/eBay/Python/chromedriver"))
# Search results for "dragon ball", limited to sold listings
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=dragon+ball&_sacat=0&rt=nc&LH_Sold=1&LH_Complete=1'
driver.get(url)
# Wait up to 10 seconds for elements to appear on every lookup
driver.implicitly_wait(10)

names = []
statuses = []
prices = []
shippings = []

# Each search result is a container carrying both classes
items = driver.find_elements(By.CSS_SELECTOR, '.s-item__info.clearfix')
for item in items:
    # get Name
    name = item.find_element(By.CLASS_NAME, 's-item__title').text
    name = name.replace("NEW LISTING", "")
    names.append(name)
    # get Status (not every listing has one)
    try:
        status = item.find_element(By.CLASS_NAME, 'SECONDARY_INFO').text
        statuses.append(status)
    except NoSuchElementException:
        statuses.append(" ")
    # get Price
    price = item.find_element(By.CLASS_NAME, 's-item__price').text
    price = price.replace("JPY ", "")
    prices.append(price)
    # get ShippingCost (not every listing has one)
    try:
        shipping = item.find_element(By.CLASS_NAME, 's-item__logisticsCost').text
        shipping = (shipping.replace("+JPY ", "")
                            .replace(" shipping", "")
                            .replace("Free International Shipping", "0"))
        shippings.append(shipping)
    except NoSuchElementException:
        shippings.append(" ")

# Collect the columns and write them to eBayScraping_(date).csv
df = pd.DataFrame()
df['name'] = names
df['status'] = statuses
df['price'] = prices
df['shippingcost'] = shippings
csv_date = datetime.datetime.today().strftime("%Y%m%d")
csv_file_name = "eBayScraping_" + csv_date + ".csv"
df.to_csv(csv_file_name, index=False)
driver.quit()
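#Problems
- The following two shipping labels cannot be clearly distinguished
  - Free Shipping
  - Free International Shipping
- Retrieving the following item takes a long time, probably because every listing without this element waits out the 10-second implicit wait before the lookup fails
  - Shipping cost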
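One possible way to keep the two free-shipping labels apart is to classify the raw label text before any string replacement is applied. The snippet below is only a sketch: `classify_shipping` and its category names are hypothetical, and the label formats (`Free shipping`, `Free International Shipping`, `+JPY 1,234 shipping`) are assumed from the strings the script already replaces, not from eBay documentation.

```python
import re


def classify_shipping(text):
    """Return (kind, cost) for a shipping label (hypothetical helper).

    kind is one of 'free_international', 'free', 'paid', 'unknown'.
    cost is a digit string without thousands separators, or '' if unknown.
    """
    label = text.strip().lower()
    # Compare case-insensitively and test the more specific label first.
    if label.startswith("free international shipping"):
        return ("free_international", "0")
    if label.startswith("free shipping"):
        return ("free", "0")
    # Assumed paid format: "+JPY 1,234 shipping"; take the first number found.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", label)
    if match:
        return ("paid", match.group(0).replace(",", ""))
    return ("unknown", "")
```

Storing `kind` and `cost` in separate CSV columns would keep the numeric column clean while still recording which kind of free shipping applied.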
#Improvements and extensions
- Revisit how the two shipping labels are classified
- Retrieve data from multiple pages
- Extract frequently occurring words
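The last two items could be approached roughly as follows. This is a sketch under stated assumptions: `page_url` and `top_words` are hypothetical helpers, the page number is assumed to be carried by a `_pgn` query parameter (this does not appear in the original URL and should be checked against the live site), and the frequent words are assumed to come from the collected item names.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(url, page):
    """Return url with the page number set (hypothetical helper).

    ASSUMPTION: the results page is selected by a "_pgn" query parameter.
    """
    parts = urlsplit(url)
    # Drop any existing page parameter, then append the requested one.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "_pgn"]
    query.append(("_pgn", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def top_words(names, n=10):
    """Return the n most common lowercase alphanumeric words in names."""
    words = []
    for name in names:
        words.extend(re.findall(r"[a-z0-9]+", name.lower()))
    return Counter(words).most_common(n)
```

With these, the scraping loop could be run once per `page_url(url, page)` and `top_words(names)` applied to the accumulated `names` list before writing the CSV.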