
More than 5 years have passed since last update.

Scraping Tableau Public to sort keyword search results by view count


Importing packages

Some of these aren't needed for this article, but this is the set I always use.

import time, sys, os, re
from time import sleep
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import requests
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import csv
from operator import methodcaller
from lxml import html
import datetime

Setting the keyword and the driver

This time I'll search for "CRM".

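main_url = 'https://public.tableau.com/en-us/search/vizzes/'
kwd = 'CRM'
options = Options()
# pass Chrome options via `options=`; the older `chrome_options=` keyword was deprecated and later removed from Selenium
driver = webdriver.Chrome(options=options)
driver.get(main_url + kwd)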
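The search URL is built by concatenating `main_url` and `kwd`, which works for "CRM" but may misbehave for keywords containing spaces, `/`, or non-ASCII (e.g. Japanese) text. A minimal sketch of percent-encoding the keyword with the standard library's `urllib.parse.quote`; `build_search_url` is a hypothetical helper, and it assumes Tableau Public's search still takes the keyword as the last path segment plus a `?page=N` query, as the URLs in this article do.

```python
from urllib.parse import quote

# Base search URL used in this article
MAIN_URL = 'https://public.tableau.com/en-us/search/vizzes/'


def build_search_url(keyword: str, page: int = 1) -> str:
    """Hypothetical helper: percent-encode the keyword and append ?page=N."""
    # safe='' also encodes '/', so a keyword can never add extra path segments
    return MAIN_URL + quote(keyword, safe='') + '?page=' + str(page)


print(build_search_url('CRM', 2))            # -> https://public.tableau.com/en-us/search/vizzes/CRM?page=2
print(build_search_url('sales dashboard'))  # -> https://public.tableau.com/en-us/search/vizzes/sales%20dashboard?page=1
```

The result can be passed straight to `driver.get(...)` in place of the concatenated string.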

Number of vizzes

soup = BeautifulSoup(driver.page_source, "lxml")
tab_count = soup.find_all('span', class_='search-tab-count')
c = str(tab_count[1]).split('"search-tab-count">')[1].split('</s')[0]
print(c)
519
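The count is read from the `search-tab-count` spans on the results page. For comparison, a minimal sketch of the same extraction using only the standard library's `html.parser`: it reads each span's text instead of slicing the raw HTML string, and strips a thousands separator such as `1,234` in case the site formats large counts that way (an assumption). `TabCountParser`, `tab_counts`, and the sample markup are hypothetical; the sample only mirrors the `<span class="search-tab-count">519</span>` shape the code above relies on, and the live page may differ.

```python
from html.parser import HTMLParser


class TabCountParser(HTMLParser):
    """Collect the text of every <span> whose class list includes search-tab-count."""

    def __init__(self):
        super().__init__()
        self.counts = []
        self._in_count = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'span' and 'search-tab-count' in classes:
            self._in_count = True
            self.counts.append('')

    def handle_endtag(self, tag):
        if tag == 'span':
            self._in_count = False

    def handle_data(self, data):
        if self._in_count:
            self.counts[-1] += data


def tab_counts(page_source: str) -> list:
    parser = TabCountParser()
    parser.feed(page_source)
    # strip whitespace and any thousands separator before converting
    return [int(t.strip().replace(',', '')) for t in parser.counts]


# Hypothetical markup with two tabs; index 1 plays the role of tab_count[1] above
sample = ('<a>Tab A <span class="search-tab-count">12</span></a>'
          '<a>Tab B <span class="search-tab-count">519</span></a>')
print(tab_counts(sample))  # -> [12, 519]
```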

There appear to be 519 vizzes this time, so the search results span 26 pages (20 vizzes per page).
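The page count is a ceiling division: 519 / 20 = 25.95, i.e. 25 full pages plus one last page holding the remaining 19 vizzes. A small sketch, assuming 20 results per page as observed here; `pages_needed` is a hypothetical helper name.

```python
def pages_needed(n_results: int, per_page: int = 20) -> int:
    """Number of result pages: ceiling of n_results / per_page."""
    # -(-a // b) is integer ceiling division, with no floating-point rounding
    return -(-n_results // per_page)


print(pages_needed(519))  # -> 26
print(pages_needed(520))  # -> 26
print(pages_needed(521))  # -> 27
```

Rounding up with integer arithmetic gives exactly the pages that hold results: an exact multiple of 20 (e.g. 520 vizzes) adds no empty trailing page, and any remainder adds one.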

Getting the view count and URL of every keyword search result

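rows = []  # one {'url': ..., 'view': ...} dict per viz; the DataFrame is built once at the end

# 20 results per page; round up so a partial last page is included (519 -> 26 pages)
n_pages = -(-int(c) // 20)

for j in range(1, n_pages + 1):

    driver.get(main_url+kwd+'?page='+str(j))
    sleep(3)  # wait for the result list to render
    soup = BeautifulSoup(driver.page_source, "lxml")
    sru = soup.find_all('span', class_='search-result-url')

    for i in range(len(sru)):

        # viz URL: the href of the link inside the result span
        url_i = str(sru[i]).split('href="')[1].split('</a></span>')[0].split('">')[0]
        driver.get(url_i)
        sleep(1)
        # view count: the {"COUNT": <n>} fragment in the first <h3> of the viz page
        view_i = str(BeautifulSoup(driver.page_source, "lxml").find_all('h3')[0]).split('{"COUNT": ')[1].split('}\'>views</span>')[0]
        rows.append({'url': url_i, 'view': int(view_i)})

# DataFrame.append was removed in pandas 2.0, so build the frame from the list instead
d = pd.DataFrame(rows, columns=['url', 'view'])
print(d.sort_values('view', ascending=False).reset_index(drop=True).head())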
url view
0 https://public.tableau.com/profile/efields#!/v... 112839
1 https://public.tableau.com/profile/webveja#!/v... 17710
2 https://public.tableau.com/profile/webveja#!/v... 17073
3 https://public.tableau.com/profile/yerson.coll... 13388
4 https://public.tableau.com/profile/lokadata#!/... 13305
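The view count comes from a `{"COUNT": <n>}` fragment inside the first `<h3>` of each viz page. As an alternative to chained `split` calls, a regular expression matches that fragment in one step and raises a clear error when the page layout changes. A minimal, standard-library-only sketch; `extract_view_count`, `rank_by_views`, the `data-x` attribute, and the example.com URLs are hypothetical, shaped only after the split strings in the article's code.

```python
import re

# Matches the {"COUNT": <n>} fragment embedded in a viz page's first <h3>
COUNT_RE = re.compile(r'\{"COUNT":\s*(\d+)\}')


def extract_view_count(h3_html: str) -> int:
    """Return the view count, or raise ValueError if the fragment is missing."""
    m = COUNT_RE.search(h3_html)
    if m is None:
        raise ValueError('no {"COUNT": n} fragment found')
    return int(m.group(1))


def rank_by_views(pages):
    """pages: iterable of (url, h3_html) pairs; returns [(views, url)], most viewed first."""
    ranked = [(extract_view_count(h3), url) for url, h3 in pages]
    # key on views only; sorted() is stable, so ties keep their scraping order
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


# Hypothetical markup and URLs
sample_pages = [
    ('https://example.com/viz/a', '<h3><span data-x=\'{"COUNT": 17710}\'>views</span></h3>'),
    ('https://example.com/viz/b', '<h3><span data-x=\'{"COUNT": 112839}\'>views</span></h3>'),
]

for views, url in rank_by_views(sample_pages):
    print(views, url)
```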

If you want to find examples on Tableau Public sorted by view count, give it a try!
