
Fetching Yahoo News titles and running sentiment analysis

Last updated at 2020-05-17, posted at 2020-05-11

Fetching Yahoo News content

Fetch the content at your own risk.

.py

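from tqdm import tqdm
from bs4 import BeautifulSoup
import requests
import urllib.parse
import time

jp_keyword = ''  # enter the search keyword
page_num = int()  # enter the total number of pages inside the parentheses

# URL-encode the Japanese keyword
keyword = urllib.parse.quote(jp_keyword)

# List of titles
title_list = []
# List of posting dates
date_list = []
# Start at 0 so the first page (b=1) is included; later pages are b=11, 21, ...
for i in tqdm(range(page_num)):
    url = "https://news.yahoo.co.jp/search/?p=" + keyword + "&st=n&ei=UTF-8&b=" + str(i * 10 + 1)
    print(url)
    res = requests.get(url)
    # Pause between requests so as not to overload the server
    time.sleep(2)
    # Build a BeautifulSoup object from the response HTML
    soup = BeautifulSoup(res.content, 'html.parser')

    # Collect the text of every h2 element (the article titles)
    title_text = soup.find_all('h2')
    for x in title_text:
        title_list.append(x.text)

    # Collect the text of every span with class "d" (the posting dates)
    date_text = soup.find_all('span', class_="d")
    for x in date_text:
        date_list.append(x.text)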
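The paginated search URL can also be assembled with `urllib.parse.urlencode`, which percent-encodes the keyword and joins the query parameters in one step. The following is only a minimal sketch: the helper name `build_search_url`, the sample keyword, and the assumption of 10 results per page (inferred from the `b=1, 11, 21, ...` offsets) are illustrative and not part of the original code.

```python
from urllib.parse import urlencode

BASE_URL = "https://news.yahoo.co.jp/search/"


def build_search_url(jp_keyword, page_index, per_page=10):
    """Return the search URL for a zero-based page index.

    per_page=10 is an assumption inferred from the b=1, 11, 21, ... offsets.
    """
    params = {
        "p": jp_keyword,                  # urlencode percent-encodes this as UTF-8
        "st": "n",
        "ei": "UTF-8",
        "b": page_index * per_page + 1,   # 1-based offset of the first result
    }
    return BASE_URL + "?" + urlencode(params)


# Hypothetical keyword, used only for illustration
urls = [build_search_url("経済", i) for i in range(3)]
for u in urls:
    print(u)
```

Note that `urlencode` encodes spaces as `+`, whereas `urllib.parse.quote` encodes them as `%20`; for a single-word keyword the two give the same result.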

Creating a function that performs sentiment analysis

For how to obtain an API key, see the following pages.

Quickstart: Setting up the Natural Language API
Using API keys

How to use the API

.py

import requests

key = ""  # enter your API key
# API URL
url = 'https://language.googleapis.com/v1/documents:analyzeSentiment?key=' + key

def sentimental(text):
    header = {'Content-Type': 'application/json'}
    body = {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "JA",  # specify the language
            "content": text
        },
        "encodingType": "UTF8"
    }

    # Receive the result as JSON
    response = requests.post(url, headers=header, json=body).json()
    # Return the score
    return response["documentSentiment"]["score"]
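If the API call fails (for example because of an invalid key), the JSON response will not contain `documentSentiment`, and indexing it directly raises a `KeyError`. The sketch below separates building the request body from reading the score so that a failed response yields `None` instead. The helper names and the sample response dictionaries are hypothetical, and the `"error"` key reflects the usual shape of Google API error responses, which is an assumption here.

```python
def build_request_body(text, language="ja"):
    """Build the JSON body for documents:analyzeSentiment."""
    return {
        "document": {
            "type": "PLAIN_TEXT",
            "language": language,
            "content": text,
        },
        "encodingType": "UTF8",
    }


def extract_score(response):
    """Return the document-level score, or None if it is missing."""
    if "error" in response:
        # Assumed error shape: {"error": {"code": ..., "message": ...}}
        return None
    return response.get("documentSentiment", {}).get("score")


# Hypothetical responses, for illustration only
ok_response = {"documentSentiment": {"magnitude": 0.8, "score": 0.4}}
bad_response = {"error": {"code": 400, "message": "API key not valid."}}

print(extract_score(ok_response))
print(extract_score(bad_response))
```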

Putting the scores into a score list

.py

score_list = []
# Score every title collected in title_list
for word in tqdm(title_list):
    score_list.append(sentimental(word))
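A single failed request would abort the loop above and discard the scores collected so far. One possible way around this, sketched below, is to record `None` for a title that could not be scored so that the score list stays the same length as the title list. The function `score_all`, its `delay` parameter, and the stand-in `fake_scorer` are hypothetical names introduced for illustration; in practice the scorer would be the `sentimental` function.

```python
import time


def score_all(texts, scorer, delay=0.0):
    """Apply scorer to each text, recording None when a call fails."""
    scores = []
    for text in texts:
        try:
            scores.append(scorer(text))
        except Exception:
            # Broad catch on purpose: any failure becomes a missing score
            scores.append(None)
        # Optional pause between calls to avoid hammering the API
        time.sleep(delay)
    return scores


def fake_scorer(text):
    # Hypothetical stand-in for the API call; fails on empty input
    if not text:
        raise KeyError("documentSentiment")
    return 0.5


demo_scores = score_all(["good news", "", "more news"], fake_scorer)
print(demo_scores)
```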

.py

import pandas as pd
df = pd.DataFrame()
df["word"]=title_list
df["date"]=date_list
df["score"]=score_list
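Assigning lists of different lengths to DataFrame columns raises a `ValueError`, which can happen here if a search page yields a different number of `h2` elements and date spans. As a rough sketch, the lengths can be checked before building the table. The helper `build_rows` and the sample values are hypothetical; its result can be passed to `pd.DataFrame(...)`.

```python
def build_rows(titles, dates, scores):
    """Zip the three lists into row dicts, refusing mismatched lengths."""
    lengths = {"word": len(titles), "date": len(dates), "score": len(scores)}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return [
        {"word": t, "date": d, "score": s}
        for t, d, s in zip(titles, dates, scores)
    ]


# Hypothetical sample data, for illustration only
rows = build_rows(["title A", "title B"], ["5/11", "5/12"], [0.1, -0.2])
print(rows)
```

With pandas installed, `pd.DataFrame(rows)` produces the same three columns as the code above.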

Saving the DataFrame to a pickle file

.py

import pickle
with open('sentimental_df.pickle', 'wb') as web:
    pickle.dump(df , web)

Loading the data back

.py

import pickle
with open('sentimental_df.pickle', 'rb') as web:
    df = pickle.load(web)
    print(df)

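References

Using API keys
Running sentiment analysis by calling the Google Natural Language API from Python
Natural Language
Boost your development efficiency! How to use pickle in Python (for beginners)
URL encoding and decoding in Python (urllib.parse.quote, unquote)
Quickstart: Setting up the Natural Language API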
