Can't get a script tag's contents as text when scraping with Python

What I want to solve

I want to get the contents of the script id="json-data" type="application/json" tag on https://zukan.pokemon.co.jp/detail/001/ as text.

Code

pokemon.py
from bs4 import BeautifulSoup as bs
import pandas as pd
import time,datetime
import requests
import re
from fake_useragent import UserAgent


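ua = UserAgent()
user_agent = ua.random  # random User-Agent string (named so it does not shadow the UserAgent class)


def get_html(url):
    # The HTTP header name is "User-Agent", not "UserAgent"
    headers = {"User-Agent": user_agent}
    # Fetch the URL that was passed in instead of always using url_list[0]
    res = requests.get(url, headers=headers)
    soup = bs(res.content, "html.parser")
    return soup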


number_list = ['{:03}'.format(i) for i in range(1, 891)]  

url_list = ["https://zukan.pokemon.co.jp/detail/{0}".format(number) for number in number_list]

soup = get_html(url_list[0])

title_part = soup.find_all("script", {"type": "application/json"})
print(title_part)  # the content is there before converting to text

for i in title_part:
    title = i.get_text()
    print(title)  # but nothing comes out after converting to text

Problem

When I check in the terminal, print(title_part) shows the following:

Terminal
[<script id="json-data" type="application/json">
  {"pokemon":{"no":"001","sub":0,"name":"フシギダネ","sub_name":"","area":1,"omosa":"6.9","takasa":"0.7","sex":1,"bunrui":"たねポケモン","tokusei_1":82,"tokusei_2":0,"type_1":4,"type_2":8,"text_1":"うまれたときから せなかに しょくぶつの タネが あって すこしずつ おおきく そだつ。 (『ポケモン ソード』より)","text_2":"うまれて しばらくの あいだ せなかの タネに つまった えいようを とって そだつ。 (『ポケモン シールド』より)","spec_hp":3,"spec_kougeki":3,"spec_bougyo":3,"spec_tokukou":4,"spec_tokubou":4,"spec_subayasa":3,"sugata_text_flg":0,"sugata_text":"","mega_flg":0,"genshi_flg":0,"kyodai_flg":0,"image_l":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/5e1db695135dd89787cfe0927d08211c.jpg","image_m":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/7b705082db2e24dd4ba25166dac84e0a.png","image_s":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/afa02eaba4c39820fc57f4e8abaeea80.png","zukan_no":"001"},"abilities":[{"id":82,"name":"しんりょく","exp":"HPが へったとき くさタイプの わざの いりょくが あがる。"}],"evolutionsType":"1-1-1","evolutions":{"pokemon":{"no":"001","sub":0,"name":"フシギダネ","sub_name":"","omosa":"6.9","takasa":"0.7","type_1":4,"type_2":8,"kyodai_flg":0,"image_m":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/7b705082db2e24dd4ba25166dac84e0a.png","image_s":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/afa02eaba4c39820fc57f4e8abaeea80.png","zukan_no":"001"},"children":[{"pokemon":{"no":"002","sub":0,"name":"フシギソウ","sub_name":"","omosa":"13.0","takasa":"1.0","type_1":4,"type_2":8,"kyodai_flg":0,"image_m":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/6f8144eb4659537733b930d6a299d5a7.png","image_s":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/f78edf4c2bc037f4b23529edfcf9ddce.png","zukan_no":"002"},"children":[{"pokemon":{"no":"003","sub":0,"name":"フシギバナ","sub_name":"","omosa":"100.0","takasa":"2.0","type_1":4,"type_2":8,"kyodai_flg":0,"image_m":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/ebccfe6f2ccfe2e851fd29739bf6220c.png","image_s":"https:\/\/zukan.pokemon.co.jp\/zukan-
api\/up\/images\/index\/cdce516974ae6a74e1b8b855644c5ce5.png","zukan_no":"003"},"children":[]}]}]},"groups":[],"nav":{"prev":[],"next":{"no":"002","sub":0,"name":"フシギソウ","sub_name":"","omosa":"13.0","takasa":"1.0","type_1":4,"type_2":8,"kyodai_flg":0,"image_m":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/6f8144eb4659537733b930d6a299d5a7.png","image_s":"https:\/\/zukan.pokemon.co.jp\/zukan-api\/up\/images\/index\/f78edf4c2bc037f4b23529edfcf9ddce.png","zukan_no":"002"}},"news":[{"uniq":"https:\/\/www.pokemon.co.jp\/info\/2020\/06\/200605_gd01.html","title":"「ポケモンシャツ」からポロシャツが新発売!よりカジュアルに「ポケモンシャツ」を着こなそう!","body_linkwin":"0","term":"goods","img_eyecatch":"https:\/\/www.pokemon.co.jp\/PostImages\/81f61fc8ed2ee944d2738e46f4dec77e16fd552e.jpg","pickup":"0","new":0,"start_date":"2020.06.05"},{"uniq":"https:\/\/www.pokemon.co.jp\/info\/2020\/02\/200221_p01.html","title":"『ポケモン不思議のダンジョン 救助隊DX』のグッズが、全国のポケモンセンターに登場!","body_linkwin":"0","term":"pokecen","img_eyecatch":"https:\/\/www.pokemon.co.jp\/PostImages\/59f2a8dee46eda19f0adbd8e9e9d22587773f3a0.jpg","pickup":"0","new":0,"start_date":"2020.02.21"},{"uniq":"https:\/\/www.pokemon.co.jp\/info\/2020\/01\/200131_gd01.html","title":"『幼稚園』3月号に、「Nintendo Labo」と『ポケモンクエスト』のコラボ付録が登場!","body_linkwin":"0","term":"goods","img_eyecatch":"https:\/\/www.pokemon.co.jp\/PostImages\/04db454f23372b7e1e78165e383eabcd980db038.jpg","pickup":"0","new":0,"start_date":"2020.01.31"},{"uniq":"https:\/\/www.pokemon.co.jp\/info\/2019\/08\/190816_p02.html","title":"ハロウィンのパレードを楽しむ、ピカチュウやルカリオたちのグッズが、ポケモンセンターに登場!","body_linkwin":"0","term":"pokecen","img_eyecatch":"https:\/\/www.pokemon.co.jp\/PostImages\/818a95189e2e3985973782e24d141bd983c3be53.jpg","pickup":"0","new":0,"start_date":"2019.08.16"}]}</script>]

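However, the output of print(title) comes out empty.
I would appreciate it if you could tell me why the text is not retrieved and how to work around it.
Thank you in advance.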

Articles I consulted

- https://www.inet-solutions.jp/technology/beautiful-soup-python/
- https://tanuhack.com/scraping-bs4/
- https://qiita.com/poorko/items/9140c75415d748633a10
- https://qiita.com/Chanmoro/items/db51658b073acddea4ac
- https://www.youtube.com/watch?v=E4MjFqkxy8k
and others


1 Answer

With the older Beautiful Soup (4.7.1) in my environment, your code works as written, but the behavior changed in 4.9.0. Which version are you using? (You can check with pip list or similar.)
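As a complement to pip list, the installed version can also be read from Python. The following is only a minimal sketch: the helper name version_tuple is made up for illustration, and it assumes the version string begins with dot-separated numbers such as 4.9.1.

```python
import re

import bs4


def version_tuple(version):
    """Turn a string like '4.9.1' into (4, 9, 1). Hypothetical helper."""
    # Keep only the leading dot-separated numeric parts of the string.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


installed = version_tuple(bs4.__version__)
print("Beautiful Soup", bs4.__version__)
if installed >= (4, 9, 0):
    print("4.9.0 or later: script/style/template contents are handled differently")
else:
    print("older than 4.9.0")
```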

As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, 
the contents of <script>, <style>, and <template> tags are not considered to be ‘text’,
since those tags are not part of the human-visible content of the page.

In other words, the contents of <script> are no longer treated as text, so it seems you are expected to retrieve them another way. You could either downgrade bs4 or use something like contents.

# title = i.get_text()
title = i.contents[0].strip()
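
Since this tag holds JSON, another option is to read it through .string and pass the result to json.loads, which does not depend on how get_text() treats script tags. This is a minimal sketch under stated assumptions: sample_html is a shortened inline stand-in for the real page, and extract_json_data is an illustrative name, not part of the original code.

```python
import json

from bs4 import BeautifulSoup

# Shortened stand-in for the real page; only a few keys are kept.
sample_html = """
<html><body>
<script id="json-data" type="application/json">
  {"pokemon": {"no": "001", "name": "フシギダネ", "omosa": "6.9", "takasa": "0.7"}}
</script>
</body></html>
"""


def extract_json_data(html):
    """Return the parsed JSON from <script id="json-data">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", {"id": "json-data", "type": "application/json"})
    if tag is None or tag.string is None:
        return None
    # .string is the tag's single child string; json.loads accepts the
    # whitespace around the JSON object.
    return json.loads(tag.string)


data = extract_json_data(sample_html)
print(data["pokemon"]["name"], data["pokemon"]["takasa"])
```

In the question's script, res.content would take the place of sample_html, and the resulting dict can be indexed by key instead of being handled as one long string.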

Comments

  1. @or1os24

    Questioner

    My version was 4.9.1, so this fixed it!!
    Thank you!
