Error when web scraping with Python
Q&A
Closed
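What I want to solve
I'm a beginner who has just started learning Python, and I get an error when I try web scraping by following a reference book.
I don't know where to start, so I'd appreciate help from anyone who knows how to solve this!
[Additional notes]
・The code is sample code from 『仕事がはかどるPython&Excel自動処理全部入り。』
・I'm using a company PC on the company network
・A plain pip install always fails, so I download the .whl files manually and install from them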
Problem / error
Traceback (most recent call last):
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 203, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\socket.py", line 962, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11001] getaddrinfo failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 790, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 491, in _make_request
raise new_e
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 1092, in _validate_conn
conn.connect()
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 611, in connect
self.sock = sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 210, in _new_conn
raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x0000023A537D1E10>: Failed to resolve 'book.impress.co.jp' ([Errno 11001] getaddrinfo failed)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='book.impress.co.jp', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000023A537D1E10>: Failed to resolve 'book.impress.co.jp' ([Errno 11001] getaddrinfo failed)"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\12_Python\python_excel\Chapter07\booklist_get.py", line 4, in <module>
r = requests.get('https://book.impress.co.jp/')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 73, in get
return request("get", url, params=params, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\499513\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='book.impress.co.jp', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000023A537D1E10>: Failed to resolve 'book.impress.co.jp' ([Errno 11001] getaddrinfo failed)"))
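The innermost exception in the traceback above is `socket.gaierror: [Errno 11001] getaddrinfo failed`, which means the host name could not be resolved to an address, so the failure happens before any HTTP request is sent. A minimal sketch for checking name resolution on its own, without requests, might look like the following. The helper name `can_resolve` is made up for this illustration and is not part of the book's sample code.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the OS can resolve `host` to at least one address.

    `can_resolve` is a hypothetical helper for this illustration.
    """
    try:
        # The same standard-library call that fails at the bottom of the traceback.
        results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return len(results) > 0
    except socket.gaierror as exc:
        # gaierror is raised when the name cannot be resolved.
        print(f"Name resolution failed for {host!r}: {exc}")
        return False
```

Calling `can_resolve('book.impress.co.jp')` on the affected PC would show whether Python itself can resolve the name. If it returns False while the browser can open the site, the difference is probably in how each one reaches the network rather than in the scraping code.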
Relevant source code
import requests
from bs4 import BeautifulSoup
r = requests.get('https://book.impress.co.jp/')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find('h2'))
print(soup.find('h2').text)
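What I tried
・Checked that the code has no typos
・Confirmed that running the book's sample .py file also fails
・Checked the network connection (connected to the company Wi-Fi normally)
・Typed the URL from the code directly into a browser → confirmed that the site is displayed
・Changed the code to a different URL → confirmed that scraping still fails
・Whenever a ModuleNotFoundError appeared, installed the missing library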
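The checks above show that the browser can open the URL while both pip and requests fail on the same company network. One possible cause (an assumption, not confirmed here) is that the network only allows outside access through a proxy that the browser is configured to use and Python is not. The sketch below lists the proxy settings Python can see and builds a mapping in the shape that the `proxies` argument of `requests.get` accepts. The function names and the proxy address `http://proxy.example.com:8080` are placeholders; the real address would have to come from the company's network administrator.

```python
import urllib.request


def proxies_seen_by_python() -> dict:
    """Return the proxy settings Python picks up automatically.

    Reads the *_proxy environment variables; on Windows, if none are set,
    the system proxy settings in the registry are consulted instead.
    """
    return urllib.request.getproxies()


def build_proxies(proxy_url: str) -> dict:
    """Build a dict in the shape accepted by requests.get(..., proxies=...).

    Hypothetical helper: uses the same proxy for both http and https URLs.
    """
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical usage (the proxy address is a placeholder, not a real server):
# import requests
# proxies = build_proxies("http://proxy.example.com:8080")
# r = requests.get("https://book.impress.co.jp/", proxies=proxies)
```

If `proxies_seen_by_python()` returns an empty dict on the affected PC, Python has no proxy configured. requests also reads the `HTTP_PROXY` and `HTTPS_PROXY` environment variables by default, so setting those is an alternative to passing `proxies` on every call.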