@hiroki2003 posted at 2022-08-16

I want to get an HTML element using find_element (find_element with By.LINK_TEXT)

Q&A

What I want to solve

I am scraping a site with Python. To get an element from the HTML, I am using find_element with By.LINK_TEXT, but it does not work.
Please tell me how to fix this.

Problem / error

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
NoSuchElementException                    Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15504/570510461.py in <module>
     29 driver.get(url)
     30 
---> 31 login = driver.find_element(By.LINK_TEXT, "ログイン")
     32 #login = driver.find_element(By.CLASS_NAME, "menutext")[1]
     33 #login = driver.find_element(By.CSS_SELECTOR, "menutext")

c:\Users\hirok\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in find_element(self, by, value)
    854             value = '[name="%s"]' % value
    855 
--> 856         return self.execute(Command.FIND_ELEMENT, {
    857             'using': by,
    858             'value': value})['value']

c:\Users\hirok\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    432         response = self.command_executor.execute(driver_command, params)
    433         if response:
--> 434             self.error_handler.check_response(response)
    435             response['value'] = self._unwrap_value(
    436                 response.get('value', None))

c:\Users\hirok\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    241                 alert_text = value['alert'].get('text')
...
	Ordinal0 [0x00385230+1856048]
	BaseThreadInitThunk [0x75D5FA29+25]
	RtlGetAppContainerNamedObjectPath [0x76FF7A9E+286]
	RtlGetAppContainerNamedObjectPath [0x76FF7A6E+238]

Relevant source code

import chromedriver_binary
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.alert import Alert
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Chrome options intended to let Selenium start in a variety of environments
options = Options()
options.add_argument('--disable-gpu')
options.add_argument('--disable-extensions')
options.add_argument('--proxy-server="direct://"')
options.add_argument('--proxy-bypass-list=*')
options.add_argument('--start-maximized')
# options.add_argument('--headless')  # Uncomment to run in headless mode


# Create the webdriver object (this opens the browser)
driver_path = './chromedriver.exe'
driver = webdriver.Chrome(executable_path=driver_path)

# Open the URL in the browser
url = 'https://www.yoyaku-sports.city.suginami.tokyo.jp/reselve/p_index.do'
driver.implicitly_wait(10)
driver.get(url)

login = driver.find_element(By.LINK_TEXT, "ログイン")
login.click()

HTML

<a href="#" onclick="jump(1);return false;">ログイン</a>

2 Answers

@YottyPG posted at 2022-08-16

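The site appears to be generated dynamically, so the login button may not exist yet right after the page is loaded, which could be causing the error.

Would the method in this article help?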
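
For reference, an explicit wait boils down to polling for the element until a deadline passes. The sketch below shows that idea in plain Python. `wait_for_elements` is a hypothetical helper, not part of Selenium, and it only assumes the driver object has a `find_elements(by, value)` method. In real code, Selenium's own `WebDriverWait` together with `expected_conditions` (both already imported in the question) covers the same need.

```python
import time


def wait_for_elements(driver, by, value, timeout=10.0, poll=0.5):
    """Poll driver.find_elements(by, value) until it returns a match.

    Hypothetical helper for illustration. Returns the list of matching
    elements, or raises TimeoutError if nothing appears within `timeout`
    seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising
        # NoSuchElementException when nothing matches.
        found = driver.find_elements(by, value)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {by}={value!r} within {timeout}s"
            )
        time.sleep(poll)
```

With a real driver this could be called as `wait_for_elements(driver, "link text", "ログイン")[0]`, where the string `"link text"` is the value of `By.LINK_TEXT`.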

Comments

@hiroki2003
Questioner
Thank you.
I also tried adding a wait, but that did not work either.

@YottyPG posted at 2022-08-16

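Sorry, I have now confirmed that the code below works.
The page is displayed inside frames, so you have to switch into the right frame before looking up the element.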

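from selenium import webdriver
from selenium.webdriver.common.by import By

driver_path = './chromedriver.exe'
driver = webdriver.Chrome(executable_path=driver_path)
url = 'https://www.yoyaku-sports.city.suginami.tokyo.jp/reselve/p_index.do'
driver.get(url)

# The top-level page is a frameset; switch into its second <frame> first.
iframe = driver.find_element(By.CSS_SELECTOR, "html > frameset > frame:nth-child(2)")
driver.switch_to.frame(iframe)
# Inside that frame, switch again into the nested frame named "content".
driver.switch_to.frame("content")
# Only now is the login link part of the document being searched.
login = driver.find_element(By.LINK_TEXT, "ログイン")
login.click()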
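
When the frame nesting is not known in advance, the same idea can be generalized into a recursive search. This is only a sketch under assumptions: `find_in_frames` is a hypothetical helper, not a Selenium API, and it relies only on `driver.find_elements`, `driver.switch_to.frame` and `driver.switch_to.parent_frame`. The locator strings `"css selector"` and `"link text"` are the values of `By.CSS_SELECTOR` and `By.LINK_TEXT`.

```python
def find_in_frames(driver, by, value, max_depth=5):
    """Depth-first search for an element across nested frames.

    Hypothetical helper for illustration. On success the driver is left
    switched into the frame that contains the element, so the returned
    element can be used right away. Returns None if nothing matches, in
    which case the driver ends up back in the frame it started from.
    """
    found = driver.find_elements(by, value)
    if found:
        return found[0]
    if max_depth == 0:
        return None
    # Collect the <frame> and <iframe> elements of the current document.
    frames = driver.find_elements("css selector", "frame, iframe")
    for frame in frames:
        driver.switch_to.frame(frame)
        result = find_in_frames(driver, by, value, max_depth - 1)
        if result is not None:
            return result
        # Not in this subtree: step back up before trying the next frame.
        driver.switch_to.parent_frame()
    return None
```

A call would start from the top-level document, for example `driver.switch_to.default_content()` followed by `find_in_frames(driver, "link text", "ログイン")`. Switching explicitly, as in the answer above, is still simpler when the frame structure is known.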

Reference:

Comments

@hiroki2003
Questioner
Thank you very much.
I had been stuck on this for two or three days, so this was a great help.
@hiroki2003
Questioner
YottyPG,

You passed "html > frameset > frame:nth-child(2)" as the argument for By.CSS_SELECTOR. How did you come up with that?
When I copy the CSS selector from the HTML, I get the following:
'body > div > form > table > tbody > tr > td > table > tbody > tr:nth-child(2) > td > table > tbody > tr:nth-child(1) > td:nth-child(2) > input'

"nth-child(2)" appears in several places. Is yours the "nth-child(2)" just before "input"? (Sorry for posting several questions in a row.)
@YottyPG
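Which element is that CSS selector for?

You first have to identify the frame element, so I used the developer tools to get the CSS selector of the relevant frame tag, which gave "html > frameset > frame:nth-child(2)".
On this page the frameset contains two frame tags, and the login button is in the second one, which is why the CSS selector looks like that.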
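
To see where `frame:nth-child(2)` comes from, it can help to list the frame tags of the top-level document together with their position. The sketch below uses Python's standard `html.parser` on a made-up frameset. The `rows`, `name` and `src` values are assumptions for illustration, not the real page's markup, and `FrameLister` and `list_frames` are hypothetical names. With Selenium, the actual top-level markup would be available as `driver.page_source`.

```python
from html.parser import HTMLParser

# Made-up top-level document: a frameset holding two frames.
SAMPLE_TOP_LEVEL = """
<html>
  <frameset rows="80,*">
    <frame name="menu" src="menu.html">
    <frame name="main" src="main.html">
  </frameset>
</html>
"""


class FrameLister(HTMLParser):
    """Collect (position, name, src) for each <frame>/<iframe> tag."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attr = dict(attrs)
            # 1-based running count. This equals the :nth-child() index
            # only when the frames are the sole children of one frameset.
            position = len(self.frames) + 1
            self.frames.append((position, attr.get("name"), attr.get("src")))


def list_frames(html_text):
    """Return the frames found in html_text, in document order."""
    parser = FrameLister()
    parser.feed(html_text)
    return parser.frames


for position, name, src in list_frames(SAMPLE_TOP_LEVEL):
    print(f"frame:nth-child({position})  name={name}  src={src}")
```

A selector copied for an element inside a frame is relative to that frame's own document, so it starts at `body` and never mentions the frame that contains it.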
@hiroki2003
Questioner
I right-clicked "ログイン" (Login) on the web page, chose "Inspect", then right-clicked the element and chose Copy > Copy selector.
That gave me the long CSS selector above.
@hiroki2003
Questioner
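Solved.
I had been copying the CSS selector of the "ログイン" link itself.
What I needed here was the CSS selector of the frame tag.
Doing the same for the frame, I also got "html > frameset > frame:nth-child(2)".
Thank you for your careful guidance.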
