Trying out BrowserUse, an AI WebAgent tool

Posted at 2025-01-13

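Overview

What is browser-use?

LLMs and WebAgents

Research on using large language models to operate the real world (that is, the world reachable through a browser) is advancing. As of autumn 2024, such systems seem to be called WebAgent or AI-Agent. [1]

[Research goal] Performing web operations on behalf of humans

All of this research aims for the same thing: a person speaks to the computer, the computer understands the request, and it carries out the web operations on the person's behalf. In car terms, it is something like "self-driving for web operations."

browser-use

browser-use is one implementation of such an AI agent, and it is published, together with its repository, under the MIT license.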

GitHub repository

https://github.com/browser-use/browser-use

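What does it do?

As an AI agent

  • Decides what to do on the specified site and generates an operation plan
  • For each step in the plan
    • Analyzes the target site's screen
    • Generates a concrete action for Playwright to carry out (the run log shows these as JSON actions such as click_element)
    • If the action fails, generates a new one and retries
      • If it still fails, revises the plan
    • Once a result is obtained, outputs it as Markdown, screenshots, and so on as appropriate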
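
The plan, act, retry flow in the list above can be sketched in a few lines of plain Python. This is only an illustration of the control flow, not browser-use's actual implementation: `run_plan`, `StepResult`, `fake_act`, and the plan strings are all hypothetical names made up for this sketch, and the "action" is a stub that fails once to mimic the failed click in Step 5 of the run log.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str      # the plan step that was attempted
    ok: bool       # whether any attempt succeeded
    attempts: int  # how many actions were generated for this step


def run_plan(
    plan: List[str],
    act: Callable[[str, int], bool],
    max_retries: int = 2,
) -> List[StepResult]:
    """Run each step; on failure, generate a new action and try again."""
    results: List[StepResult] = []
    for step in plan:
        ok = False
        attempts = 0
        while not ok and attempts <= max_retries:
            attempts += 1
            ok = act(step, attempts)
        results.append(StepResult(step, ok, attempts))
        if not ok:
            # A real agent would revise the plan at this point.
            break
    return results


def fake_act(step: str, attempt: int) -> bool:
    # Hypothetical stub: the first click on the post fails, the retry works.
    return not (step == "click first post" and attempt == 1)


plan = ["open reddit", "search browser-use", "click first post", "read first comment"]
results = run_plan(plan, fake_act)
for r in results:
    print(f"{r.step}: ok={r.ok}, attempts={r.attempts}")
```

In the real tool the LLM both writes the plan and picks each action, so the retry is a fresh model call rather than a second run of the same function.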

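Using LLMs other than OpenAI

Using an LLM other than OpenAI is also easy. The examples directory of the repository shows how to access Qwen, Ollama, Gemini, and others.

Browser control uses Playwright

Browser control is done with Playwright (an OSS project from Microsoft).

Related research

Voyager (2023), explained in the article "Minecraftを自動プレイするVoyager" (Voyager, which plays Minecraft automatically), feels like the closest match.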


Trying it out

Preparation

Prepare a directory

Create a working directory and move into it in a terminal.

Installation

# Create and activate a Python environment (it does not have to be venv)
$ python3 -m venv venv
$ source venv/bin/activate

# Install BrowserUse
$ pip install browser-use
$ playwright install

Save the OpenAI API credential (API key)

  • Save the OpenAI API credential (API key) in a .env file
  • Options for browser-use are also set here
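
For reference, here is a rough idea of what such a .env file looks like and how its KEY=VALUE lines are read. This is a sketch under assumptions, not a copy of my file: `OPENAI_API_KEY` is the variable name the OpenAI client libraries look for, while `ANONYMIZED_TELEMETRY` and `BROWSER_USE_LOGGING_LEVEL` are option names I am assuming from the browser-use documentation, and the key value is a placeholder. `parse_env` is a hypothetical helper written only to show the format; in practice a library such as python-dotenv does this job.

```python
from typing import Dict

# Example .env contents (placeholder key; option names are assumptions).
SAMPLE_ENV = """\
# OpenAI credential
OPENAI_API_KEY=sk-your-key-here
# browser-use options (assumed names)
ANONYMIZED_TELEMETRY=false
BROWSER_USE_LOGGING_LEVEL=info
"""


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


settings = parse_env(SAMPLE_ENV)
print(sorted(settings))
```

Keep the real .env out of version control, since it holds the API key.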

Prepare the run script

Save the demo script from the repository's README.md to a file (00-helloworld.py in this article).
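
The script is short. The following is a sketch of its general shape as I understand the README at the time of writing, so treat the details as assumptions: the `Agent(task=..., llm=...)` constructor, the `agent.run()` coroutine, the `langchain_openai.ChatOpenAI` wrapper, and the `gpt-4o` model name may differ in the version you install. The task text is the one that appears in the run log. The third-party imports sit inside `main()` so that the file can at least be loaded without the packages installed.

```python
import asyncio

# Task text as shown on the "Starting task" line of the run log.
TASK = (
    "Go to Reddit, search for 'browser-use' in the search bar, "
    "click on the first post and return the first comment."
)


async def main() -> None:
    # Assumed imports; check the README of your installed version.
    from browser_use import Agent
    from dotenv import load_dotenv
    from langchain_openai import ChatOpenAI

    load_dotenv()  # loads OPENAI_API_KEY and options from .env
    agent = Agent(task=TASK, llm=ChatOpenAI(model="gpt-4o"))  # model name is an assumption
    result = await agent.run()
    print(result)


# Uncomment to run; this needs network access and a valid API key.
# asyncio.run(main())
```

Changing the task string is all it takes to point the agent at a different site or goal.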

Run

$ python3 ./00-helloworld.py

INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://github.com/gregpr07/browser-use for more information.
INFO     [agent] 🚀 Starting task: Go to Reddit, search for 'browser-use' in the search bar, click on the first post and return the first comment.
INFO     [agent] 
📍 Step 1
INFO     [agent] 👍 Eval: Success - The task has started on a blank page and I need to navigate to Reddit.
INFO     [agent] 🧠 Memory: Begin task on Reddit.
INFO     [agent] 🎯 Next goal: Navigate to Reddit's homepage and perform a search for 'browser-use'.
INFO     [agent] 🛠️  Action 1/1: {"search_google":{"query":"Reddit website homepage"},"go_to_url":{"url":"https://www.reddit.com/"},"go_back":{},"click_element":{"index":1,"xpath":null},"input_text":{"index":5,"text":"browser use","xpath":null},"switch_tab":{"page_id":0},"open_tab":{"url":"https://www.reddit.com/"},"extract_content":{"value":"text"},"done":{"text":"Extract first comment from the first Reddit post related to 'browser-use'. \nFirst comment: [Extracted text] "},"scroll_down":{"amount":50},"scroll_up":{"amount":null},"send_keys":{"keys":"Enter"},"scroll_to_text":{"text":"Comments"},"get_dropdown_options":{"index":0},"select_dropdown_option":{"index":0,"text":"browser use"}}
INFO     [controller] 🔍  Searched for "Reddit website homepage" in Google
INFO     [agent] 
📍 Step 2
INFO     [agent] ⚠ Eval: Failed - I mistakenly searched Google for Reddit instead of navigating to the Reddit homepage directly.
INFO     [agent] 🧠 Memory: The task is to find and navigate to Reddit's homepage.
INFO     [agent] 🎯 Next goal: Click on the Reddit homepage link to access the website.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":{"index":18,"xpath":null},"input_text":null,"switch_tab":null,"open_tab":null,"extract_content":null,"done":null,"scroll_down":null,"scroll_up":null,"send_keys":null,"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
INFO     [controller] 🖱️  Clicked index 18
INFO     [agent] 
📍 Step 3
INFO     [agent] 👍 Eval: Success - Navigated to Reddit's homepage successfully.
INFO     [agent] 🧠 Memory: Currently on Reddit. Need to search for 'browser-use'.
INFO     [agent] 🎯 Next goal: Input 'browser-use' into the search bar on Reddit and initiate the search.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":null,"input_text":{"index":2,"text":"browser-use","xpath":null},"switch_tab":null,"open_tab":null,"extract_content":null,"done":null,"scroll_down":null,"scroll_up":null,"send_keys":{"keys":"Enter"},"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
INFO     [controller] ⌨️  Input "browser-use" into index 2
INFO     [agent] 
📍 Step 4
INFO     [agent] 👍 Eval: Success - Input was entered into the search bar for 'browser-use'.
INFO     [agent] 🧠 Memory: Search for 'browser-use' initiated on Reddit.
INFO     [agent] 🎯 Next goal: Click on the option to search for 'browser-use'.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":{"index":15,"xpath":null},"input_text":null,"switch_tab":null,"open_tab":null,"extract_content":null,"done":null,"scroll_down":null,"scroll_up":null,"send_keys":null,"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
INFO     [controller] 🖱️  Clicked index 15
INFO     [agent] 
📍 Step 5
INFO     [agent] 👍 Eval: Success - Search results for 'browser-use' are displayed.
INFO     [agent] 🧠 Memory: Search results loaded. Next step is to click on the first post.
INFO     [agent] 🎯 Next goal: Click on the first post in the search results to view its details.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":{"index":10,"xpath":null},"input_text":null,"switch_tab":null,"open_tab":null,"extract_content":null,"done":null,"scroll_down":null,"scroll_up":null,"send_keys":null,"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
WARNING  [controller] Element no longer available with index 10 - most likely the page changed
INFO     [agent] 
📍 Step 6
INFO     [agent] ⚠ Eval: Failed - Attempted to click on an incorrect element. First post was not selected.
INFO     [agent] 🧠 Memory: Need to correctly select the first post related to 'browser-use'.
INFO     [agent] 🎯 Next goal: Click the first post titled 'Browser Use' under 'r/LocalLLaMA'.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":{"index":12,"xpath":null},"input_text":null,"switch_tab":null,"open_tab":null,"extract_content":null,"done":null,"scroll_down":null,"scroll_up":null,"send_keys":null,"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
INFO     [controller] 🖱️  Clicked index 12
INFO     [agent] 
📍 Step 7
INFO     [agent] 👍 Eval: Success - Successfully navigated to the post page 'Browser Use'.
INFO     [agent] 🧠 Memory: Post details accessed. Ready to view comments.
INFO     [agent] 🎯 Next goal: Scroll to and click the 'Go to comments' button to view the comments.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":{"index":11,"xpath":null},"input_text":null,"switch_tab":null,"open_tab":null,"extract_content":null,"done":null,"scroll_down":null,"scroll_up":null,"send_keys":null,"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
INFO     [controller] 🖱️  Clicked index 11
INFO     [agent] 
📍 Step 8
INFO     [agent] 👍 Eval: Success - First comment is visible on the post page.
INFO     [agent] 🧠 Memory: Accessed the first comment of the post.
INFO     [agent] 🎯 Next goal: Retrieve the first comment.
INFO     [agent] 🛠️  Action 1/1: {"search_google":null,"go_to_url":null,"go_back":null,"click_element":null,"input_text":null,"switch_tab":null,"open_tab":null,"extract_content":null,"done":{"text":"The first comment under the post:\n\n\"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required\"\n\nComment by: grigio"},"scroll_down":null,"scroll_up":null,"send_keys":null,"scroll_to_text":null,"get_dropdown_options":null,"select_dropdown_option":null}
INFO     [agent] 📄 Result: The first comment under the post:

"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required"

Comment by: grigio
INFO     [agent] ✅ Task completed successfully
INFO     [agent] Created GIF at agent_history.gif
AgentHistoryList(all_results=[ActionResult(is_done=False, extracted_content='🔍  Searched for "Reddit website homepage" in Google', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 18', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='⌨️  Input "browser-use" into index 2', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 15', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content=None, error='Failed to click element: <img src="https://b.thumbs.redditmedia.com/5CjwrCLiYs0_bivakTc5BmgQWT-J6-x0aaJSB99IPEc.png" srcset="" sizes="" alt="" browser-user-highlight-id="playwright-highlight-10"> [interactive, top, highlight:10]. Error: Element: <img src="https://b.thumbs.redditmedia.com/5CjwrCLiYs0_bivakTc5BmgQWT-J6-x0aaJSB99IPEc.png" srcset="" sizes="" alt="" browser-user-highlight-id="playwright-highlight-10"> [interactive, top, highlight:10] not found', include_in_memory=False), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 12', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 11', error=None, include_in_memory=True), ActionResult(is_done=True, extracted_content='The first comment under the post:\n\n"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required"\n\nComment by: grigio', error=None, include_in_memory=False)], all_model_outputs=[{'search_google': {'query': 'Reddit website homepage'}, 'go_to_url': {'url': 'https://www.reddit.com/'}, 'go_back': {}, 'click_element': {'index': 1}, 'input_text': {'index': 5, 'text': 'browser use'}, 'switch_tab': {'page_id': 0}, 'open_tab': {'url': 'https://www.reddit.com/'}, 'extract_content': {'value': 'text'}, 'done': {'text': "Extract first comment from the first Reddit post related to 'browser-use'. 
\nFirst comment: [Extracted text] "}, 'scroll_down': {'amount': 50}, 'scroll_up': {}, 'send_keys': {'keys': 'Enter'}, 'scroll_to_text': {'text': 'Comments'}, 'get_dropdown_options': {'index': 0}, 'select_dropdown_option': {'index': 0, 'text': 'browser use'}}, {'click_element': {'index': 18}}, {'input_text': {'index': 2, 'text': 'browser-use'}, 'send_keys': {'keys': 'Enter'}}, {'click_element': {'index': 15}}, {'click_element': {'index': 10}}, {'click_element': {'index': 12}}, {'click_element': {'index': 11}}, {'done': {'text': 'The first comment under the post:\n\n"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required"\n\nComment by: grigio'}}])

$
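
The "Action" lines are hard to read because every action name appears with a null value except the ones the model actually chose. A small helper can filter them. This is a sketch using only the standard library; `requested_actions` is a hypothetical name, and the sample line is a shortened copy of the Step 3 payload from the log (emoji and several null entries dropped).

```python
import json
from typing import Any, Dict

# Shortened copy of the Step 3 "Action" line from the log.
ACTION_LINE = (
    'INFO     [agent] Action 1/1: {"search_google":null,"go_to_url":null,'
    '"click_element":null,'
    '"input_text":{"index":2,"text":"browser-use","xpath":null},'
    '"done":null,"send_keys":{"keys":"Enter"}}'
)


def requested_actions(log_line: str) -> Dict[str, Any]:
    """Return only the actions whose parameters are not null."""
    payload = log_line[log_line.index("{"):]  # JSON starts at the first brace
    actions = json.loads(payload)
    return {name: params for name, params in actions.items() if params is not None}


actions = requested_actions(ACTION_LINE)
print(list(actions))  # ['input_text', 'send_keys']
```

Here the model asked for both input_text and send_keys in one step, while the controller line that follows in the log reports only the text input.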

Screen operation log

By default, a recording of the screen operations is saved as agent_history.gif in the directory where the script is run.


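Closing

This was just a quick look at how it behaves.
It is quite an interesting project, so I plan to dig deeper and post new information.

Reference links

  1. A certain manufacturer has obtained a trademark that uses "WebAgent", so the name may change in the future...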
