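Introduction
2025 is the year of AI agents. One AI agent that has drawn a lot of attention is "Browser Use", a tool that lets an AI operate the browser on your own PC by itself.
What makes Browser Use interesting
With Browser Use, the AI operates the browser on your PC by itself to accomplish a goal you set in advance.
It is exciting, even a little startling, to give a simple instruction and watch the AI carry out all kinds of operations on its own.
For example, the AI can search for an article and retrieve its information completely automatically.
Watching an AI think for itself and operate the screen from a simple instruction really feels like the near future, doesn't it?
However, a surface-level familiarity with AI is not enough to put it to real use at work.
It is essential to develop a hands-on feel for what it can do and, conversely, what it struggles with.
This article therefore presents a tutorial for actually running Browser Use on your own PC to build that feel.
The official GitHub repository offers only brief explanations in English, so this article adds a fuller walkthrough, including step-by-step setup and run instructions.
It is structured so that readers who are new to Python, or who just want to try it quickly, can follow along easily.
By getting your hands dirty, I hope you will get a feel for what Browser Use is and what it can do, and share in the fun of AI agents.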
察象èªè
- æ¥åå¹çåã奜ããªæ¹
- AIãããžãã¹æŽ»çšãããæ¹
- çæAIã«èå³ã®ããå šãŠã®æ¹
èšäºã®æ§æ
ïŒïœBrowser Useãšã¯ïŒ
ïŒïœBrowser Useã®ç°å¢æ§ç¯
ïŒïœäºäŸé
ïŒïœç®çå¥Tips
ïŒïœããŸã
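About the author (Nicola)
After researching machine learning at the University of Tokyo, working as an ML engineer at an AI startup, and handling client work at a consulting firm, I founded an AI startup spun out of the Matsuo Lab at the University of Tokyo.
My work spans R&D, education, and reskilling: from AI research and development, to presenting generative AI use cases to a government minister, to running generative AI training for all teachers at a private high school.
On X, I share practical know-how on using generative AI for business people. If you want to get better at using AI, please follow me!
I am also developing products that make work easier with generative AI, so if you want to enjoy the AI era, I look forward to hearing from you.
Now, let's go through everything in order.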
Browser Use tutorial
1｜What is Browser Use?
Browser Use is an AI tool that lets an AI operate the browser on your screen by itself.
When you give it a goal, the AI breaks down what needs to be done into small tasks.
By executing those tasks in order, it eventually reaches the goal you set.
Think of it as an AI agent that has been given permission to operate a web browser.
For example, if you ask it to "recommend a good yakiniku restaurant on Tabelog", the AI autonomously plans and executes steps such as: search for Tabelog → open the site and filter "yakiniku" restaurants by recommendation → extract and show information on the top-listed restaurants.
To help you make full use of this fascinating tool, the rest of the article explains it with concrete examples.
First, let's set up Browser Use on your own PC.
2｜Setting up the Browser Use environment
If you only want to get Browser Use running, the preparation is relatively simple.
- Prepare Python
- Prepare a code editor (VS Code, Cursor, etc.)
- Prepare a Python virtual environment
- Install Browser Use
- Set environment variables
Preparing Python
Browser Use requires Python 3.11 or later.
If your version is older, upgrade it first.
The official Python website explains how to check and change your version, and asking ChatGPT will usually get you a correct answer too.
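To confirm the requirement above, a short script can compare the running interpreter with 3.11. This is a minimal sketch; `meets_requirement` is an illustrative helper, not part of Browser Use.

```python
import sys

# Minimum version taken from the requirement above (Python 3.11)
REQUIRED = (3, 11)


def meets_requirement(version_info=None, required=REQUIRED) -> bool:
    """Return True if (major, minor) of version_info is at least `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print(f"OK: Python {current}")
    else:
        print(f"Python {current} is too old; install 3.11 or later")
```

Save it as, say, check_python.py and run it with the same `python` command you plan to use for Browser Use.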
Preparing a code editor (VS Code, Cursor, etc.)
Any editor or IDE you normally use is fine.
See the IDE chapter of the article below for details.
Preparing a Python virtual environment
Let's create a virtual environment with venv.
A virtual environment lets you manage the packages you need without affecting other projects or the system.
You can set it up with the following steps.
1｜Create a directory for the project
mkdir browser_use_project
cd browser_use_project
2｜Create the virtual environment
On macOS / Linux:
python3 -m venv venv
On Windows:
python -m venv venv
3｜Activate the virtual environment
On macOS / Linux:
source venv/bin/activate
On Windows:
venv\Scripts\activate
Once the virtual environment is active, (venv) appears at the start of the terminal (or Command Prompt) prompt.
4｜Leave the virtual environment
When you are done with the virtual environment (venv), run deactivate to return to the system-wide Python (or whichever environment you configured separately).
deactivate
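If you are unsure whether the virtual environment is active, Python itself can tell you. Inside a venv, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation. Here is a minimal sketch (the helper name `in_virtualenv` is hypothetical):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment.

    venv points sys.prefix at the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment" if in_virtualenv() else "Using the system Python")
    print("sys.prefix:", sys.prefix)
```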
Installing Browser Use
Install Browser Use to finish getting ready for the AI to operate the screen.
With the virtual environment active, run the following command.
pip install browser-use
▼ Notes
- While a virtual environment is active, the pip command runs the pip inside that environment.
- After installation, running pip list (or similar) should show browser-use.
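Besides `pip list`, you can check from Python with the standard `importlib.metadata` module. In this small sketch, `installed_version` is an illustrative helper, and the distribution names are the ones used in this article's pip commands:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("browser-use", "playwright", "langchain-community"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```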
(Optional) Installing Playwright
browser-use controls the browser through Playwright. If the browsers Playwright needs are not installed yet, install them as follows so the browser can be operated automatically.
- playwright install downloads and sets up the major browsers (Chromium, Firefox, WebKit) in one go.
pip install playwright
playwright install
Setting environment variables
Set your API key in a .env file.
browser-use works together with an LLM.
When using the OpenAI API, as in this example, you need to set your API key in the environment variable OPENAI_API_KEY. (If you use another LLM, set the corresponding API key and endpoint instead.)
Creating the .env file
Create a new .env file in the project root directory (browser_use_project in this example).
Setting the API key
Set the API key for the LLM you use. Example settings for OpenAI and Anthropic keys are shown below.
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
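In the examples, python-dotenv's `load_dotenv()` is what copies these lines into environment variables. The sketch below is a simplified stand-in that only handles plain `KEY=VALUE` lines, comments, and surrounding quotes, so you can see the idea. As with python-dotenv's default, variables that are already set are not overwritten. In real scripts, keep using python-dotenv, which supports the full format. `parse_env` and `load_env_file` are hypothetical names.

```python
import os


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines (a simplified subset of the .env format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without "="
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy parsed values into os.environ, keeping variables that are already set."""
    with open(path, encoding="utf-8") as f:
        for key, value in parse_env(f.read()).items():
            os.environ.setdefault(key, value)


if __name__ == "__main__":
    if os.path.exists(".env"):
        load_env_file()
    print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))
```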
This completes the setup needed to actually run Browser Use. Let's check that everything is configured correctly!
This example assumes the OpenAI API and uses the low-cost gpt-4o-mini model.
Create a directory named browser_use_project/examples and, inside it, a file named test-browser-use.py.
Copy and paste the code below and run it. The script searches Google for "Qiita AI ニコラ", clicks the top result, and returns that page's title.
(ニコラ, or Nicola, is the author; the author's X account is Nicola_GenAI.)
※The script is also set up to calculate and display the API usage cost.
To display the OpenAI API usage cost, you need to install langchain-community. With the virtual environment active, run the following in the terminal.
pip install -U langchain-community
The code for test-browser-use.py is as follows.
import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_community.callbacks import get_openai_callback
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig

# Load OPENAI_API_KEY from the project's .env file
load_dotenv()


async def main():
    llm = ChatOpenAI(model='gpt-4o-mini')
    # headless=False shows the browser window so you can watch the agent work
    browser = Browser(
        config=BrowserConfig(
            headless=False
        )
    )
    task_text = (
        "Go to google.com, search for 'Qiita AI ニコラ'. "
        "After searching, click on the top result. "
        "Finally, get the page title from that page and return me the title."
    )
    agent = Agent(
        task=task_text,
        llm=llm,
        browser=browser,
    )
    # Measure token usage with get_openai_callback()
    with get_openai_callback() as cb:
        history = await agent.run()
    # Once the callback block ends, read the total token counts and cost
    print("== Result ==")
    print(history.final_result())
    print("== Usage Info ==")
    print(f"Prompt tokens: {cb.prompt_tokens}")
    print(f"Completion tokens: {cb.completion_tokens}")
    print(f"Total tokens: {cb.total_tokens}")
    print(f"Total cost (USD): ${cb.total_cost:.5f}")
    await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
Once the code is pasted, run python examples/test-browser-use.py in the terminal from the browser_use_project directory. A new browser window should open and processing should begin.
If you get an error, asking ChatGPT or a similar tool about the message often solves it.
- OPENAI_API_KEY must be set in .env beforehand. Don't forget it.
Running the task above once cost $0.00444 (about 0.67 yen). With gpt-4o, the cost is more than ten times higher. Costs vary considerably from model to model, so always be aware of which model you are using.
Now let's learn about Browser Use by trying out various examples!
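The cost of a run is prompt tokens times the input price plus completion tokens times the output price, which is why the choice of model matters so much. In the sketch below, the per-million-token prices are assumptions based on OpenAI's list prices at the time of writing (check the current pricing page). The token counts in the demo are made up and are not taken from the run above.

```python
# Assumed list prices in USD per 1M tokens; check OpenAI's pricing page, as they change
PRICES_PER_1M = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a run from its token counts."""
    price = PRICES_PER_1M[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Made-up token counts for illustration
    prompt_tokens, completion_tokens = 10_000, 1_000
    for name in PRICES_PER_1M:
        print(f"{name}: ${estimate_cost(name, prompt_tokens, completion_tokens):.5f}")
```

With these assumed prices, the same token counts cost about 16.7 times as much on gpt-4o as on gpt-4o-mini.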
3｜Example collection
This section introduces a range of Browser Use examples. Start with whichever interests you.
Create each file in the browser_use_project/examples folder and run it with the python command.
The examples configure different models for the AI agent. Models other than gpt-4o-mini tend to be relatively expensive, so decide for yourself which model to use.
All of them can also be run with gpt-4o-mini via the OpenAI API, although the smaller model may be less reliable on harder tasks.
Example 1｜Automating an Amazon search
This program searches Amazon for a specific product (laptop), sorts the results by rating, and retrieves the price of the first item shown.
Passing max_steps, as in await agent.run(max_steps=3), sets the maximum number of steps the agent may execute. (If the agent needs more steps than that, the run is forcibly stopped partway through, so choose the value carefully. Three steps may be too few for this whole task; raise the limit if the run stops early.)
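To picture what the step limit does, here is a toy loop that is unrelated to Browser Use's actual internals. Each step executes one planned action, and once `max_steps` is used up the run ends even if actions remain. The plan is a hypothetical breakdown of the Amazon task.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    done: bool = False
    executed: list = field(default_factory=list)


def run_with_budget(planned_actions: list, max_steps: int) -> RunResult:
    """Execute one planned action per step, stopping once max_steps is used up.

    The run counts as done only if every planned action was executed.
    """
    result = RunResult()
    for step, action in enumerate(planned_actions, start=1):
        if step > max_steps:
            break  # Step budget exhausted: the run ends with work still left
        result.executed.append(action)
    result.done = len(result.executed) == len(planned_actions)
    return result


# Hypothetical breakdown of the Amazon task into browser actions
PLAN = ["open amazon.com", "search for 'laptop'", "sort by rating", "read first price", "report result"]

if __name__ == "__main__":
    for budget in (3, 10):
        r = run_with_budget(PLAN, max_steps=budget)
        print(f"max_steps={budget}: done={r.done}, executed {len(r.executed)} of {len(PLAN)}")
```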
Code: examples/amazon_search.py
"""
Simple try of the agent.

@dev You need to add OPENAI_API_KEY to your environment variables.
"""
import os
import sys
import asyncio

# Adjust the module search path so modules in the parent directory can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from langchain_openai import ChatOpenAI  # OpenAI chat model
from browser_use import Agent  # Agent that operates the browser

# Create a ChatOpenAI instance using the GPT-4o model
llm = ChatOpenAI(model='gpt-4o')

# Create the agent from the task and the LLM
agent = Agent(
    task='Go to amazon.com, search for laptop, sort by best rating, and give me the price of the first result',
    llm=llm,
)


# Run the agent asynchronously
async def main():
    await agent.run(max_steps=3)  # Try at most 3 steps
    agent.create_history_gif()  # Save the action history as a GIF


asyncio.run(main())
Example 2｜Taking on a CAPTCHA
This sample opens a page with a CAPTCHA (image verification) and has the agent try to solve it automatically.
The success rate depends on how the CAPTCHA works, but with the right setup it can be included as part of automated testing.
Code: examples/captcha.py
"""
Simple try of the agent.

@dev You need to add OPENAI_API_KEY to your environment variables.
"""
import os
import sys

# Add the parent directory to the path so its modules can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from langchain_openai import ChatOpenAI

from browser_use import Agent

# NOTE: captchas are hard. For this example it works. But e.g. for iframes it does not.
# for this example it helps to zoom in.
llm = ChatOpenAI(model='gpt-4o')  # Use the GPT-4o model
agent = Agent(
    task='go to https://captcha.com/demos/features/captcha-demo.aspx and solve the captcha',
    llm=llm,
)


async def main():
    # Run the agent: open the page with the CAPTCHA and try to solve it
    await agent.run()
    input('Press Enter to exit')  # Pause after the run until the user presses Enter


asyncio.run(main())
Example 3｜Checking a booking page for available dates
In this scenario, the agent visits a specific booking page and checks whether there are bookable dates this month or next month.
The agent automates the whole sequence of going to the booking page and checking the dates.
Code: examples/check_appointment.py
import asyncio
import os

import dotenv
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, SecretStr

from browser_use.agent.service import Agent
from browser_use.controller.service import Controller

dotenv.load_dotenv()  # Load the .env file

controller = Controller()


class WebpageInfo(BaseModel):
    link: str = 'https://appointment.mfa.gr/en/reservations/aero/ireland-grcon-dub/'


@controller.action('Go to the webpage', param_model=WebpageInfo)
def go_to_webpage(webpage_info: WebpageInfo):
    # Action that returns the link in WebpageInfo; the agent uses it to reach the booking page
    return webpage_info.link


async def main():
    # Task for the agent: check free dates this month and next month; if there are none, say so
    task = (
        'Go to the Greece MFA webpage via the link I provided you. '
        'Check the visa appointment dates. If there is no available date in this month, check the next month. '
        'If there is no available date in both months, tell me there is no available date.'
    )

    # Use gpt-4o-mini via ChatOpenAI; the API key is read from .env
    model = ChatOpenAI(model='gpt-4o-mini', api_key=SecretStr(os.getenv('OPENAI_API_KEY', '')))
    agent = Agent(task=task, llm=model, controller=controller, use_vision=True)

    # Run the task that checks availability and print the agent's final answer
    history = await agent.run()
    print(history.final_result())


if __name__ == '__main__':
    asyncio.run(main())
Example 4｜Working with the clipboard
This sample copies text to the clipboard, moves to another page, and pastes it there.
Here, "Hello, world!" is copied to the clipboard and then pasted after navigating to Google.
Code: examples/clipboard.py
import os
import sys
from pathlib import Path

# ActionResult is the class that holds the result of an agent action
from browser_use.agent.views import ActionResult

# Adjust the path so modules in the parent directory can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

import pyperclip  # Library for reading and writing the clipboard (pip install pyperclip)
from langchain_openai import ChatOpenAI

from browser_use import Agent, Controller
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContext

# Launch the browser with headless=False so you can watch what happens on screen
browser = Browser(
    config=BrowserConfig(
        headless=False,
    )
)
controller = Controller()


# Action: copy text to the clipboard
@controller.registry.action('Copy text to clipboard')
def copy_to_clipboard(text: str):
    pyperclip.copy(text)  # Copy the string to the clipboard with pyperclip.copy
    return ActionResult(extracted_content=text)


# Action: paste text from the clipboard
@controller.registry.action('Paste text from clipboard', requires_browser=True)
async def paste_from_clipboard(browser: BrowserContext):
    text = pyperclip.paste()  # Read the clipboard contents
    page = await browser.get_current_page()
    await page.keyboard.type(text)  # Type the text into the page as keyboard input
    return ActionResult(extracted_content=text)


async def main():
    # Define the task "copy to clipboard → go to Google → paste the text"
    task = 'Copy the text "Hello, world!" to the clipboard, then go to google.com and paste the text'
    model = ChatOpenAI(model='gpt-4o')  # Create the language model with GPT-4o
    # Initialize the agent; it performs the copy/paste actions and browser operations by itself
    agent = Agent(
        task=task,
        llm=model,
        controller=controller,
        browser=browser,
    )
    await agent.run()  # Run the agent (it operates according to the task)
    input('Press Enter to close...')  # Keep the browser open until you press Enter
    await browser.close()  # Close the browser once you are done


if __name__ == '__main__':
    asyncio.run(main())
Example 5｜Getting results in a custom output format
This example finds the most popular post in Hacker News's "Show HN" section and returns the result in a custom format.
It defines a data model called DoneResult and finally prints the post's title, URL, number of comments, and hours since posting.
Code: examples/custom_output.py
"""
Show how to use custom outputs.

@dev You need to add OPENAI_API_KEY to your environment variables.
"""
import os
import sys

# Fix the path so modules in the parent directory can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from pydantic import BaseModel

from browser_use import ActionResult, Agent, Controller

# Load environment variables (OPENAI_API_KEY, etc.) from the .env file
load_dotenv()

controller = Controller()


# Custom output format (information about a Hacker News post)
class DoneResult(BaseModel):
    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int


# Action that returns the output as custom JSON when the task finishes
@controller.registry.action('Done with task', param_model=DoneResult)
async def done(params: DoneResult):
    # is_done=True tells the agent the task is complete; the JSON goes in extracted_content
    result = ActionResult(is_done=True, extracted_content=params.model_dump_json())
    return result


async def main():
    task = 'Go to hackernews show hn and give me the number 1 post in the list'
    model = ChatOpenAI(model='gpt-4o')
    agent = Agent(task=task, llm=model, controller=controller)

    # Run the agent and get its history
    history = await agent.run()

    # Extract the final result the agent returned
    result = history.final_result()
    if result:
        # Convert the JSON string into a DoneResult
        parsed = DoneResult.model_validate_json(result)
        print('--------------------------------')
        print(f'Title: {parsed.post_title}')
        print(f'URL: {parsed.post_url}')
        print(f'Comments: {parsed.num_comments}')
        print(f'Hours since post: {parsed.hours_since_post}')


if __name__ == '__main__':
    asyncio.run(main())
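`final_result()` returns the JSON string produced by the custom done action, and the example validates it with pydantic. To check such a string outside browser-use, for instance in a unit test, a stdlib-only equivalent might look like the sketch below. `parse_done_result` is a hypothetical helper, and the sample JSON is invented.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class DoneResult:
    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int


def parse_done_result(raw: str) -> DoneResult:
    """Parse the custom done action's JSON into a DoneResult.

    Raises ValueError if the JSON is invalid, a field is missing, or a type is wrong.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    kwargs = {}
    for f in fields(DoneResult):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(value, f.type) or isinstance(value, bool):
            raise ValueError(f"{f.name} should be {f.type.__name__}")
        kwargs[f.name] = value
    return DoneResult(**kwargs)


if __name__ == "__main__":
    sample = ('{"post_title": "Show HN: Example", "post_url": "https://example.com", '
              '"num_comments": 12, "hours_since_post": 3}')
    print(parse_done_result(sample))
```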
Example 6｜Using a custom system prompt
This example adds a custom rule to the default system prompt so that the agent **always opens Wikipedia first, whatever the task**.
MySystemPrompt overrides important_rules() to add the new rule.
Code: examples/custom_system_prompt.py
import json
import os
import sys

# Add the parent directory to the path so its modules can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from langchain_openai import ChatOpenAI

from browser_use import Agent, SystemPrompt


# Subclass the default SystemPrompt and add our own rule
class MySystemPrompt(SystemPrompt):
    def important_rules(self) -> str:
        existing_rules = super().important_rules()
        # Add a rule: whatever the task, open wikipedia.com first
        new_rules = 'REMEMBER the most important RULE: ALWAYS open first a new tab and go first to url wikipedia.com no matter the task!!!'
        return f'{existing_rules}\n{new_rules}'


async def main():
    task = "do google search to find images of Elon Musk's wife"
    model = ChatOpenAI(model='gpt-4o')
    # Pass MySystemPrompt as system_prompt_class to apply the special rule to the agent
    agent = Agent(task=task, llm=model, system_prompt_class=MySystemPrompt)

    # Print the system prompt as JSON to check it (the new rule should be included)
    print(
        json.dumps(
            agent.message_manager.system_prompt.model_dump(exclude_unset=True),
            indent=4,
        )
    )

    await agent.run()


if __name__ == '__main__':
    asyncio.run(main())
Example 7｜Uploading a file
This sample automatically uploads a test_cv.txt file to a form.
After the DOM element is located, calling set_input_files() with the file path automates what a user would otherwise do by hand in a real browser.
Code: examples/file_upload.py
import os
import sys
from pathlib import Path

from browser_use.agent.views import ActionResult

# Walk up one directory and add it to the module search path
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio
import logging

from langchain_openai import ChatOpenAI

from browser_use import Agent, Controller
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContext

# Path of the file to upload (relative to the directory you run the script from)
CV = Path.cwd() / 'examples/test_cv.txt'

logger = logging.getLogger(__name__)

# Launch the browser with a visible UI and point chrome_instance_path at Chrome (macOS path)
browser = Browser(
    config=BrowserConfig(
        headless=False,
        chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    )
)
controller = Controller()


@controller.action('Upload file to element ', requires_browser=True)
async def upload_file(index: int, browser: BrowserContext):
    path = str(CV.absolute())
    # Get the DOM element at position `index` (elements in the page are identified by index)
    dom_el = await browser.get_dom_element_by_index(index)
    if dom_el is None:
        return ActionResult(error=f'No element found at index {index}')

    file_upload_dom_el = dom_el.get_file_upload_element()
    if file_upload_dom_el is None:
        return ActionResult(error=f'No file upload element found at index {index}')

    file_upload_el = await browser.get_locate_element(file_upload_dom_el)
    if file_upload_el is None:
        return ActionResult(error=f'No file upload element found at index {index}')

    try:
        # Actually upload the file to the form element
        await file_upload_el.set_input_files(path)
        msg = f'Successfully uploaded file to index {index}'
        logger.info(msg)
        return ActionResult(extracted_content=msg)
    except Exception as e:
        return ActionResult(error=f'Failed to upload file to index {index}: {e}')


# If a file dialog appears, press Escape to close it
@controller.action('Close file dialog', requires_browser=True)
async def close_file_dialog(browser: BrowserContext):
    page = await browser.get_current_page()
    await page.keyboard.press('Escape')


async def main():
    # Go to the page and upload the file to every upload field
    task = (
        'go to https://kzmpmkh2zfk1ojnpxfn1.lite.vusercontent.net/'
        ' and upload to each upload field my file'
    )

    model = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task=task,
        llm=model,
        controller=controller,
        browser=browser,
    )

    await agent.run()

    input('Press Enter to close...')
    await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
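The script resolves the upload file as `Path.cwd() / 'examples/test_cv.txt'`, so it must be run from browser_use_project and the file has to exist. A small helper like the one below creates it if it is missing. The helper is hypothetical, and the file content is a placeholder.

```python
from pathlib import Path
from typing import Optional


def ensure_dummy_cv(root: Optional[Path] = None) -> Path:
    """Create examples/test_cv.txt under `root` (default: current directory) if missing.

    The upload example resolves the file as Path.cwd() / 'examples/test_cv.txt',
    so `root` should be the directory the example is run from.
    """
    base = Path.cwd() if root is None else root
    cv = base / "examples" / "test_cv.txt"
    cv.parent.mkdir(parents=True, exist_ok=True)
    if not cv.exists():
        # Placeholder content; replace the file with whatever you want to upload
        cv.write_text("Dummy CV for the Browser Use upload example\n", encoding="utf-8")
    return cv
```

Calling `ensure_dummy_cv()` once from browser_use_project is enough. An existing file is left untouched.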
Example 8｜Searching several companies' job postings and applying with a file upload
This sample runs the following flow concurrently:
- Read a CV in PDF format
- Search for ML internships and find suitable postings
- Save the results to CSV
- Apply automatically (e.g., upload the CV to application forms)
Code: examples/find_and_apply_to_jobs.py
"""
Find and apply to jobs.

@dev You need to add OPENAI_API_KEY to your environment variables.

Also you have to install PyPDF2 to read pdf files: pip install PyPDF2
"""
import csv
import os
import re
import sys
from pathlib import Path

# Read PDF files with PyPDF2
from PyPDF2 import PdfReader

from browser_use.browser.browser import Browser, BrowserConfig

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio
import logging
from typing import List, Optional

from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI, ChatOpenAI
from pydantic import BaseModel, SecretStr

from browser_use import ActionResult, Agent, Controller
from browser_use.browser.context import BrowserContext

load_dotenv()

logger = logging.getLogger(__name__)

controller = Controller()
# Your CV as a PDF (path relative to the directory you run the script from)
CV = Path.cwd() / 'cv_04_24.pdf'


# Data model holding one job posting
class Job(BaseModel):
    title: str
    link: str
    company: str
    fit_score: float
    location: Optional[str] = None
    salary: Optional[str] = None


# Append a job posting (including its fit score) to the CSV file
@controller.action('Save jobs to file - with a score how well it fits to my profile', param_model=Job)
def save_jobs(job: Job):
    with open('jobs.csv', 'a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([job.title, job.company, job.link, job.salary, job.location, job.fit_score])
    return 'Saved job to file'


# Read job postings from the CSV file
@controller.action('Read jobs from file')
def read_jobs():
    with open('jobs.csv', 'r') as f:
        return f.read()


# Read the CV (PDF) and put its text into the agent's memory
@controller.action('Read my cv for context to fill forms')
def read_cv():
    pdf = PdfReader(CV)
    text = ''
    for page in pdf.pages:
        text += page.extract_text() or ''
    logger.info(f'Read cv with {len(text)} characters')
    return ActionResult(extracted_content=text, include_in_memory=True)


# Set the CV (PDF) on an upload element to apply
@controller.action(
    'Upload cv to element - call this function to upload if element is not found, try with different index of the same upload element',
    requires_browser=True,
)
async def upload_cv(index: int, browser: BrowserContext):
    path = str(CV.absolute())
    dom_el = await browser.get_dom_element_by_index(index)
    if dom_el is None:
        return ActionResult(error=f'No element found at index {index}')

    file_upload_dom_el = dom_el.get_file_upload_element()
    if file_upload_dom_el is None:
        return ActionResult(error=f'No file upload element found at index {index}')

    file_upload_el = await browser.get_locate_element(file_upload_dom_el)
    if file_upload_el is None:
        return ActionResult(error=f'No file upload element found at index {index}')

    try:
        await file_upload_el.set_input_files(path)
        msg = f'Successfully uploaded file to index {index}'
        logger.info(msg)
        return ActionResult(extracted_content=msg)
    except Exception as e:
        return ActionResult(error=f'Failed to upload file to index {index}: {e}')


# Browser settings (Chrome path on macOS, security features disabled, etc.)
browser = Browser(
    config=BrowserConfig(
        chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
        disable_security=True,
    )
)


async def main():
    # Load environment variables from the .env file beforehand
    load_dotenv()
    # ground_task describes the overall task
    ground_task = (
        'You are a professional job finder. '
        '1. Read my cv with read_cv. '
        '2. Find ml internships and save them to a file. '
        '3. Search at company:'
    )
    # Append company names to run the task for several companies in parallel
    tasks = [
        ground_task + '\n' + 'Google',
        # ground_task + '\n' + 'Amazon',
        # ...add as many companies as you like
    ]
    # Create the model with AzureChatOpenAI
    # (to use the OpenAI API directly, ChatOpenAI(model='gpt-4o-mini') can be used instead)
    model = AzureChatOpenAI(
        model='gpt-4o',
        api_version='2024-10-21',
        azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT', ''),
        api_key=SecretStr(os.getenv('AZURE_OPENAI_KEY', '')),
    )

    agents = []
    for task in tasks:
        agent = Agent(task=task, llm=model, controller=controller, browser=browser)
        agents.append(agent)

    # Run all agents concurrently with gather
    await asyncio.gather(*[agent.run() for agent in agents])


if __name__ == '__main__':
    asyncio.run(main())
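The example's `save_jobs` appends bare rows, so jobs.csv has no header row. One way to make the file easier to read back is `csv.DictWriter`, writing the header only once. This sketch uses a stdlib dataclass in place of the pydantic `Job` model. `append_job` and `read_jobs_sorted` are hypothetical names, not part of browser-use.

```python
import csv
import os
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Job:
    title: str
    link: str
    company: str
    fit_score: float
    location: Optional[str] = None
    salary: Optional[str] = None


FIELDNAMES = ["title", "company", "link", "fit_score", "location", "salary"]


def append_job(path: str, job: Job) -> None:
    """Append one job as a CSV row, writing the header first if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow(asdict(job))  # None values are written as empty fields


def read_jobs_sorted(path: str) -> List[dict]:
    """Read the jobs back as dicts, best fit_score first."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return sorted(rows, key=lambda row: float(row["fit_score"]), reverse=True)
```

A design like this also lets a "Read jobs from file" action return the best matches first instead of the raw file contents.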
Example 9｜Opening and working with multiple tabs
This sample opens several tabs, displays specific pages, then returns to the first tab and stops.
Instead of running browser operations in parallel, it manages them by switching between tabs.
Code: examples/multi-tab_handling.py
"""
Simple try of the agent.

@dev You need to add OPENAI_API_KEY to your environment variables.
"""
import os
import sys

# Add the parent directory to the module path so the required classes can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from langchain_openai import ChatOpenAI

from browser_use import Agent

# Use the GPT-4o model via ChatOpenAI
llm = ChatOpenAI(model='gpt-4o')

# Agent task: open pages about Elon Musk, Trump, and Steve Jobs,
# each in its own tab, then go back to the first tab and stop
agent = Agent(
    task='open 3 tabs with elon musk, trump, and steve jobs, then go back to the first and stop',
    llm=llm,
)


async def main():
    # Run the agent; it operates the browser as the task describes
    await agent.run()


# Entry point: run main() asynchronously
asyncio.run(main())
Example 10｜Multiple agents sharing the same browser
In this sample, several AI agents share one browser context (tabs and session information) and carry out sequential, coordinated operations.
This makes flows possible such as agent2 checking the pages that agent1 opened.
Useful for automated testing, information gathering, and more!
Code: examples/multiple_agents_same_browser.py
import os
import sys

# Use ChatOpenAI to call the GPT-4o model
from langchain_openai import ChatOpenAI

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from browser_use import Agent, Browser, Controller


async def main():
    # Create the browser instance (it manages contexts that agents can share)
    browser = Browser()

    # Create a new browser context and manage it with an async with block
    async with await browser.new_context() as context:
        model = ChatOpenAI(model='gpt-4o')

        # Agent 1: open a Wikipedia article on the history of Meta plus one random Wikipedia article
        agent1 = Agent(
            task='Open 2 tabs with wikipedia articles about the history of the meta and one random wikipedia article.',
            llm=model,
            browser_context=context,
        )
        # Agent 2: get the titles of the Wikipedia tabs that are already open
        agent2 = Agent(
            task='Considering all open tabs give me the names of the wikipedia article.',
            llm=model,
            browser_context=context,
        )
        await agent1.run()  # The first agent opens the tabs
        await agent2.run()  # The second agent uses the information in those tabs

    await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
Example 11｜Sending an email notification when processing finishes
This sample uses task completion as a trigger to email the result.
Here, the Done with task action uses yagmail to send an email through Gmail's SMTP server when the task finishes.
Sending mail with yagmail requires a Gmail app password; note that your regular Gmail password cannot be used.
Code: examples/notification.py
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

from browser_use import ActionResult, Agent, Controller

load_dotenv()

controller = Controller()


@controller.registry.action('Done with task ')
async def done(text: str):
    import yagmail  # pip install yagmail

    # Sending from a Gmail account requires an App Password
    # https://support.google.com/accounts/answer/185833
    yag = yagmail.SMTP('your_email@gmail.com', 'your_app_password')

    # Compose and send the email
    yag.send(
        to='recipient@example.com',
        subject='Test Email',
        contents=f'Result:\n{text}',
    )

    return ActionResult(is_done=True, extracted_content='Email sent!')


async def main():
    task = 'go to browser-use.com and then done'
    model = ChatOpenAI(model='gpt-4o')
    agent = Agent(task=task, llm=model, controller=controller)

    await agent.run()


if __name__ == '__main__':
    asyncio.run(main())
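yagmail is a convenience wrapper for sending through Gmail's SMTP server. If you would rather avoid the extra dependency, the standard library can do the same job. The sketch below builds the message with `email.message.EmailMessage` and sends it with `smtplib.SMTP_SSL` to smtp.gmail.com on port 465. The function names, addresses, and subject are placeholders, and an app password is still required.

```python
import smtplib
from email.message import EmailMessage


def build_result_email(sender: str, recipient: str, result_text: str) -> EmailMessage:
    """Build a plain-text notification email containing the agent's result."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Browser Use task finished"
    msg.set_content(f"Result:\n{result_text}")
    return msg


def send_via_gmail(msg: EmailMessage, user: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server over SSL (needs an app password)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, app_password)
        server.send_message(msg)
```

Inside the custom done action, you would call `send_via_gmail(build_result_email(...), user, app_password)` in place of the yagmail calls.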
Example 12｜Running several tasks at once with parallel agents
This sample runs several search tasks (weather, cryptocurrency price, NASA image, and so on) concurrently.
asyncio.gather() runs the agents' browser operations concurrently instead of one after another.
Code: examples/parallel_agents.py
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from langchain_openai import ChatOpenAI

from browser_use.agent.service import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContextConfig

# Browser settings (headless=False shows the browser window)
browser = Browser(
    config=BrowserConfig(
        disable_security=True,
        headless=False,
        new_context_config=BrowserContextConfig(save_recording_path='./tmp/recordings'),
    )
)
llm = ChatOpenAI(model='gpt-4o')


async def main():
    # Prepare several tasks (e.g., weather search, Reddit check, Bitcoin price check)
    tasks = [
        'Search Google for weather in Tokyo',
        'Check Reddit front page title',
        'Look up Bitcoin price on Coinbase',
        'Find NASA image of the day',
        # 'Check top story on CNN',
        # 'Search latest SpaceX launch date',
        # 'Look up population of Paris',
        # 'Find current time in Sydney',
        # 'Check who won last Super Bowl',
        # 'Search trending topics on Twitter',
        # Add more tasks as needed
    ]

    # Create one agent per task and run them all at once
    agents = [Agent(task=task, llm=llm, browser=browser) for task in tasks]
    await asyncio.gather(*[agent.run() for agent in agents])

    # Example of running an additional task separately afterward
    agentX = Agent(
        task='Go to apple.com and return the title of the page',
        llm=llm,
        browser=browser,
    )
    await agentX.run()

    await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
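Starting every agent at once can put a heavy load on the machine and on the shared browser. A common pattern is to cap concurrency with `asyncio.Semaphore` while still using `asyncio.gather()`. The sketch below simulates the agents with `asyncio.sleep`. `fake_agent_run` is a stand-in, and in a real script you would await `agent.run()` inside `guarded()` instead.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_agent_run(task: str, state: Dict[str, int]) -> str:
    """Stand-in for agent.run() that records how many runs are active at once."""
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # Simulated browser work
    state["active"] -= 1
    return f"finished: {task}"


async def run_all(tasks: List[str], limit: int) -> Tuple[List[str], int]:
    """Run all tasks concurrently, but never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}

    async def guarded(task: str) -> str:
        async with semaphore:
            return await fake_agent_run(task, state)

    results = await asyncio.gather(*(guarded(task) for task in tasks))
    return list(results), state["peak"]


if __name__ == "__main__":
    demo_tasks = ["weather in Tokyo", "Reddit front page", "Bitcoin price", "NASA image"]
    results, peak = asyncio.run(run_all(demo_tasks, limit=2))
    print(results)
    print("peak concurrency:", peak)
```

`asyncio.gather()` still returns results in the order the tasks were given, even though the runs may finish in a different order.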
Example 13｜Driving a real Chrome browser (debug mode)
In this example, the AI agent launches Chrome in debug mode, opens Google Docs, creates a new document, types its contents, and saves it as a PDF.
🚨 Note that you must first close all currently open Chrome windows before Chrome can be opened in debug mode.
Code: examples/real_browser.py
import os
import sys
from pathlib import Path

from browser_use.agent.views import ActionResult

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from langchain_openai import ChatOpenAI

from browser_use import Agent, Controller
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContext

# Launch Chrome in debug mode with headless=False and chrome_instance_path set
browser = Browser(
    config=BrowserConfig(
        headless=False,
        # Path to Chrome on macOS (change it to match your OS)
        chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    )
)
controller = Controller()


async def main():
    # Task: write a thank-you letter in Google Docs and save it as a PDF
    task = 'In docs.google.com write my Papa a quick thank you for everything letter \n - Magnus'
    task += ' and save the document as pdf'
    model = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task=task,
        llm=model,
        controller=controller,
        browser=browser,
    )

    await agent.run()

    input('Press Enter to close...')
    await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
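The example hard-codes the macOS Chrome path. To run it on other systems, you could choose the path per OS. The paths below are typical default install locations and are assumptions; Chrome may be installed elsewhere on your machine (for example, as a per-user install on Windows). `default_chrome_path` is a hypothetical helper.

```python
import sys
from typing import Optional

# Typical default install locations (assumptions; adjust if Chrome lives elsewhere)
DEFAULT_CHROME_PATHS = {
    "darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "win32": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "linux": "/usr/bin/google-chrome",
}


def default_chrome_path(platform: str = sys.platform) -> Optional[str]:
    """Return the usual Chrome executable path for a sys.platform value, or None if unknown."""
    for prefix, path in DEFAULT_CHROME_PATHS.items():
        if platform.startswith(prefix):
            return path
    return None


if __name__ == "__main__":
    print("Chrome path for this OS:", default_chrome_path())
```

You would then pass `chrome_instance_path=default_chrome_path()` to `BrowserConfig`, after checking that the file actually exists.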
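Example 14｜Analyzing the result history and errors
This example collects the history recorded during task execution (the model's reasoning steps, errors, final URLs, and so on) so you can examine it afterward.
It is useful for testing and debugging web automation.
Code: examples/result_processing.py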
import os
import sys
from pprint import pprint

from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import (
    BrowserContext,
    BrowserContextConfig,
    BrowserContextWindowSize,
)

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from langchain_openai import ChatOpenAI

from browser_use import Agent
from browser_use.agent.views import AgentHistoryList
from browser_use.controller.service import Controller

llm = ChatOpenAI(model='gpt-4o')
browser = Browser(
    config=BrowserConfig(
        headless=False,
        disable_security=True,
        extra_chromium_args=['--window-size=2000,2000'],
    )
)


async def main():
    # Record the run's history in a new browser context
    async with await browser.new_context(
        config=BrowserContextConfig(
            trace_path='./tmp/result_processing',
            no_viewport=False,
            browser_window_size=BrowserContextWindowSize(width=1280, height=1000),
        )
    ) as browser_context:
        agent = Agent(
            task="go to google.com and type 'OpenAI' click search and give me the first url",
            llm=llm,
            browser_context=browser_context,
        )
        history: AgentHistoryList = await agent.run(max_steps=3)

        print('Final Result:')
        pprint(history.final_result(), indent=4)

        print('\nErrors:')
        pprint(history.errors(), indent=4)

        print('\nModel Outputs:')
        pprint(history.model_actions(), indent=4)

        print('\nThoughts:')
        pprint(history.model_thoughts(), indent=4)
    # Close the browser at the end
    await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
Example 15｜Saving Hugging Face model information to a file
This example finds models with a specific license (cc-by-sa-4.0) on Hugging Face, sorts them, and saves the top five to a file.
The Save models action writes the collected model information to a file in a simple text format.
Code: examples/save_to_file_hugging_face.py
import os
import sys
# Add the parent directory to sys.path so modules there can be imported
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import asyncio
from typing import List
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
from browser_use.agent.service import Agent
from browser_use.controller.service import Controller
# First initialize the Controller so custom actions can be registered on it
controller = Controller()
class Model(BaseModel):
    title: str
    url: str
    likes: int
    license: str
class Models(BaseModel):
    models: List[Model]
@controller.action('Save models', param_model=Models)
def save_models(params: Models):
    # Append the retrieved model information to "models.txt"
    with open('models.txt', 'a') as f:
        for model in params.models:
            # Write title, url, likes, and license in a simple one-line format
            f.write(f'{model.title} ({model.url}): {model.likes} likes, {model.license}\n')
# Screen recording of a run of this example (only a comment, so it has no effect on the code)
# video: https://preview.screen.studio/share/EtOhIk0P
async def main():
    # Task: "On Hugging Face, find models licensed cc-by-sa-4.0, sort by likes, and save the top 5 to a file"
    task = 'Look up models with a license of cc-by-sa-4.0 and sort by most likes on Hugging face, save top 5 to file.'
    model = ChatOpenAI(model='gpt-4o')
    agent = Agent(task=task, llm=model, controller=controller)
    # Run the agent: it searches for models in the browser, collects their info, and saves it via the custom action
    await agent.run()
if __name__ == '__main__':
    # Start processing by running the async main()
    asyncio.run(main())
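You can check the line format that save_models writes without launching a browser or calling an LLM by running the file-writing part on its own. This is a minimal sketch that assumes pydantic v2. The append_models helper, its path parameter, and the sample model data are hypothetical; they are not part of the example or of browser-use.

```python
import os
import tempfile
from typing import List

from pydantic import BaseModel


class Model(BaseModel):
    title: str
    url: str
    likes: int
    license: str


class Models(BaseModel):
    models: List[Model]


def append_models(params: Models, path: str) -> None:
    # Same line format as save_models above, but with the output path as a parameter
    with open(path, 'a', encoding='utf-8') as f:
        for model in params.models:
            f.write(f'{model.title} ({model.url}): {model.likes} likes, {model.license}\n')


if __name__ == '__main__':
    # Hypothetical sample data shaped like what the agent would pass to the action
    sample = Models(models=[
        Model(
            title='example/model-a',
            url='https://huggingface.co/example/model-a',
            likes=120,
            license='cc-by-sa-4.0',
        ),
    ])
    path = os.path.join(tempfile.mkdtemp(), 'models.txt')
    append_models(sample, path)
    with open(path, encoding='utf-8') as f:
        print(f.read(), end='')
```

Because the file is opened in append mode, running the agent twice adds a second batch of lines rather than replacing the first.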
Case | Saving a trace of the operation history
This example saves a trace (history) of the browser operations to a specified folder (./tmp/traces/).
The key benefit is that you can later check and analyze which pages were visited and which actions were taken.
Code: examples/save_trace.py
import os
import sys
from langchain_openai import ChatOpenAI
# Make modules in the parent directory importable
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import asyncio
from browser_use.agent.service import Agent
from browser_use.browser.browser import Browser
from browser_use.browser.context import BrowserContextConfig
# Use the GPT-4o model
llm = ChatOpenAI(model='gpt-4o', temperature=0.0)
async def main():
    # Create a Browser instance
    browser = Browser()
    # Specify the folder where the new context saves its trace
    async with await browser.new_context(
        config=BrowserContextConfig(trace_path='./tmp/traces/')
    ) as context:
        # Task: "Go to Hacker News, then to apple.com, and return the titles of all open tabs"
        agent = Agent(
            task='Go to hackernews, then go to apple.com and return all titles of open tabs',
            llm=llm,
            browser_context=context,
        )
        # Run the agent
        await agent.run()
    await browser.close()
# Run the main function
asyncio.run(main())
Case | Scrolling a page
This sample opens a given web page and scrolls up and down.
It can scroll a little at a time to find specific text on the page, or move by a fixed number of pixels.
Code: examples/scrolling_page.py
import os
import sys
from browser_use.browser.browser import Browser, BrowserConfig
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import asyncio
from langchain_openai import ChatOpenAI
from browser_use import Agent
"""
Example: Using the 'Scroll down' action.
This script demonstrates how the agent can navigate to a webpage and scroll down the content.
If no amount is specified, the agent will scroll down by one page height.
"""
llm = ChatOpenAI(model='gpt-4o')
agent = Agent(
    # task="Navigate to 'https://en.wikipedia.org/wiki/Internet' and scroll down by one page - then scroll up by 100 pixels - then scroll down by 100 pixels - then scroll down by 10000 pixels.",
    # Here: open Wikipedia's "Internet" page and scroll until the target string is reached
    task="Navigate to 'https://en.wikipedia.org/wiki/Internet' and scroll to the string 'The vast majority of computer'",
    llm=llm,
    browser=Browser(config=BrowserConfig(headless=False)),
)
async def main():
    await agent.run()
if __name__ == '__main__':
    asyncio.run(main())
Case | Validating the output
This demo validates the agent's final output against a defined Pydantic model (DoneResult) and raises an error if the format does not match.
Code: examples/validate_output.py
"""
Demostrate output validator.
@dev You need to add OPENAI_API_KEY to your environment variables.
"""
import os
import sys
# äžäœãã£ã¬ã¯ããªã®ã¢ãžã¥ãŒã«ãžã¢ã¯ã»ã¹
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import asyncio
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
from browser_use import ActionResult, Agent, Controller
load_dotenv()
controller = Controller()
# åºåãã©ãŒããããå®çŸ© (title, comments, hours_since_start)
class DoneResult(BaseModel):
title: str
comments: str
hours_since_start: int
# ã¿ã¹ã¯å®äºæã«ãã®ã¢ãã«ãšåèŽããªã圢åŒãªããšã©ãŒã«ãªã
@controller.registry.action('Done with task', param_model=DoneResult)
async def done(params: DoneResult):
result = ActionResult(is_done=True, extracted_content=params.model_dump_json())
print(result)
# å®éšã®ãããæ»ãå€ãæå³çã«å€ãªæååã«ããŠãã
return 'blablabla'
async def main():
task = 'Go to hackernews hn and give me the top 1 post'
model = ChatOpenAI(model='gpt-4o')
# validate_output=True ã§ããªããŒã·ã§ã³ãæå¹å
agent = Agent(task=task, llm=model, controller=controller, validate_output=True)
# 圢åŒãåããªããšãšã©ãŒãåºã
await agent.run(max_steps=5)
if __name__ == '__main__':
asyncio.run(main())
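You can also exercise the DoneResult model on its own to see which outputs it accepts. The sketch below assumes pydantic v2. It accepts a matching JSON string and rejects one whose hours_since_start is not an integer, as well as one with a missing field. The check_done_json helper and the sample JSON strings are made up for illustration. This only shows the schema check the model itself performs; it does not reproduce browser-use's validate_output logic.

```python
from pydantic import BaseModel, ValidationError


class DoneResult(BaseModel):
    title: str
    comments: str
    hours_since_start: int


def check_done_json(raw: str) -> bool:
    """Return True if the raw JSON matches DoneResult, False otherwise."""
    try:
        DoneResult.model_validate_json(raw)
        return True
    except ValidationError as e:
        # Show which field failed and the kind of error pydantic reported
        first = e.errors()[0]
        print(first['loc'], first['type'])
        return False


if __name__ == '__main__':
    ok = '{"title": "Example post", "comments": "42 comments", "hours_since_start": 3}'
    bad = '{"title": "Example post", "comments": "42 comments", "hours_since_start": "three hours"}'
    print(check_done_json(ok))
    print(check_done_json(bad))
```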
Case | Web Voyager agent (for travel and booking sites)
This sample customizes the browser settings in detail and uses booking and flight-search sites for tasks such as booking hotels and searching for flights.
- It is designed to work through multiple steps according to the given task, even on sites with many page transitions such as Booking.com and Google Flights.
- Time parameters such as minimum_wait_page_load_time / maximum_wait_page_load_time let you tune how long the agent waits for pages to load.
Code: examples/web_voyager_agent.py
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import asyncio
from langchain_openai import AzureChatOpenAI
from pydantic import SecretStr
from browser_use.agent.service import Agent
from browser_use.browser.browser import Browser, BrowserConfig, BrowserContextConfig
# Browser settings: disable security features and set the window size, page-load wait times, etc.
browser = Browser(
    config=BrowserConfig(
        headless=False,
        disable_security=True,
        new_context_config=BrowserContextConfig(
            disable_security=True,
            minimum_wait_page_load_time=1,
            maximum_wait_page_load_time=10,
            browser_window_size={'width': 1280, 'height': 1100},
        ),
    )
)
# Use an Azure OpenAI model
llm = AzureChatOpenAI(
    model='gpt-4o',
    api_version='2024-10-21',
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT', ''),
    api_key=SecretStr(os.getenv('AZURE_OPENAI_KEY', '')),
)
# TASK = """
# Find the lowest-priced one-way flight from Cairo to Montreal on February 21, 2025, including the total travel time and number of stops. on https://www.google.com/travel/flights/
# """
# TASK = """
# Browse Coursera, which universities offer Master of Advanced Study in Engineering degrees? Tell me what is the latest application deadline for this degree? on https://www.coursera.org/"""
# Travel task example: find a family-friendly hotel on Booking.com
TASK = """
Find and book a hotel in Paris with suitable accommodations for a family of four (two adults and two children) offering free cancellation for the dates of February 14-21, 2025. on https://www.booking.com/
"""
async def main():
    agent = Agent(
        task=TASK,
        llm=llm,
        browser=browser,
        validate_output=True,  # Enable output validation
    )
    # Use a generous max_steps, since site interactions are often complex
    history = await agent.run(max_steps=50)
    # Save the history to a file (handy for analyzing the operation log)
    history.save_to_file('./tmp/history.json')
if __name__ == '__main__':
    asyncio.run(main())
Case | Searching with Gemini
This example uses ChatGoogleGenerativeAI to run Google's own models, such as Gemini 2.0, in the agent and automate searching Reddit and retrieving posts.
It runs Browser Use with Gemini as the LLM. The Gemini API can be used for free (with limits on the number of calls), so it is a good choice if you want to experiment at no cost.
Don't forget to set your Gemini API key in the .env environment file.
The sample task is "search Reddit, click a post, and extract a funny comment."
Code: examples/gemini.py
import asyncio
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import SecretStr
from browser_use import Agent
load_dotenv()
# Read the Gemini API key (make sure it is set in .env)
api_key = os.getenv('GEMINI_API_KEY')
if not api_key:
    raise ValueError('GEMINI_API_KEY is not set')
# Specify the Gemini model
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp', api_key=SecretStr(api_key))
async def run_search():
    # Search r/LocalLLaMA on Reddit for "browser use" posts and pick out the funniest comment
    agent = Agent(
        task=(
            'Go to url r/LocalLLaMA subreddit and search for "browser use" in the search bar '
            'and click on the first post and find the funniest comment'
        ),
        llm=llm,
        max_actions_per_step=4,
        tool_call_in_content=False,
    )
    await agent.run(max_steps=25)
if __name__ == '__main__':
    asyncio.run(run_search())
Case | Template for automatically posting to X (Twitter)
This example automates posting on X.
It is probably a topic many people are interested in.
- This template automates logging in to X (Twitter), posting a tweet that mentions a specified user, and then replying to another tweet.
- The TwitterConfig dataclass specifies the model to use (OpenAI API), the Chrome path, the tweet text, the reply URL, and so on.
- create_twitter_agent() creates the agent and describes the task in natural language.
- main() calls post_tweet(), and the agent carries out the browser operations.
Code: examples/post-twitter.py
"""
X Posting Template using browser-use
----------------------------------------
This template allows you to automate posting on X using browser-use.
It supports:
- Posting new tweets
- Tagging users
- Replying to tweets
Add your target user and message in the config section.
target_user="XXXXX"
message="XXXXX"
reply_url="XXXXX"
Any issues, contact me on X @defichemist95
"""
import os
import sys
from typing import Optional
from dataclasses import dataclass
from dotenv import load_dotenv
load_dotenv() # .envãã¡ã€ã«ããç°å¢å€æ°ãèªã¿èŸŒã
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import asyncio
from langchain_openai import ChatOpenAI
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use import Agent, Controller
# ============ Configuration Section ============
@dataclass
class TwitterConfig:
"""Twitterãžã®æçš¿ã«å¿
èŠãªèšå®ããŸãšããããŒã¿ã¯ã©ã¹"""
openai_api_key: str
chrome_path: str
target_user: str # éä¿¡å
ãŠãŒã¶ãŒ @ãçããæåå
message: str
reply_url: str
headless: bool = False
model: str = "gpt-4o-mini"
base_url: str = "https://x.com/home"
# å®éã«äœ¿ãèšå®ãæå®
config = TwitterConfig(
openai_api_key=os.getenv("OPENAI_API_KEY"),
chrome_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", # Macç°å¢ã®Chromeãã¹
target_user="XXXXX",
message="XXXXX",
reply_url="XXXXX",
headless=False,
)
def create_twitter_agent(config: TwitterConfig) -> Agent:
# æå®ãããã¢ãã«ãAPIããŒã䜿çšã㊠ChatOpenAI ãåæå
llm = ChatOpenAI(model=config.model, api_key=config.openai_api_key)
# headlessã¢ãŒããChromeã®ãã¹ãBrowserConfigã«èšå®
browser = Browser(
config=BrowserConfig(
headless=config.headless,
chrome_instance_path=config.chrome_path,
)
)
controller = Controller()
# ãŠãŒã¶ãŒåã«@ãä»ãå ããæçš¿çšã¡ãã»ãŒãžãäœæ
full_message = f"@{config.target_user} {config.message}"
# Agentã€ã³ã¹ã¿ã³ã¹ãçæããã¿ã¹ã¯æïŒèªç¶èšèªïŒã®æé ããŸãšãã
return Agent(
task=f"""Navigate to Twitter and create a post and reply to a tweet.
Here are the specific steps:
1. Go to {config.base_url}. See the text input field at the top of the page that says "What's happening?"
2. Look for the text input field at the top of the page that says "What's happening?"
3. Click the input field and type exactly this message:
"{full_message}"
4. Find and click the "Post" button (look for attributes: 'button' and 'data-testid="tweetButton"')
5. Do not click on the '+' button which will add another tweet.
6. Navigate to {config.reply_url}
7. Before replying, understand the context of the tweet by scrolling down and reading the comments.
8. Reply to the tweet under 50 characters.
Important:
- Wait for each element to load before interacting
- Make sure the message is typed exactly as shown
- Verify the post button is clickable before clicking
- Do not click on the '+' button which will add another tweet
""",
llm=llm,
controller=controller,
browser=browser,
)
async def post_tweet(agent: Agent):
try:
# æ倧100ã¹ããããŸã§è©Šè¡ãæçš¿ãšãªãã©ã€ãèªåå®è¡
await agent.run(max_steps=100)
# æäœå±¥æŽãGIFãã¡ã€ã«ã«ãã
agent.create_history_gif()
print("Tweet posted successfully!")
except Exception as e:
print(f"Error posting tweet: {str(e)}")
def main():
# äžèšã®èšå®å€ãããšãŒãžã§ã³ããäœæ
agent = create_twitter_agent(config)
# éåæé¢æ° post_tweet() ãå®è¡ããŠæçš¿
asyncio.run(post_tweet(agent))
if __name__ == "__main__":
main()
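The template ships with "XXXXX" placeholders, so it is easy to start the agent before filling them in. A small pre-flight check can catch this. The sketch copies the TwitterConfig fields from the template and types openai_api_key as Optional[str], because os.getenv can return None. The unfilled_fields helper is a hypothetical addition, not part of the original example.

```python
import os
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TwitterConfig:
    # Same fields as in post-twitter.py above
    openai_api_key: Optional[str]
    chrome_path: str
    target_user: str
    message: str
    reply_url: str
    headless: bool = False
    model: str = 'gpt-4o-mini'
    base_url: str = 'https://x.com/home'


def unfilled_fields(config: TwitterConfig, placeholder: str = 'XXXXX') -> List[str]:
    """Return the names of fields that are None, empty, or still hold the template placeholder."""
    problems = []
    for f in fields(config):
        value = getattr(config, f.name)
        if value is None or value == '' or value == placeholder:
            problems.append(f.name)
    return problems


if __name__ == '__main__':
    # Same placeholder values as the template above
    config = TwitterConfig(
        openai_api_key=os.getenv('OPENAI_API_KEY'),
        chrome_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
        target_user='XXXXX',
        message='XXXXX',
        reply_url='XXXXX',
    )
    missing = unfilled_fields(config)
    if missing:
        print('Please fill in:', ', '.join(missing))
```

One option is to call this in main() before create_twitter_agent(config) and stop if the list is not empty. That avoids spending up to 100 agent steps on a run that cannot succeed.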
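4 | Tips by purpose
This section gives implementation code for specific goals, such as "get the user's approval before any risky operation" or "have the user type in information such as email addresses themselves."
These are only simple examples, not best practices.
Most of them already appeared in the cases above, but here they are pulled out and organized by purpose.
Purpose 1 | Recording the Browser Use screen
Use this when you want to record and save the browser session, for example to review the flow of operations later while debugging or giving a presentation.
If you pass the Agent a browser whose BrowserContextConfig sets save_recording_path, the screen operations performed by the AI agent are saved to that path (./tmp/my_recordings here).
Code example: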
import asyncio
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContextConfig
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def main():
    # Specify where recordings are saved via BrowserContextConfig
    browser = Browser(
        config=BrowserConfig(
            headless=False,
            new_context_config=BrowserContextConfig(save_recording_path='./tmp/my_recordings')
        )
    )
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task='Search Google for "how to use Browser Use" and show the results.',
        llm=llm,
        browser=browser,
    )
    await agent.run()
    await browser.close()
if __name__ == '__main__':
    asyncio.run(main())
Purpose 2 | Running in headless mode
Use this when you want the browser to run in the background without showing its window (headless). This is handy for automated tests and similar jobs.
Code example:
import asyncio
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def main():
    # With headless=True, the browser runs without showing a window
    browser = Browser(
        config=BrowserConfig(headless=True)
    )
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task='Search Google for "ニコラ ai twitter"',
        llm=llm,
        browser=browser
    )
    await agent.run()
    await browser.close()
if __name__ == '__main__':
    asyncio.run(main())
Purpose 3 | Creating a custom action
Use this when you want to define your own action and call it from within a task, for example to run custom processing such as "go directly to a specific URL."
With the controller.action decorator (the @XXX syntax), you can design your own actions.
To use the registered actions, initialize a Controller and pass it to the Agent through its controller argument. Everything else is set up exactly as for a normal Agent.
Code example:
import asyncio
from pydantic import BaseModel
from browser_use.agent.views import ActionResult
from browser_use.browser.context import BrowserContext
from browser_use.controller.service import Controller
from browser_use.agent.service import Agent
from langchain_openai import ChatOpenAI
controller = Controller()
class SimplePageInfo(BaseModel):
    url: str
# requires_browser=True makes the current BrowserContext available to the action as `browser`
@controller.action('Jump to MyPage', param_model=SimplePageInfo, requires_browser=True)
async def jump_to_my_page(page_info: SimplePageInfo, browser: BrowserContext):
    """
    Custom action that jumps directly to the given URL
    """
    page = await browser.get_current_page()
    await page.goto(page_info.url)
    await page.wait_for_load_state()
    return ActionResult(extracted_content=f'Jumped to {page_info.url}', include_in_memory=True)
async def main():
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task="Please jump to my special page: 'https://qiita.com/Nicola_GenAI/items/04e2babe86291fc4483b'",
        llm=llm,
        controller=controller,
    )
    result = await agent.run()
    print(result)
if __name__ == '__main__':
    asyncio.run(main())
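To make the decorator mechanism more concrete, here is a toy registry that imitates the idea behind controller.action. The decorator stores each function together with a description and a Pydantic parameter model. A dispatcher then validates the raw arguments (as an LLM would emit them) before calling the function. ToyRegistry and its methods are hypothetical and far simpler than browser-use's actual Controller. The sketch assumes pydantic v2 and Python 3.9+.

```python
from typing import Any, Callable, Dict

from pydantic import BaseModel


class ToyRegistry:
    """Hypothetical, minimal stand-in for an action registry (not browser-use's implementation)."""

    def __init__(self) -> None:
        self.actions: Dict[str, Dict[str, Any]] = {}

    def action(self, description: str, param_model: type[BaseModel]) -> Callable:
        def decorator(func: Callable) -> Callable:
            # Store the function under its name, with the description an LLM would see
            self.actions[func.__name__] = {
                'description': description,
                'param_model': param_model,
                'function': func,
            }
            return func
        return decorator

    def describe(self) -> str:
        # One line per available action, e.g. for inclusion in a prompt
        return '\n'.join(f"{name}: {a['description']}" for name, a in self.actions.items())

    def execute(self, name: str, raw_params: Dict[str, Any]) -> Any:
        entry = self.actions[name]
        # Validate the raw arguments against the Pydantic model before calling the function
        params = entry['param_model'](**raw_params)
        return entry['function'](params)


registry = ToyRegistry()


class SimplePageInfo(BaseModel):
    url: str


@registry.action('Jump to MyPage', param_model=SimplePageInfo)
def jump_to_my_page(page_info: SimplePageInfo) -> str:
    return f'would navigate to {page_info.url}'


if __name__ == '__main__':
    print(registry.describe())
    print(registry.execute('jump_to_my_page', {'url': 'https://qiita.com/'}))
```

In this toy, arguments that do not match the model (for example, a missing url) fail before the function runs. That is the main benefit of declaring a param_model.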
Purpose 4 | Customizing the system prompt
Use this when you want to add extra rules to the standard prompt to constrain the agent's behavior.
By subclassing, you can add your own information to the default prompt.
- Subclass the SystemPrompt class and override its important_rules method to update the system prompt.
- Then pass that class to the Agent's system_prompt_class argument.
Note: Methods other than important_rules can also be overridden, but this does not appear to be officially recommended.
In my own use of Browser Use, I have so far not changed any part of the system prompt other than important_rules.
Code example:
import asyncio
import json
from langchain_openai import ChatOpenAI
from browser_use import Agent, SystemPrompt
class CustomSystemPrompt(SystemPrompt):
    def important_rules(self) -> str:
        # Get the parent class's existing rules
        base_rules = super().important_rules()
        # Append the new rule
        custom_rule = "HIGHEST PRIORITY RULE: For any task, visit Qiita and check the information there first."
        return f"{base_rules}\n{custom_rule}"
async def main():
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task="Search for 'ニコラ AI' on Google",
        llm=llm,
        system_prompt_class=CustomSystemPrompt,
    )
    # Print the system prompt that is actually generated
    print(json.dumps(
        agent.message_manager.system_prompt.model_dump(exclude_unset=True),
        indent=4,
        ensure_ascii=False
    ))
    await agent.run()
if __name__ == '__main__':
    asyncio.run(main())
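The override in CustomSystemPrompt is plain Python inheritance: super() returns the parent's rules text, and the subclass appends to it. The minimal sketch below uses a hypothetical BasePrompt class in place of browser-use's SystemPrompt, whose real rule text is much longer. It shows that a parent method calling self.important_rules() picks up the subclass's version automatically.

```python
class BasePrompt:
    """Hypothetical stand-in for a library prompt class."""

    def important_rules(self) -> str:
        return '1. RESPONSE FORMAT: always reply in valid JSON.'

    def build(self) -> str:
        # The parent assembles the final prompt via self.important_rules(),
        # so an override in a subclass is used automatically.
        return f'You are a browser agent.\n{self.important_rules()}'


class CustomPrompt(BasePrompt):
    def important_rules(self) -> str:
        base_rules = super().important_rules()
        custom_rule = 'HIGHEST PRIORITY RULE: visit Qiita first for any task.'
        return f'{base_rules}\n{custom_rule}'


if __name__ == '__main__':
    print(CustomPrompt().build())
```

Calling super() keeps the original rules and adds to them. Returning only custom_rule would silently drop the library's rules.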
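Purpose 5 | Solving CAPTCHAs
This is an example for automating a page that shows a CAPTCHA. The success rate varies greatly depending on the environment and the type of CAPTCHA.
If you describe the goal in the Agent's task argument, the agent will attempt to solve the CAPTCHA.
Note: It will not always succeed. Rather than running fully automatically, it is probably better to switch to manual input at that point.
Code example: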
import asyncio
from langchain_openai import ChatOpenAI
from browser_use import Agent
async def main():
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task='Open https://captcha.com/demos/features/captcha-demo.aspx and try to solve the captcha',
        llm=llm,
    )
    await agent.run()
    input('Press Enter to exit')
if __name__ == '__main__':
    asyncio.run(main())
Purpose 6 | Customizing the output
Use this when you want to format the information the LLM retrieved and output it at task completion (or at any other point).
By default, running agent.run() shows the text the LLM produced as-is.
If you want full control over the format, create a custom action for task completion with the controller.registry.action decorator.
Code example:
import asyncio
from pydantic import BaseModel
from browser_use.agent.views import ActionResult
from browser_use.controller.service import Controller
from browser_use.agent.service import Agent
from langchain_openai import ChatOpenAI
controller = Controller()
class HNPostInfo(BaseModel):
    title: str
    url: str
    points: int
@controller.registry.action('Finish with HN data', param_model=HNPostInfo)
async def finish(params: HNPostInfo):
    """
    Action that returns the Hacker News post data as JSON when the task is complete
    """
    return ActionResult(is_done=True, extracted_content=params.model_dump_json())
async def main():
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        # The task asks for the same fields that HNPostInfo defines (title, url, points)
        task='Go to https://news.ycombinator.com and get the title, URL, and points of the top post',
        llm=llm,
        controller=controller
    )
    history = await agent.run()
    data_str = history.final_result()
    # Parse the output and display it
    if data_str:
        post_info = HNPostInfo.model_validate_json(data_str)
        print(f"Title: {post_info.title}\nURL: {post_info.url}\nPoints: {post_info.points}")
if __name__ == '__main__':
    asyncio.run(main())
Purpose 7 | Copying and pasting data in the browser
Use this when you want to automate copying text from one page and pasting it into another.
Code example:
import asyncio
import pyperclip
from langchain_openai import ChatOpenAI
from browser_use.agent.views import ActionResult
from browser_use.controller.service import Controller
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.browser.context import BrowserContext
from browser_use import Agent
controller = Controller()
@controller.registry.action('Copy text to clipboard')
def copy_text(text: str):
    """Copy a string to the clipboard"""
    pyperclip.copy(text)
    return ActionResult(extracted_content=text)
@controller.registry.action('Paste text from clipboard', requires_browser=True)
async def paste_text(browser: BrowserContext):
    """Type the clipboard contents into the current page in the browser"""
    text = pyperclip.paste()
    page = await browser.get_current_page()
    await page.keyboard.type(text)
    return ActionResult(extracted_content=text)
async def main():
    # headless=False so you can watch the copy-and-paste automation while testing it
    browser = Browser(config=BrowserConfig(headless=False))
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task=(
            "Go to wikipedia.org, copy the first paragraph, "
            "then go to deepl.com and paste the text to see its translation."
        ),
        llm=llm,
        # Pass the controller so the custom copy/paste actions are available to the agent
        controller=controller,
        browser=browser
    )
    await agent.run()
    await browser.close()
if __name__ == '__main__':
    asyncio.run(main())
Purpose 8 | Asking the user for input
Use this when a task needs manual input from the user partway through (a password, an additional search keyword, and so on).
Having a human step in partway through raises the overall success rate of the task.
Code example:
import asyncio
from browser_use.agent.views import ActionResult
from browser_use.controller.service import Controller
from browser_use import Agent
from langchain_openai import ChatOpenAI
controller = Controller()
@controller.action('Wait for user input')
def ask_user(question: str):
    """
    Get input from the user in the middle of the task
    """
    answer = input(f"{question}\n>>> ")
    return ActionResult(extracted_content=answer, include_in_memory=True)
async def main():
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task="Go to google.com. Then ask the user: 'What keyword would you like to search for?'",
        llm=llm,
        controller=controller,
    )
    await agent.run()
if __name__ == '__main__':
    asyncio.run(main())
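The same human-in-the-loop idea covers the "get the user's approval before a risky operation" goal mentioned at the start of this section. Below is a framework-independent sketch of an approval gate. The require_approval and submit_form functions and the y/N convention are hypothetical. input_fn is injected so the gate can be tested without a keyboard. In browser-use, you could call a check like this inside a custom action before performing the risky step.

```python
from typing import Callable


def require_approval(action_desc: str, input_fn: Callable[[str], str] = input) -> bool:
    """Ask the user to approve an action; only an explicit 'y' or 'yes' counts as approval."""
    answer = input_fn(f'The agent wants to: {action_desc}\nApprove? [y/N] >>> ')
    return answer.strip().lower() in ('y', 'yes')


def submit_form(input_fn: Callable[[str], str] = input) -> str:
    # Hypothetical risky step guarded by the approval gate
    if not require_approval('submit the payment form', input_fn):
        return 'Cancelled by user'
    return 'Form submitted'


if __name__ == '__main__':
    # A fake answer is used here; in real use, leave input_fn as the built-in input()
    print(submit_form(input_fn=lambda prompt: 'y'))
```

Defaulting to "no" means an empty or mistyped answer cancels the action. This is usually the safer failure mode for risky operations.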
Purpose 9 | Using an existing browser
Sometimes you want to reuse a browser that is already logged in, for example to use Google Docs, or when a site requires you to be logged in through Chrome or another browser.
If you set the path to your existing browser in BrowserConfig, Browser Use launches Chrome in debug mode so you can use the same user profile.
Notes
- The browser you plan to use (Chrome, etc.) must be closed beforehand.
- Because processing runs while you are logged in to your accounts, some cases carry risk. Use this at your own risk.
Code example:
import asyncio
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def main():
    browser = Browser(
        config=BrowserConfig(
            headless=False,
            # Path used to launch Chrome in debug mode (example: Chrome on macOS)
            chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
        )
    )
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task='Please open docs.google.com and create a new document titled "Browser Use experiment"',
        llm=llm,
        browser=browser,
    )
    await agent.run()
    await browser.close()
if __name__ == '__main__':
    asyncio.run(main())
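Purpose 10 | Setting an upper limit on execution steps
Use this to avoid overly long runs, or to keep API costs down by limiting the number of steps.
You can set it with agent.run(max_steps=5), for example. When the agent reaches this number of steps, processing is forcibly stopped even if the task is not finished, so keep this in mind.
Code example: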
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def main():
    llm = ChatOpenAI(model='gpt-4o-mini')
    agent = Agent(
        task='Open X and look for AI-related posts',
        llm=llm
    )
    # Limit the run to 3 steps with max_steps=3
    result = await agent.run(max_steps=3)
    print(result)
if __name__ == '__main__':
    asyncio.run(main())
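Conceptually, max_steps caps the agent's think-and-act loop. The loop ends when the model reports that it is done or when the cap is reached, whichever comes first. The toy loop below shows why a task that needs more steps than the cap simply ends unfinished. It uses a hypothetical run_agent with a fake step function and is not browser-use's implementation.

```python
from typing import Callable, List, Tuple


def run_agent(step_fn: Callable[[int], bool], max_steps: int) -> Tuple[bool, List[int]]:
    """Call step_fn(step) until it returns True (done) or max_steps is reached."""
    executed: List[int] = []
    for step in range(max_steps):
        executed.append(step)
        if step_fn(step):
            return True, executed
    # The cap was reached before the task reported completion
    return False, executed


def needs_five_steps(step: int) -> bool:
    # Hypothetical task that finishes on its 5th step (index 4)
    return step == 4


if __name__ == '__main__':
    print(run_agent(needs_five_steps, max_steps=3))
    print(run_agent(needs_five_steps, max_steps=10))
```

With a cap of 3, the task never reaches its final step. Set max_steps with some headroom above the number of steps you expect the task to need.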
Purpose 11 | Overriding an existing action
Use this to replace an existing action, such as done, with your own implementation, for example to send an email when the task completes.
You do this by registering a replacement for the already defined action with @controller.registry.action.
For example, your own action can return an ActionResult containing is_done=True, which is what the done action does.
Code example:
import asyncio
import yagmail
from browser_use.controller.service import Controller
from browser_use.agent.views import ActionResult
from browser_use import Agent
from langchain_openai import ChatOpenAI
controller = Controller()
# Named `done`, like the function in the validate_output example, so that it takes the place of the built-in done action
@controller.registry.action('Done with task')
async def done(result_text: str):
    """
    Example that sends a notification email when the task is complete
    """
    try:
        yag = yagmail.SMTP('your_email@gmail.com', 'app_password')
        yag.send(
            to='recipient@example.com',
            subject='Task Completed',
            contents=f"Result: {result_text}",
        )
        # Returning is_done=True ends the task, just as the default done action does
        return ActionResult(is_done=True, extracted_content="Email sent successfully!")
    except Exception as e:
        return ActionResult(error=str(e))
async def main():
    llm = ChatOpenAI(model='gpt-4o')
    agent = Agent(
        task='Go to https://browser-use.com and then done',
        llm=llm,
        controller=controller
    )
    await agent.run()
if __name__ == '__main__':
    asyncio.run(main())
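Whether a new action replaces a built-in one or is simply added alongside it depends on how the registry keys its entries. The toy below assumes a dict keyed by function name. This is an assumption for illustration, so check the registry code of the browser-use version you use. Under that assumption, registering a second function named done replaces the first, while a differently named function such as my_custom_done_action is added next to it.

```python
from typing import Callable, Dict


class NameKeyedRegistry:
    """Toy registry that stores actions in a dict keyed by function name."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable[[], str]] = {}

    def action(self, func: Callable[[], str]) -> Callable[[], str]:
        # Registering a function whose name already exists overwrites the old entry
        self.actions[func.__name__] = func
        return func


registry = NameKeyedRegistry()


# Stand-in for a built-in action that the library registers first
@registry.action
def done() -> str:
    return 'default done'


# A differently named function is added next to `done`; it does not replace it
@registry.action
def my_custom_done_action() -> str:
    return 'custom done'


# Same name as the built-in one, so this entry replaces it
@registry.action
def done() -> str:  # noqa: F811 (redefinition is intentional here)
    return 'custom done with email notification'


if __name__ == '__main__':
    for name, func in registry.actions.items():
        print(name, '->', func())
```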
5 | Bonus
Things to keep in mind when using Browser Use in real work
Here are some points worth keeping in mind for anyone who uses generative AI at work, business people and creators alike.
They are lessons from my own past research and practical work, so please take them as a reference.
Note: If your goal is to get used to how AI and AI agents work, please use Browser Use extensively and explore its possibilities!
Don't try to solve everything with Browser Use
When you hear "a system that operates your PC screen for you," you may start to think, "Can't I just hand most of my work to AI?"
But if you think it through calmly, "fixed logic + an LLM" is often enough.
For example, if you want to post to X automatically, the X API alone is enough. If you want AI to automatically post text that sounds like your own posts, you can easily automate it as "the LLM drafts the post → post it via the X API."
If the job is a "workflow" that runs the same steps in the same order every time, you are often better off, in both cost and stability, not bringing in an AI agent at all.
Think about whether the process is really needed before automating it
When you consider automating something with an AI agent system such as Browser Use, the step you are trying to automate sometimes turns out to be unnecessary in the first place.
It is not quite "garbage in, garbage out" from machine learning, but it helps to go back to the fundamentals once and ask, "Is the process I want to automate even necessary?"
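For comparison, here is the "LLM drafts the post, then post it via the X API" workflow written as plain code. It is a fixed sequence of calls with a deterministic check in between and no agent loop. Both steps are injected callables so the sketch runs offline. generate_text and publish are hypothetical placeholders for an LLM client and an X API client. The 140-character limit is an arbitrary example value, not X's actual limit.

```python
from typing import Callable


def post_with_fixed_workflow(
    topic: str,
    generate_text: Callable[[str], str],
    publish: Callable[[str], str],
    max_len: int = 140,
) -> str:
    """Fixed workflow: draft with an LLM, check the draft, then publish it.

    The same steps run in the same order every time; no agent decides what to do next.
    """
    # Step 1: ask the LLM (whatever client generate_text wraps) for a draft
    draft = generate_text(f'Write a short post about: {topic}').strip()
    # Step 2: a deterministic check instead of letting a model improvise
    if not draft or len(draft) > max_len:
        raise ValueError(f'draft is empty or longer than {max_len} characters')
    # Step 3: publish through whatever API client publish wraps; returns e.g. a post ID
    return publish(draft)


if __name__ == '__main__':
    def fake_llm(prompt: str) -> str:
        return f'Notes on {prompt.split(": ", 1)[1]}'

    def fake_publish(text: str) -> str:
        print('publishing:', text)
        return 'post-001'

    print(post_with_fixed_workflow('Browser Use', fake_llm, fake_publish))
```

Every run makes exactly one LLM call and one API call. This is where the cost and stability advantage over an agent comes from.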
Closing
How was it?
The official GitHub repository includes several examples, but it is hard to understand their concrete behavior and how to apply them without reading the code.
I hope the case walkthroughs and purpose-specific implementation examples in this article are of some help.
If this piqued your interest, please try your own Browser Use use cases and share them!
Let's enjoy riding the fast-changing wave of generative AI!
Note: If you find mistakes, or information that has become outdated because of updates, I would appreciate it if you let me know.
⌠çæAIã«èå³ãæã£ãæ¹ã¯ãã¡ãããããã âŒ