Trying out pyautogui

Last updated at 2024-12-13, posted at 2023-02-11

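When pyautogui runs inside a Remote Desktop session, the script is killed as soon as you disconnect.

To work around this, on a beat-up old Android device:
・Install the Microsoft Remote Desktop app
・Keyboard input does not work properly over the connection by default, so
at app startup turn off "Use scancode input when available" under "General"

・Set the resolution fairly high
(roughly matching the Windows screen)

Leave that device connected and the script can keep running indefinitely.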

First, set up the environment

Also download the free edition of PyCharm
https://www.jetbrains.com/ja-jp/pycharm/

For automating Windows, see also
https://pyautogui.readthedocs.io/en/latest/quickstart.html#mouse-functions

To get mouse coordinates, a tool like
https://all-freesoft.net/hard3/mouse/mouse-point-viewer/mouse-point-viewer.html

may be handy.

Prerequisite: Python 3 and pip are already installed

Note the uppercase V

python -V
Python 3.10.10

pip -V
pip 22.3

OK

hello world

Start with Hello World.

Create hoge.py in C:\Users\h

hoge.py

print("Hello World!")

Launch Windows Terminal (wt from here on) and run

cd \Users\h   (any folder will do)
C:\Users\h> python hoge.py

If Hello World! is printed, it works.

Launch wt and install pyautogui

Windows key > wt > Enter

pip install pyautogui

installs it.

To take screenshots and search for images, also install Pillow

python -m pip install --upgrade pip
pip install Pillow
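
Before moving on, it can help to confirm that both packages are visible to the interpreter you run scripts with. The following is a minimal sketch using only the standard library; the helper name missing_packages is made up for this example. Note that Pillow is imported under the module name PIL.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# pyautogui installs as "pyautogui"; Pillow installs as "PIL".
missing = missing_packages(["pyautogui", "PIL"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("pyautogui and Pillow are both available")
```

If something is reported as missing even though pip succeeded, pip and python may belong to different installations; python -m pip install ... ties the install to the interpreter you actually run.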

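Finding an image on the screen

Reference
https://pyautogui.readthedocs.io/en/latest/screenshot.html#the-screenshot-function

Download the image of the "7" key from the page above and save it locally.
Then the following code should work.
With multiple monitors, only the primary monitor is searched, so an image shown on another monitor gives None (or, depending on the pyautogui version, an ImageNotFoundException).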

import pyautogui

# Search the screen for the saved image of the "7" key
button7location = pyautogui.locateOnScreen('calc7key.webp')
print(button7location)
# Box(left=371, top=200, width=50, height=41)
print(button7location.left)
# 371

Building on the above, let's click Qiita's "Post" button

Crop the Post button out of a screenshot and save it.
Then find its position and click it.

toukoulocation = pyautogui.locateOnScreen('toukousuru.png')
pyautogui.moveTo(toukoulocation.left, toukoulocation.top, 1, pyautogui.easeInQuad)
pyautogui.click()
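
The snippet above moves to the top-left corner of the matched box, which can miss a small button, and it fails if the image is not found. Below is a sketch that computes the center of the box and guards the not-found case. The names Box, box_center and click_image are made up for this example, and pyautogui is imported inside the function so the pure helper can be tried without a display. Depending on the pyautogui version, a miss is reported either as None or as pyautogui.ImageNotFoundException, so the sketch handles both.

```python
from collections import namedtuple

# Same field layout as the Box that locateOnScreen returns (left, top, width, height).
Box = namedtuple("Box", "left top width height")


def box_center(box):
    """Return the (x, y) center of a box, using integer pixel coordinates."""
    return box.left + box.width // 2, box.top + box.height // 2


def click_image(image_path):
    """Locate image_path on screen and click its center. Returns True on success."""
    import pyautogui  # imported here so box_center works without a GUI session

    try:
        box = pyautogui.locateOnScreen(image_path)
    except pyautogui.ImageNotFoundException:
        box = None  # some versions raise instead of returning None
    if box is None:
        return False
    x, y = box_center(box)
    pyautogui.moveTo(x, y, 1, pyautogui.easeInQuad)
    pyautogui.click()
    return True


print(box_center(Box(left=371, top=200, width=50, height=41)))
```

Calling click_image('toukousuru.png') would then click the middle of the Post button. pyautogui.center(box) performs the same calculation if you prefer the built-in.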

When the same image appears more than once
and you want to match every occurrence

matches = pyautogui.locateAllOnScreen('firefox-plus.png')

# Print the position of each match
for match in matches:
    print(match)

Get the first match (index 0)

matches = list(pyautogui.locateAllOnScreen('firefox-plus.png'))
# Print the number of matches
count = len(matches)
print(f'Matches found: {count}')
print(matches[0])

locateAllOnScreen returns a generator, so without list() as above, matches[0] raises an error
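
The underlying reason is that locateAllOnScreen produces its matches lazily, one at a time, and such a generator cannot be indexed. The sketch below reproduces the behavior with a stand-in generator (fake_locate_all is hypothetical and only mimics the lazy return value), so it runs without a screen.

```python
def fake_locate_all():
    """Stand-in for locateAllOnScreen: yields (left, top, width, height) tuples lazily."""
    yield (10, 20, 30, 40)
    yield (50, 60, 30, 40)


# Indexing the generator directly fails with a TypeError.
try:
    fake_locate_all()[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Option 1: materialize everything, then index.
matches = list(fake_locate_all())
first_from_list = matches[0]

# Option 2: take only the first match; None if there are no matches.
first_from_next = next(fake_locate_all(), None)

print(first_from_list, first_from_next)
```

When only the first match matters, next(..., None) stops after one result and also avoids an IndexError on an empty result.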

Adding a random wait

If the actions are not randomized, it is obvious that a bot is running.
So let's randomize the timing

import pyautogui
import random
import time

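# Random duration in seconds (0 to 1) for the mouse movement
rand = random.uniform(0, 1)
print(rand)

# Move to (100, 500) over rand seconds, starting slow and accelerating
pyautogui.moveTo(100, 500, rand, pyautogui.easeInQuad)

time.sleep(1.5)

pyautogui.moveTo(900, 1000, rand, pyautogui.easeInQuad)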
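
In the script above, one random value is reused for both moves and the pause is always 1.5 seconds. A possible variation draws a fresh value for every move and every pause. This is only a sketch: the helper names and the default ranges are assumptions chosen for illustration, and the sleep function is injectable so the code can be tried without actually waiting.

```python
import random
import time


def random_duration(low=0.2, high=1.0, rng=random):
    """Pick a movement duration in seconds from the range [low, high]."""
    return rng.uniform(low, high)


def random_pause(low=0.5, high=2.0, rng=random, sleep=time.sleep):
    """Sleep for a random time in [low, high] seconds and return the delay used."""
    delay = rng.uniform(low, high)
    sleep(delay)
    return delay


# Dry run: a no-op replaces time.sleep so nothing actually blocks.
print(random_duration(), random_pause(sleep=lambda seconds: None))
```

In the real script this would look like pyautogui.moveTo(100, 500, random_duration(), pyautogui.easeInQuad) followed by random_pause().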

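pyautogui cannot type a colon

Search for pyautogui with Everything (the file search tool) to find the installed files, then
rewrite the code around line 270 as described in

https://teratail.com/questions/79973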
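
If editing the installed library is not an option, a commonly used workaround is to put the text on the clipboard and paste it instead of typing it. This is a different technique from the one in the linked answer, and it assumes the separate pyperclip package is installed (pip install pyperclip). The helper name paste_text is made up; it takes the copy and hotkey functions as arguments so the logic can be tried without a GUI.

```python
def paste_text(text, copy, hotkey):
    """Copy text to the clipboard, then send Ctrl+V to paste it into the focused window."""
    copy(text)
    hotkey("ctrl", "v")


# Dry run with recording stand-ins instead of the real clipboard and keyboard.
calls = []
paste_text(
    "C:\\Users\\h",
    copy=lambda text: calls.append(("copy", text)),
    hotkey=lambda *keys: calls.append(("hotkey", keys)),
)
print(calls)
```

In a real script the call would be paste_text("C:\\Users\\h", pyperclip.copy, pyautogui.hotkey). Keep in mind that this overwrites whatever was on the clipboard.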

Improving image recognition

Install OpenCV

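pip install opencv-python

(opencv-contrib-python contains the same main modules plus the contrib extras. The OpenCV packaging notes say to install only one of the two in an environment, and opencv-python is enough here.)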

Searching for the image

matches = list(pyautogui.locateAllOnScreen('firefox-plus.png', grayscale=True, confidence=0.9))

Reference
https://self-development.info/pyautogui%E3%81%A7%E7%94%BB%E5%83%8F%E8%AA%8D%E8%AD%98%EF%BC%88locateonscreen%EF%BC%89%E3%80%90python%E3%80%91/

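confidence
The parameter to adjust when the image cannot be found.
With only this parameter added, the code worked once I lowered it to 0.5.
Using the confidence parameter requires OpenCV.

grayscale
Passing grayscale=True gives a slight speedup (about 30%).
It desaturates the image and the screenshot, which makes the search faster but can cause false-positive matches.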
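
One way to apply the confidence setting is to start strict and relax it step by step until a match appears. The sketch below takes the locate function as an argument so it can be tried without a screen; the helper name locate_with_fallback, the list of confidence values and the stand-in fake_locate are all assumptions for illustration.

```python
def locate_with_fallback(locate, image_path,
                         confidences=(0.9, 0.8, 0.7, 0.6, 0.5),
                         not_found_errors=()):
    """Try each confidence in order; return (box, confidence) or (None, None)."""
    for confidence in confidences:
        try:
            box = locate(image_path, confidence=confidence)
        except not_found_errors:
            box = None  # some versions raise instead of returning None
        if box is not None:
            return box, confidence
    return None, None


def fake_locate(image_path, confidence):
    """Stand-in for locateOnScreen: only 'matches' at confidence 0.6 or lower."""
    return (371, 200, 50, 41) if confidence <= 0.6 else None


print(locate_with_fallback(fake_locate, "calc7key.webp"))
```

With the real library the call would be locate_with_fallback(pyautogui.locateOnScreen, 'firefox-plus.png', not_found_errors=(pyautogui.ImageNotFoundException,)). A lower confidence raises the chance of false matches, so it is safer to stop at the highest value that works.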

That's all.

Multi-monitor version

import os
import pyautogui
import mss
import mss.tools
from PIL import Image
import ctypes

# Make the process DPI-aware so Windows does not scale the coordinates
ctypes.windll.user32.SetProcessDPIAware()

# Run image recognition on captured screenshots
def search_and_click_in_all_monitors(target_image, confidence=0.6):
    # Check that the file exists
    if not os.path.exists(target_image):
        raise FileNotFoundError(f"Image file not found: {target_image}")

    with mss.mss() as sct:
        # Get all monitors
        monitors = sct.monitors  # monitor list (individual monitors start at index 1)
        print(f"Number of monitors: {len(monitors) - 1}")  # monitors[0] is the combined area of all monitors

        # Search for the image on each monitor
        for i, monitor in enumerate(monitors[1:], start=1):  # monitors[1:] are the individual monitors
            print(f"Monitor {i} area: {monitor}")
            screenshot = sct.grab(monitor)  # capture

            # For debugging: save the captured image
            file_name = f"monitor_{i}.png"
            mss.tools.to_png(screenshot.rgb, screenshot.size, output=file_name)
            print(f"Saved capture of monitor {i}: {file_name}")

            # Convert the capture to a PIL image
            image = Image.frombytes("RGB", screenshot.size, screenshot.rgb)

            try:
                # Search for the image
                location = pyautogui.locate(target_image, image, confidence=confidence)
                if location:
                    # Scale ratio between the capture and the actual monitor resolution
                    scale_x = monitor["width"] / screenshot.width
                    scale_y = monitor["height"] / screenshot.height

                    # Center of the match (local coordinates inside the capture)
                    center_x, center_y = pyautogui.center(location)

                    # Apply the scale and the monitor offset to get global coordinates
                    global_x = int(center_x * scale_x + monitor["left"])
                    global_y = int(center_y * scale_y + monitor["top"])

                    # Move the mouse and click
                    pyautogui.moveTo(global_x, global_y, duration=0.5)  # move over 0.5 seconds
                    pyautogui.click()
                    return f"Image found on monitor {i} and clicked: ({global_x}, {global_y})"
            except pyautogui.ImageNotFoundException:
                print(f"Image not found on monitor {i}")

    return "Image not found on any monitor"

# Target image
target_image = "calc7key.webp"

# Run
try:
    result = search_and_click_in_all_monitors(target_image, confidence=0.8)  # adjust the confidence as needed
    print(result)
except FileNotFoundError as e:
    print(e)
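
The coordinate conversion in the middle of the function is the part that is easiest to get wrong, so it may be worth checking in isolation. The sketch below repeats the same arithmetic as a pure function; to_global is a made-up name, and the monitor dictionaries are sample values in the shape that mss reports (left, top, width, height).

```python
def to_global(local_x, local_y, monitor, shot_width, shot_height):
    """Map a point inside one monitor's screenshot to virtual-desktop coordinates."""
    scale_x = monitor["width"] / shot_width
    scale_y = monitor["height"] / shot_height
    return (int(local_x * scale_x + monitor["left"]),
            int(local_y * scale_y + monitor["top"]))


# Sample layout: one monitor to the right of the primary and one to the left.
right_monitor = {"left": 1920, "top": 0, "width": 1920, "height": 1080}
left_monitor = {"left": -1920, "top": 0, "width": 1920, "height": 1080}

print(to_global(100, 50, right_monitor, 1920, 1080))
print(to_global(100, 50, left_monitor, 1920, 1080))
```

A monitor placed to the left of the primary has a negative left value, so negative x coordinates in the result are expected rather than a bug.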
