
[Python] Switching the active window by monitoring key input (pynput)

Last updated at 2024-01-25 / Posted at 2024-01-25

Introduction

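When you work on several things in parallel, even clicking or pressing Alt+Tab to switch windows starts to feel tedious.
So I wrote a program that, when a key is pressed, switches to a specified window and passes the key input on to it.
It is written in Python, but because it calls the Windows API, the explanation also touches on some C++.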

Requirements

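Suppose there are two applications:

Mouse app: can be operated with the mouse alone
Keyboard app: must be operated with the keyboard

In the keyboard app, every key press is followed by a wait of several seconds before the result appears.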

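Without any special mechanism, the workflow looks like this:

1. Send a key to the keyboard app and wait for the result.
2. While waiting, click over to the mouse app and keep working there.
3. The keyboard app's result appears.
4. Once the result is out, switch back to the keyboard app by clicking it or pressing Alt+Tab.
5. Send a key to the keyboard app and wait for the result.
...and so on, repeating.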

So I built a mechanism to eliminate step 4.

Environment

Windows 11 Pro

py -VV

Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]

pip list -l

pynput             1.7.6

Code

Monitoring key input to switch windows

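To meet these requirements, I implemented the program as follows.
Roughly speaking, when a key is pressed while the specified source window (the mouse app) is active, the program passes that key input on to the destination window (the keyboard app).
You need the window handle of the source (mouse app) and the window handle of the destination (keyboard app) in advance.
How to find them is explained later.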

activeWindow.py

import argparse
import ctypes

from pynput import keyboard
from pynput.keyboard import Key

user32 = ctypes.windll.user32  # user32.dll (ctypes.windll is available only on Windows)

# nCmdShow values accepted by ShowWindow()
nCmdShow = {
  "SW_HIDE": 0,
  "SW_SHOWNORMAL": 1,
  "SW_NORMAL": 1,
  "SW_SHOWMINIMIZED": 2,
  "SW_SHOWMAXIMIZED": 3,
  "SW_MAXIMIZE": 3,
  "SW_SHOWNOACTIVATE": 4,
  "SW_SHOW": 5,
  "SW_MINIMIZE": 6,
  "SW_SHOWMINNOACTIVE": 7,
  "SW_SHOWNA": 8,
  "SW_RESTORE": 9,
  "SW_SHOWDEFAULT": 10,
  "SW_FORCEMINIMIZE": 11,
}

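def getWindowText(hwnd):
  # GetWindowTextLengthW() excludes the terminating null, so allocate one extra character
  length = user32.GetWindowTextLengthW(hwnd)
  buff = ctypes.create_unicode_buffer(length + 1)
  user32.GetWindowTextW(hwnd, buff, length + 1)
  return buff.value

def activeWindow(hwnd):
  # Activate and maximize the window, then make sure it is brought to the foreground
  user32.ShowWindow(hwnd, nCmdShow["SW_SHOWMAXIMIZED"])
  user32.SetForegroundWindow(hwnd)

def sendKey(controller, key):
  # Re-send the key as a press followed by a release
  controller.press(key)
  controller.release(key)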

class ActivateWindow:
  def __init__(self, source, destination):
    self.source = source
    self.destination = destination
    self.controller = keyboard.Controller()

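  def windowFilter(self, _msg, _data):
    # Runs before onPress(): if the source window is in front, switch to the destination first
    if user32.GetForegroundWindow() == self.source:
      activeWindow(self.destination)
    return True  # True lets the event reach onPress()

  def onPress(self, key):
    print(key, user32.GetForegroundWindow() == self.source)  # debug output
    if key == Key.end:
      return False  # End key: stop the listener
    if user32.GetForegroundWindow() == self.source:
      sendKey(self.controller, key)
    return True  # keep listening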

  def join(self):
    with keyboard.Listener(on_press=self.onPress, win32_event_filter=self.windowFilter) as listener:
      listener.join()

def parseArgument():
  parser = argparse.ArgumentParser()
  # Both handles are required positional arguments (a default would have no effect on them)
  parser.add_argument("source", type=int, help="handle of the window to watch (from)")
  parser.add_argument("destination", type=int, help="handle of the window to switch to (to)")
  args = parser.parse_args()
  return (args.source, args.destination)

if __name__ == "__main__":
  source, destination = parseArgument()
  print(f"start: from ({source}, {getWindowText(source)}) to ({destination}, {getWindowText(destination)})")
  aw = ActivateWindow(source, destination)
  aw.join()
  print("end")

Explanation

Starting and stopping

Start the program like this:

console

> py activeWindow.py <source handle> <destination handle>

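With this, whenever you press a key while the window given by <source handle> is active, the program switches to the window given by <destination handle> and the key input is passed on to it.

To stop the program, press the End key in any window.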
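The two handles arrive as plain integers on the command line. The following sketch restates parseArgument() so that it takes an explicit argument list, which makes it possible to try it without launching the listener. `parse_handles` and `handle` are hypothetical names. Accepting hexadecimal input such as `0x20310` through `int(text, 0)` is an assumption of this sketch, since inspection tools often display handles in hex. The original script uses `type=int`, which only accepts decimal.

```python
import argparse


def handle(text):
    """Convert a handle string to int; base 0 also accepts '0x...' (assumption, not in the original)."""
    return int(text, 0)


def parse_handles(argv=None):
    """Sketch of parseArgument() that takes an explicit argument list for testing."""
    parser = argparse.ArgumentParser(prog="activeWindow.py")
    parser.add_argument("source", type=handle, help="handle of the window to watch (from)")
    parser.add_argument("destination", type=handle, help="handle of the window to switch to (to)")
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:], like the original
    return args.source, args.destination


if __name__ == "__main__":
    print(parse_handles(["131844", "0x20310"]))  # (131844, 131856)
```

If a value cannot be converted, argparse prints an error and exits with SystemExit. A bad handle therefore stops the program before the keyboard hook is installed.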

Monitoring key input (pynput)

Key input is monitored with pynput, using a class like the following.
(It does not have to be a class, though.)

python

class ActivateWindow:
  def __init__(self, source, destination):
    self.source = source
    self.destination = destination
    self.controller = keyboard.Controller()

  def windowFilter(self, _msg, _data):
    if user32.GetForegroundWindow() == self.source:
      activeWindow(self.destination)
    return True

  def onPress(self, key):
    if key == Key.end:
      return False
    if user32.GetForegroundWindow() == self.source:
      sendKey(self.controller, key)
    return True

  def join(self):
    with keyboard.Listener(on_press=self.onPress, win32_event_filter=self.windowFilter) as listener:
      listener.join()

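To monitor keys, create a keyboard.Listener() instance with the callbacks set. Using it in a with statement starts the listener, and listener.join() blocks until it stops.
Here there is no need to distinguish between pressing and releasing a key, so only on_press is set.
With the code above, each key press calls the callback onPress() with key as its argument.
If the callback returns True, monitoring continues. If it returns False, monitoring stops.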

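The active window is switched in windowFilter(), which is registered through win32_event_filter and runs before onPress() for each event.
It might seem simpler to switch windows inside onPress(), but when I did that, the key input occasionally went to the window being switched away from, so I structured it this way instead.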

One problem I have not fixed yet: on the first switch after starting the program, the key input is sometimes not passed on.

Switching the active window

python

def activeWindow(hwnd):
  user32.ShowWindow(hwnd, nCmdShow["SW_SHOWMAXIMIZED"])
  user32.SetForegroundWindow(hwnd)

To switch the active window, I use the Windows API function ShowWindow().

c++

BOOL ShowWindow(
  [in] HWND hWnd,
  [in] int  nCmdShow
);

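hWnd is the handle of the window whose state you want to change.
nCmdShow is a number that specifies the new state.
To activate and maximize the window, I pass 3 (SW_SHOWMAXIMIZED).

However, with ShowWindow() alone the window occasionally does not come to the front.
So I then call the Windows API function SetForegroundWindow() to bring the window to the foreground.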

c++

BOOL SetForegroundWindow(
  [in] HWND hWnd
);
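The two calls can be wrapped so that their order is explicit and the result of SetForegroundWindow() is kept. That function returns nonzero if the window was brought to the foreground and zero otherwise. The ShowWindow() return value only says whether the window was previously visible, so this sketch ignores it. As an assumption, the function takes the user32 object as a parameter so it can be run with the hypothetical recording stand-in `FakeUser32`. On Windows you would pass `ctypes.windll.user32` instead. `activate_window` is likewise a hypothetical name.

```python
SW_SHOWMAXIMIZED = 3  # nCmdShow: activate the window and display it maximized


def activate_window(hwnd, user32):
    """ShowWindow(SW_SHOWMAXIMIZED) then SetForegroundWindow(), as in activeWindow()."""
    user32.ShowWindow(hwnd, SW_SHOWMAXIMIZED)
    # Nonzero means the window was brought to the foreground.
    return bool(user32.SetForegroundWindow(hwnd))


class FakeUser32:
    """Hypothetical stand-in that records calls instead of touching real windows."""

    def __init__(self, allow_foreground=True):
        self.calls = []
        self.allow_foreground = allow_foreground

    def ShowWindow(self, hwnd, n_cmd_show):
        self.calls.append(("ShowWindow", hwnd, n_cmd_show))
        return 1

    def SetForegroundWindow(self, hwnd):
        self.calls.append(("SetForegroundWindow", hwnd))
        return 1 if self.allow_foreground else 0


if __name__ == "__main__":
    fake = FakeUser32()
    print(activate_window(42, fake))  # True
    print(fake.calls)  # [('ShowWindow', 42, 3), ('SetForegroundWindow', 42)]
```

Returning a bool lets the caller detect the occasional case where the window does not come to the front, for example to log it or retry.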

Preparation: listing window titles and window handles (enumWindowName.py)

To switch windows, you need a window title or a window handle.
So first, I prepared a program that lists window titles and window handles.

enumWindowName.py

import argparse
import ctypes
import functools
import re
from ctypes import wintypes

user32 = ctypes.windll.user32
# BOOL CALLBACK EnumWindowsProc(HWND hwnd, LPARAM lParam)
WNDENUMPROC = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)

def printList(lt):
  # Print the number of entries, then each entry with a right-aligned index
  length = len(lt)
  maxLen = len(str(length))
  print(length)
  for i, x in enumerate(lt):
    print(f"{i:>{maxLen}} : {x}")

def getWindowText(hwnd):
  length = user32.GetWindowTextLengthW(hwnd)
  buff = ctypes.create_unicode_buffer(length + 1)
  user32.GetWindowTextW(hwnd, buff, length + 1)
  return buff.value

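def _callbackEnumWindows(hwnd, lParam, result):
  # Called by EnumWindows() once per top-level window; result is bound with functools.partial
  title = getWindowText(hwnd)
  if len(title) > 0 or lParam != 0:  # with lParam == 0, skip windows without a title
    result.append((title, hwnd))
  return True  # returning True continues the enumeration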

def parseArgument():
  parser = argparse.ArgumentParser()
  parser.add_argument("-t", "--title", default="", help="regular expression to search for in window titles")
  return parser.parse_args().title

if __name__ == "__main__":
  title = parseArgument()
  result = []
  callback = functools.partial(_callbackEnumWindows, result=result)
  user32.EnumWindows(WNDENUMPROC(callback), 0)
  if title == "":
    printList(result)
  else:
    r = re.compile(title)
    for t, hwnd in result:
      if r.search(t):
        print(f"{t}, {hwnd}")
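The `-t` option filters the collected (title, handle) pairs with `re.search`, so the pattern can match anywhere in a title. The sketch below pulls that filtering step out into a standalone function, `filter_windows` (a hypothetical name), and runs it on hypothetical sample data instead of real EnumWindows() results.

```python
import re


def filter_windows(windows, pattern):
    """Return the (title, hwnd) pairs whose title contains a match for the regex `pattern`."""
    regex = re.compile(pattern)
    # re.search matches anywhere in the title, like the -t option in enumWindowName.py.
    return [(title, hwnd) for title, hwnd in windows if regex.search(title)]


if __name__ == "__main__":
    # Hypothetical sample data; real pairs come from EnumWindows().
    sample = [("Untitled - Notepad", 131844), ("Calculator", 198034), ("memo.txt - Notepad", 262412)]
    for title, hwnd in filter_windows(sample, r"Notepad$"):
        print(f"{title}, {hwnd}")
```

With the real script, the equivalent is `py enumWindowName.py -t Notepad`. The handle printed after the comma is the value to pass to activeWindow.py.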

Explanation

Enumerating window handles

To enumerate window handles, I use the Windows API function EnumWindows().

c++

BOOL EnumWindows(
  [in] WNDENUMPROC lpEnumFunc,
  [in] LPARAM      lParam
);

lpEnumFunc is a pointer to the callback function.
In C++, its type looks like this:

c++

BOOL CALLBACK EnumWindowsProc(
  _In_ HWND   hwnd,
  _In_ LPARAM lParam
);

In Python, the same type is written as:

python

ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)

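The first argument to ctypes.WINFUNCTYPE() is the return type (BOOL), and the rest are the parameter types (HWND, LPARAM). WINFUNCTYPE creates functions that use the stdcall calling convention, which is what CALLBACK denotes.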

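lParam is an application-defined value that EnumWindows() passes on to the callback function.
If no special handling is needed, 0 is fine.

I implemented the callback function as follows.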

python

def _callbackEnumWindows(hwnd, lParam, result):
  title = getWindowText(hwnd)
  if len(title) > 0 or lParam != 0:
    result.append((title, hwnd))
  return True

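When EnumWindows() runs, it calls this callback once for each top-level window, passing that window's handle in hwnd.
I also want to display the window titles, so getWindowText() retrieves each title.
The pair is then stored in result.
In this implementation, when lParam is 0, windows with an empty title are not stored.

As the C++ signature above shows, the callback that EnumWindows() calls does not take a third argument.
So before passing the function, you need to bind the list to result with something like functools.partial.
In the code, it is called like this: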

python

result = []
callback = functools.partial(_callbackEnumWindows, result=result)
user32.EnumWindows(WNDENUMPROC(callback), 0)
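The sketch below shows how functools.partial fills in the third parameter while the caller still supplies only (hwnd, lParam). It wraps the same kind of callback in a ctypes function type and calls it directly from Python, the way EnumWindows() would. Several things are assumptions of the sketch. The titles come from a hypothetical dictionary instead of GetWindowTextW(). `WINFUNCTYPE`, which exists only on Windows, falls back to `CFUNCTYPE` elsewhere. `c_int`, `c_void_p` and `c_ssize_t` stand in for BOOL, HWND and LPARAM so that wintypes is not needed. `collect_window` and `enumerate_fake` are hypothetical names.

```python
import ctypes
import functools

# WINFUNCTYPE (stdcall) exists only on Windows; fall back to CFUNCTYPE for this demo.
FUNCTYPE = getattr(ctypes, "WINFUNCTYPE", ctypes.CFUNCTYPE)
# BOOL (*)(HWND, LPARAM) spelled with portable ctypes types.
ENUMPROC = FUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_ssize_t)

# Hypothetical titles standing in for GetWindowTextW().
FAKE_TITLES = {0x1001: "Untitled - Notepad", 0x1002: "", 0x1003: "Calculator"}


def collect_window(hwnd, lParam, result):
    """Same rule as _callbackEnumWindows: skip empty titles unless lParam != 0."""
    title = FAKE_TITLES.get(hwnd, "")
    if len(title) > 0 or lParam != 0:
        result.append((title, hwnd))
    return True  # keep enumerating


def enumerate_fake(lParam):
    result = []
    # partial binds result=..., leaving the (hwnd, lParam) signature the prototype expects.
    callback = ENUMPROC(functools.partial(collect_window, result=result))
    for hwnd in FAKE_TITLES:  # EnumWindows() would make these calls itself
        if not callback(hwnd, lParam):
            break
    return result


if __name__ == "__main__":
    print(enumerate_fake(0))
    print(enumerate_fake(1))
```

The ctypes callback object must stay referenced for as long as native code may call it. In the original it is a temporary that lives for the duration of the EnumWindows() call, which is sufficient because EnumWindows() does not return until the enumeration is finished.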

Getting window titles

EnumWindows() only enumerates window handles, so the window titles have to be retrieved separately.
The implementation uses the following function:

python

def getWindowText(hwnd):
  length = user32.GetWindowTextLengthW(hwnd)
  buff = ctypes.create_unicode_buffer(length + 1)
  user32.GetWindowTextW(hwnd, buff, length + 1)
  return buff.value

Here, the Windows API function GetWindowTextW() retrieves the window title.

c++

int GetWindowTextW(
  [in]  HWND   hWnd,
  [out] LPWSTR lpString,
  [in]  int    nMaxCount
);

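lpString is a pointer to the buffer that receives the text.
nMaxCount is the size of that buffer in characters, including the terminating null.

In Python, create the buffer with ctypes.create_unicode_buffer().
To size the buffer, get the title length with GetWindowTextLengthW() and pass that length plus one (for the terminating null) to ctypes.create_unicode_buffer().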

c++

int GetWindowTextLengthW(
  [in] HWND hWnd
);
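GetWindowTextLengthW() returns the length without the terminating null. GetWindowTextW() copies at most nMaxCount - 1 characters and then writes a null. The sketch below imitates that documented behaviour with a hypothetical stand-in, `fake_get_window_text_w`, to show why getWindowText() allocates `length + 1`. `read_title` is also a hypothetical helper, and `len(title)` plays the role of GetWindowTextLengthW().

```python
import ctypes


def fake_get_window_text_w(title, buff, n_max_count):
    """Hypothetical stand-in for GetWindowTextW: copy at most n_max_count - 1 chars, then a null."""
    if n_max_count <= 0:
        return 0  # no room for anything, not even the null
    copied = title[: n_max_count - 1]
    for i, ch in enumerate(copied):
        buff[i] = ch
    buff[len(copied)] = "\0"
    return len(copied)  # number of characters copied, excluding the null


def read_title(title, extra):
    """Allocate len(title) + extra characters, as getWindowText() does with extra=1."""
    length = len(title)  # what GetWindowTextLengthW() would report
    buff = ctypes.create_unicode_buffer(length + extra)
    fake_get_window_text_w(title, buff, length + extra)
    return buff.value  # .value stops at the first null


if __name__ == "__main__":
    print(read_title("Notepad", 1))  # Notepad
    print(read_title("Notepad", 0))  # Notepa (the last character is lost without room for the null)
```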
