PyShark tips and tricks (lessons learned)

Last updated at 2023-03-13, posted at 2023-03-12

Environment

Python 3.10.x (PyShark requires Python 3.x or later)
PyShark
Wireshark (tshark) 4.0.4 (the latest version as of 2023-03-12)
codecs

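Installation

Python, Wireshark: explained in many places already, so omitted here.

PyShark:
pip install pyshark

codecs:
Part of the Python standard library, so no pip install is needed.
-> I used codecs because packet data left as hex values had to be converted to UTF-8.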

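About Wireshark

Wireshark portable also works, but
in that case you have to copy everything inside the App folder of Wireshark portable
into C:\Program Files\Wireshark or C:\Program Files (x86)\Wireshark.
If nothing restricts you from installing software, using the regular installer is easier.
(It also seems possible to point PyShark at the tshark inside Wireshark portable, but I did not do that this time.)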

About PyShark

A library for people who want to use Wireshark (tshark) from Python.
It reads pcap files through tshark so that you can aggregate and inspect the packet data.

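What I used it for

I had a procedure where I pulled data from two kinds of packets
and then manually built a number of filters from that data, so I wrote this as part of automating it.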

The magic incantation

For people who know nothing about programming.
Put this at the top of your script.

import pyshark, codecs

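Opening a pcap file in Python

FileCapture apparently also accepts a parameter for specifying the tshark file path, but I skipped it this time.

Opening a file normally:

cap = pyshark.FileCapture('file name (a relative path also works)')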

Specifying Decode as...:

cap = pyshark.FileCapture('file name (a relative path also works)',
                          decode_as={'tcp.port==xxxx':"protocol name",'udp.port==xxxx':"protocol name"...}
                          )
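As a minimal sketch, the decode_as dictionary can be generated from a port-to-protocol table instead of being typed by hand. The helper name make_decode_as and the port and protocol values are hypothetical examples, not part of PyShark; only the 'tcp.port==N' key format is taken from the call above.

```python
def make_decode_as(tcp_ports=None, udp_ports=None):
    """Build a decode_as dict from {port: protocol_name} mappings."""
    decode_as = {}
    for transport, ports in (("tcp", tcp_ports), ("udp", udp_ports)):
        for port, protocol in (ports or {}).items():
            decode_as[f"{transport}.port=={port}"] = protocol
    return decode_as


# Hypothetical example values: TCP port 8888 as HTTP, UDP port 5353 as DNS.
mapping = make_decode_as(tcp_ports={8888: "http"}, udp_ports={5353: "dns"})
print(mapping)
```

The resulting dictionary can then be passed as decode_as=mapping to pyshark.FileCapture.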

Opening with a filter applied:

cap = pyshark.FileCapture('file name (a relative path also works)',
                          display_filter="(tcp.port == xxxx) and (udp.port == xxxx)" +
                                         " or frame contains xx:xx:xx"
                          )

Multiple parameters can be given, separated by commas.
Specifying Decode as... and opening with a filter applied:

cap = pyshark.FileCapture('file name (a relative path also works)',
                          decode_as={'tcp.port==xxxx':"protocol name",'udp.port==xxxx':"protocol name"...},
                          display_filter="(tcp.port == xxxx) and (udp.port == xxxx)" +
                                         " or frame contains xx:xx:xx"
                          )
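Since the goal was to stop writing filters by hand, here is a rough sketch of composing a display filter string from smaller clauses. The helpers all_of and any_of are hypothetical names of my own, and the ports and byte pattern are made-up example values. Wrapping every clause in parentheses keeps the grouping explicit.

```python
def all_of(*clauses):
    """Join clauses with 'and', wrapping each one in parentheses."""
    return " and ".join(f"({clause})" for clause in clauses)


def any_of(*clauses):
    """Join clauses with 'or', wrapping each one in parentheses."""
    return " or ".join(f"({clause})" for clause in clauses)


# Hypothetical example values; substitute the ports and bytes from your capture.
ports = any_of("tcp.port == 8080", "udp.port == 5060")
display_filter = all_of(ports, "frame contains 01:02:03")
print(display_filter)
```

The string can be passed as display_filter=display_filter when opening the capture.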

When you are done with the capture, release it as shown below.
If the program exits without closing it, it throws an error.

cap.close()
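One way to make sure close() is never forgotten is to put it in a finally block. The sketch below assumes a hypothetical helper, count_highest_layers, that takes an already opened capture, counts packets by their highest_layer, and closes the capture even if iteration fails.

```python
from collections import Counter


def count_highest_layers(capture):
    """Count packets per highest-layer protocol and always close the capture."""
    counts = Counter()
    try:
        for packet in capture:
            counts[packet.highest_layer] += 1
    finally:
        # Runs even if iteration raises, so the capture is always released.
        capture.close()
    return counts


# Usage with a real capture (needs pyshark and tshark; 'sample.pcap' is hypothetical):
#   import pyshark
#   print(count_highest_layers(pyshark.FileCapture('sample.pcap')))
```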

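Errors when opening a file

The file name was not passed as a string
ex: pyshark.FileCapture(file name) (be sure to wrap it in ' or "; this has bitten me many times)
The filter is malformed
ex: ((tcp.port == xxx or udp.port == xxx) or frame contains xx:xx:xx (one ( too many)
The protocol name given in decode_as does not exist in Wireshark
-> In this case, calling cap.set_debug() right after opening the file
   makes PyShark emit a detailed log showing which protocol name does not exist.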
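The stray-parenthesis mistake can be caught before the filter ever reaches tshark. The following is a small sketch with a hypothetical helper, unbalanced_parens. It only counts characters, so parentheses inside quoted strings in a filter would be counted too.

```python
def unbalanced_parens(display_filter):
    """Return opens minus closes: 0 is balanced, positive means extra '('."""
    depth = 0
    for ch in display_filter:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return depth  # a ')' appeared before its matching '('
    return depth


# Hypothetical example values, shaped like the malformed filter above.
bad = "((tcp.port == 80 or udp.port == 53) or frame contains 01:02:03"
print(unbalanced_parens(bad))  # 1
```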

Layers after opening a file

The structure is as follows:

cap[packet, packet, packet...]
packet[eth layer, ip layer, tcp layer, http layer...]
layer[attribute1, attribute2...]

To check which fields are available, use
packet.layers[x].field_names or packet.http.field_names.
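To look at the whole structure of one packet at once, something like the sketch below may help. describe_layers is a hypothetical helper name; it relies only on packet.layers, layer.layer_name and layer.field_names.

```python
def describe_layers(packet):
    """Return (index, layer name, sorted field names) for every layer in a packet."""
    return [
        (index, layer.layer_name, sorted(layer.field_names))
        for index, layer in enumerate(packet.layers)
    ]


# Usage with a real capture (cap is an opened pyshark.FileCapture):
#   for row in describe_layers(cap[0]):
#       print(row)
```

The index in each row is the x to use in packet.layers[x].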

When several layers share the same name

For example, in a case like this:

packet[eth layer, ip layer, tcp layer, http layer, http layer, http layer]

If you know the name of the attribute you want, you can handle it as follows:

packet = cap[0]

# Index of the first layer that has the attribute; stays -1 if none has it
index = -1
for i, layer in enumerate(packet.layers):
    if 'attribute_name' in layer.field_names:
        index = i
        break

# Only read the attribute when a layer that has it was found
if index != -1:
    print(packet.layers[index].attribute_name)
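When the attribute can appear in more than one of the same-named layers, collecting every value may be more useful than stopping at the first match. This is a sketch only; collect_field_values is a hypothetical helper, and it assumes the field can be read with getattr on the layer.

```python
def collect_field_values(packet, field_name):
    """Return the value of field_name from every layer that has it, in layer order."""
    return [
        getattr(layer, field_name)
        for layer in packet.layers
        if field_name in layer.field_names
    ]


# Usage with a real capture (cap is an opened pyshark.FileCapture):
#   print(collect_field_values(cap[0], 'attribute_name'))
```

An empty list means no layer in the packet has that attribute.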

Converting hex data

Like this.

codecs.decode(packet.layers[x].hexdata.replace(":",""), "hex").decode("utf-8")
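The same conversion can be tried without a capture file. The sketch below swaps in bytes.fromhex from the standard library instead of codecs.decode; hex_to_text is a hypothetical helper name and the input string is a made-up example.

```python
def hex_to_text(hex_string, encoding="utf-8"):
    """Decode colon-separated hex such as '68:65:6c:6c:6f' into text."""
    raw = bytes.fromhex(hex_string.replace(":", ""))
    # errors="replace" keeps going when the bytes are not valid text
    return raw.decode(encoding, errors="replace")


print(hex_to_text("68:65:6c:6c:6f"))  # hello
```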

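A few notes on http2, since I also had to handle http2 packets

By its nature, an http2 packet can contain several http2 layers.
Also, picking up the response packet means following the stream.

When there are several http2 layers, extracting the response packet by stream requires the values of these attributes:

packet.tcp.stream
packet.http2.streamid -> packet.layers[x].streamid

The filter looks like this, with each packet.* reference replaced by its value:

tcp.stream eq packet.tcp.stream and http2.streamid eq packet.http2.streamid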
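Filling in that template is a one-liner. In this sketch build_response_filter is a hypothetical helper name, and the stream numbers are made-up example values.

```python
def build_response_filter(tcp_stream, http2_streamid):
    """Fill the filter template with the values read from a request packet."""
    return f"tcp.stream eq {tcp_stream} and http2.streamid eq {http2_streamid}"


# Hypothetical example values; in practice they would come from
# packet.tcp.stream and packet.layers[x].streamid.
print(build_response_filter("3", "1"))
```

The returned string can be used as the display_filter when reopening the same pcap file.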

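Afterword

The rest is just working hard to turn the spec into source code.
Honestly I don't really know Python, so I looked things up as I went.
Leaning on AI is an option too, but stick to asking it for small, general-purpose data-wrangling snippets and use them appropriately.

I finished in about 16 hours of total work, and the round of Jug〇ler I played in high spirits afterward was great fun.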
