Getting Started with Pandas (Part 2)

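When working with security logs, you often want to know how much traffic occurred over time or per IP address.
This article is my own learning record: it walks through aggregating and visualizing that kind of log data with Pandas.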

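Sample data

The sample data imitates a firewall traffic log and contains the following fields:

  • timestamp: time of the communication
  • src_ip / dst_ip: source / destination IP address
  • src_port / dst_port: port numbers
  • protocol: TCP or UDP
  • action: ALLOW or DENY
  • bytes_sent: number of bytes sent

The rest of this article runs its aggregations against this data.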

import pandas as pd

data = [
    {"timestamp": "2025-08-06 00:18:05", "src_ip": "192.168.10.1", "dst_ip": "192.168.20.1", "src_port": 64786, "dst_port": 53, "protocol": "UDP", "action": "ALLOW", "bytes_sent": 4037},
    {"timestamp": "2025-08-06 01:31:05", "src_ip": "192.168.10.1", "dst_ip": "192.168.20.10", "src_port": 65278, "dst_port": 445, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 4923},
    {"timestamp": "2025-08-06 01:35:05", "src_ip": "192.168.10.1", "dst_ip": "192.168.20.1", "src_port": 15789, "dst_port": 389, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 3563},
    {"timestamp": "2025-08-06 03:44:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.2", "src_port": 16274, "dst_port": 80, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 3844},
    {"timestamp": "2025-08-06 04:42:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.2", "src_port": 26528, "dst_port": 443, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 2117},
    {"timestamp": "2025-08-06 05:12:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.1", "src_port": 36890, "dst_port": 389, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 1303},
    {"timestamp": "2025-08-06 06:09:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.2", "src_port": 19615, "dst_port": 80, "protocol": "TCP", "action": "DENY", "bytes_sent": 305},
    {"timestamp": "2025-08-06 07:15:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.1", "src_port": 49622, "dst_port": 53, "protocol": "UDP", "action": "DENY", "bytes_sent": 2046},
    {"timestamp": "2025-08-06 07:18:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.10", "src_port": 37805, "dst_port": 445, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 4504},
    {"timestamp": "2025-08-06 07:25:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.1", "src_port": 7164, "dst_port": 53, "protocol": "UDP", "action": "ALLOW", "bytes_sent": 1945},
    {"timestamp": "2025-08-06 09:53:05", "src_ip": "192.168.10.1", "dst_ip": "192.168.20.1", "src_port": 45372, "dst_port": 389, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 4877},
    {"timestamp": "2025-08-06 10:06:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.2", "src_port": 33116, "dst_port": 443, "protocol": "TCP", "action": "DENY", "bytes_sent": 1683},
    {"timestamp": "2025-08-06 10:25:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.10", "src_port": 35246, "dst_port": 445, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 275},
    {"timestamp": "2025-08-06 11:18:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.1", "src_port": 26318, "dst_port": 389, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 4686},
    {"timestamp": "2025-08-06 14:05:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.1", "src_port": 31357, "dst_port": 53, "protocol": "UDP", "action": "ALLOW", "bytes_sent": 3247},
    {"timestamp": "2025-08-06 18:57:05", "src_ip": "192.168.10.1", "dst_ip": "192.168.20.2", "src_port": 62117, "dst_port": 443, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 4740},
    {"timestamp": "2025-08-06 19:27:05", "src_ip": "192.168.10.3", "dst_ip": "192.168.20.10", "src_port": 26367, "dst_port": 445, "protocol": "TCP", "action": "DENY", "bytes_sent": 1231},
    {"timestamp": "2025-08-06 20:10:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.2", "src_port": 43184, "dst_port": 443, "protocol": "TCP", "action": "ALLOW", "bytes_sent": 1557},
    {"timestamp": "2025-08-06 20:44:05", "src_ip": "192.168.10.1", "dst_ip": "192.168.20.2", "src_port": 61981, "dst_port": 80, "protocol": "TCP", "action": "DENY", "bytes_sent": 1894},
    {"timestamp": "2025-08-06 21:43:05", "src_ip": "192.168.10.2", "dst_ip": "192.168.20.1", "src_port": 39642, "dst_port": 389, "protocol": "TCP", "action": "DENY", "bytes_sent": 93},
]
df = pd.DataFrame(data)
df["timestamp"] = pd.to_datetime(df["timestamp"])
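Real logs usually arrive as files rather than Python literals. As a rough sketch, the same DataFrame shape can be built from CSV text with `pd.read_csv`, parsing the timestamp column on load. The CSV layout here is an assumption for illustration (two rows copied from the sample), and `io.StringIO` stands in for an actual log file.

```python
import io

import pandas as pd

# Hypothetical CSV export of the same log; two rows copied from the sample data.
csv_text = """timestamp,src_ip,dst_ip,src_port,dst_port,protocol,action,bytes_sent
2025-08-06 00:18:05,192.168.10.1,192.168.20.1,64786,53,UDP,ALLOW,4037
2025-08-06 01:31:05,192.168.10.1,192.168.20.10,65278,445,TCP,ALLOW,4923
"""

# parse_dates converts the column to datetime64 while reading,
# so a separate pd.to_datetime call is not needed.
csv_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
print(csv_df.dtypes)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.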

Column total

To see how much data was sent overall, print the sum of the bytes_sent column.

> print(df['bytes_sent'].sum())
52870
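The same column can give more than one number at a time. The sketch below uses a few rows in the shape of the sample (only the columns it needs) to compute several statistics in one `agg` call, plus a total restricted to ALLOW traffic. The variable names are my own, chosen for illustration.

```python
import pandas as pd

# Four rows taken from the sample data, trimmed to the columns used here.
sample = pd.DataFrame({
    "action": ["ALLOW", "ALLOW", "DENY", "DENY"],
    "bytes_sent": [4037, 4923, 305, 2046],
})

# Several summary statistics for one column in a single call.
stats = sample["bytes_sent"].agg(["sum", "mean", "max"])
print(stats)

# Total for permitted traffic only, selected with a boolean mask.
allowed_total = sample.loc[sample["action"] == "ALLOW", "bytes_sent"].sum()
print(allowed_total)
```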

Totals per group

To compare traffic volume between individual hosts, compute the total for each combination of src_ip and dst_ip.

> df.groupby(["src_ip","dst_ip"])["bytes_sent"].sum()
src_ip        dst_ip
192.168.10.1  192.168.20.1     12477
              192.168.20.10     4923
              192.168.20.2      6634
192.168.10.2  192.168.20.1      3341
              192.168.20.2      7823
192.168.10.3  192.168.20.1      9979
              192.168.20.10     6010
              192.168.20.2      1683
Name: bytes_sent, dtype: int64
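A two-level result like this can be easier to scan as a source-by-destination matrix. One way to get there, sketched on a few rows from the sample, is `unstack`, which moves the inner index level (dst_ip) into columns. Pairs that never communicated are filled with 0 here, which is my choice for the illustration.

```python
import pandas as pd

# Four rows from the sample data, trimmed to the columns used here.
sample = pd.DataFrame({
    "src_ip": ["192.168.10.1", "192.168.10.1", "192.168.10.2", "192.168.10.2"],
    "dst_ip": ["192.168.20.1", "192.168.20.2", "192.168.20.1", "192.168.20.1"],
    "bytes_sent": [4037, 4740, 1303, 1945],
})

totals = sample.groupby(["src_ip", "dst_ip"])["bytes_sent"].sum()

# Rows: src_ip, columns: dst_ip. Missing pairs become 0 instead of NaN.
matrix = totals.unstack(fill_value=0)
print(matrix)
```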

To see traffic volume by time of day, print the total bytes_sent for each 2-hour bin.

> df["time_2h"] = df["timestamp"].dt.floor("2h")
> print(df.groupby("time_2h")["bytes_sent"].sum())
time_2h
2025-08-06 00:00:00    12523
2025-08-06 02:00:00     3844
2025-08-06 04:00:00     3420
2025-08-06 06:00:00     8800
2025-08-06 08:00:00     4877
2025-08-06 10:00:00     6644
2025-08-06 14:00:00     3247
2025-08-06 18:00:00     5971
2025-08-06 20:00:00     3544
Name: bytes_sent, dtype: int64
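Note that the output above skips bins with no traffic (12:00, 16:00, and 22:00 do not appear), because `groupby` only produces groups that exist in the data. If the gaps should show up as zeros, `resample` on a datetime index is one option. The sketch below uses four rows from the sample; the variable names are assumptions for illustration.

```python
import pandas as pd

# Four rows from the sample data; nothing falls between 04:00 and 08:00.
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-08-06 00:18:05",
        "2025-08-06 01:31:05",
        "2025-08-06 03:44:05",
        "2025-08-06 09:53:05",
    ]),
    "bytes_sent": [4037, 4923, 3844, 4877],
})

# resample needs a datetime index; empty 2-hour bins are kept and sum to 0.
by_2h = sample.set_index("timestamp")["bytes_sent"].resample("2h").sum()
print(by_2h)
```

The result covers every bin from the first to the last observed one, so bins after the final log entry still do not appear.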

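Counting records

Next, count the number of log entries (one row per logged communication) rather than the data volume. This is also aggregated in 2-hour bins.

> df["time_2h"] = df["timestamp"].dt.floor("2h")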
> print(df.groupby("time_2h").size())
time_2h
2025-08-06 00:00:00    3
2025-08-06 02:00:00    1
2025-08-06 04:00:00    2
2025-08-06 06:00:00    4
2025-08-06 08:00:00    1
2025-08-06 10:00:00    3
2025-08-06 14:00:00    1
2025-08-06 18:00:00    2
2025-08-06 20:00:00    3
dtype: int64
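The per-bin count can also be split by action, which shows at a glance when DENY entries cluster. A minimal sketch with `pd.crosstab`, again on a few rows from the sample:

```python
import pandas as pd

# Four rows from the sample data, trimmed to the columns used here.
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-08-06 06:09:05",
        "2025-08-06 07:15:05",
        "2025-08-06 07:18:05",
        "2025-08-06 09:53:05",
    ]),
    "action": ["DENY", "DENY", "ALLOW", "ALLOW"],
})
sample["time_2h"] = sample["timestamp"].dt.floor("2h")

# Rows: 2-hour bin, columns: action, values: number of log entries.
counts = pd.crosstab(sample["time_2h"], sample["action"])
print(counts)
```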

Ranking

To find the hosts whose communication fails most often, print a ranking of source IPs by the number of entries where action is DENY.

> df[df["action"] == "DENY"].groupby("src_ip").size().sort_values(ascending=False)
src_ip
192.168.10.3    3
192.168.10.2    2
192.168.10.1    1
dtype: int64
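A raw count favors hosts that simply generate more traffic, so a DENY ratio per source can be a useful companion. The sketch below relies on the fact that the mean of a boolean column is the fraction of True values. The rows are made up for illustration and are not the sample data.

```python
import pandas as pd

# Illustrative rows (not the sample data): 4 entries per source IP.
sample = pd.DataFrame({
    "src_ip": ["192.168.10.1"] * 4 + ["192.168.10.3"] * 4,
    "action": ["ALLOW", "ALLOW", "ALLOW", "DENY",
               "DENY", "ALLOW", "DENY", "DENY"],
})

# True where the entry was denied; the group mean is the DENY ratio.
is_deny = sample["action"].eq("DENY")
deny_rate = is_deny.groupby(sample["src_ip"]).mean().sort_values(ascending=False)
print(deny_rate)
```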

Displaying a graph

With matplotlib, the result can be plotted in just a few lines.

import matplotlib.pyplot as plt
import matplotlib

# Font that can render the Japanese labels below (MS Gothic ships with Windows)
matplotlib.rcParams['font.family'] = 'MS Gothic'

df["time_2h"] = df["timestamp"].dt.floor("2h")
sent_bytes_by_2h = df.groupby("time_2h")["bytes_sent"].sum()

sent_bytes_by_2h.plot(
    kind="bar",
    title="2時間ごとの送信データ量",  # "Bytes sent per 2 hours"
    xlabel="時刻",                    # "Time"
    ylabel="送信バイト数",            # "Bytes sent"
    rot=90
)
plt.tight_layout()
plt.show()

[Figure: sent_bytes_by_2h.png, bar chart of bytes sent per 2-hour bin]

Time of the first communication to a given host

For traffic destined for 192.168.20.10, show the time at which each src_ip first accessed it.

> df[df['dst_ip']=='192.168.20.10'].sort_values("timestamp").groupby("src_ip").first().reset_index()
         src_ip           timestamp         dst_ip  src_port  dst_port protocol action  bytes_sent
0  192.168.10.1 2025-08-06 01:31:05  192.168.20.10     65278       445      TCP  ALLOW        4923
1  192.168.10.3 2025-08-06 07:18:05  192.168.20.10     37805       445      TCP  ALLOW        4504
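When only the time is needed, taking the minimum timestamp per group is a shorter route and needs no sorting. It also sidesteps one subtlety: `first()` picks the first non-null value column by column, so with missing values the returned row may mix fields from different log entries. A sketch on a few rows from the sample:

```python
import pandas as pd

# Five rows from the sample data, trimmed to the columns used here.
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-08-06 01:31:05",
        "2025-08-06 07:18:05",
        "2025-08-06 10:25:05",
        "2025-08-06 19:27:05",
        "2025-08-06 00:18:05",
    ]),
    "src_ip": ["192.168.10.1", "192.168.10.3", "192.168.10.3",
               "192.168.10.3", "192.168.10.1"],
    "dst_ip": ["192.168.20.10", "192.168.20.10", "192.168.20.10",
               "192.168.20.10", "192.168.20.1"],
})

# Keep only traffic to the target host, then take the earliest time per source.
target = sample[sample["dst_ip"] == "192.168.20.10"]
first_seen = target.groupby("src_ip")["timestamp"].min()
print(first_seen)
```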

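Summary

This article focused on basic aggregation and visualization. Next time I plan to touch on more advanced analysis, such as anomaly detection and time-series processing.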
