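Background
Today's topic
4. Log aggregation tool
What are we building?
A script that reads a text file resembling a server log and aggregates figures such as the number of accesses per hour.
What you'll learn
- File I/O (pathlib, open)
- String processing and regular expressions (re)
- Aggregation with dictionaries (counters)
What makes it fun
- You can actually feed it the logs from your own tools
- If the format differs, it can evolve plugin-style by adding another log parser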
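The plugin-style idea in the last bullet could look roughly like this. It is a minimal sketch and not part of the script in this post: the names `PARSERS`, `register_parser`, and `parse_any`, the simplified access-log pattern, and the `key=value` second format are all hypothetical, chosen only to illustrate the idea.

```python
import re
from typing import Callable, Optional

# Hypothetical registry: each parser takes one line and returns a dict or None.
Parser = Callable[[str], Optional[dict]]
PARSERS = []


def register_parser(func: Parser) -> Parser:
    """Decorator that appends a parser function to the registry."""
    PARSERS.append(func)
    return func


# Simplified access-log pattern (captures only IP, timestamp, and status).
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] "[^"]*" (?P<status>\d{3}) '
)


@register_parser
def parse_access_log(line: str) -> Optional[dict]:
    m = ACCESS_RE.match(line)
    return m.groupdict() if m else None


@register_parser
def parse_key_value(line: str) -> Optional[dict]:
    """Made-up second format: space-separated key=value pairs."""
    pairs = [tok.split("=", 1) for tok in line.split() if "=" in tok]
    return dict(pairs) if pairs else None


def parse_any(line: str) -> Optional[dict]:
    """Try each registered parser in order and return the first result."""
    for parser in PARSERS:
        rec = parser(line)
        if rec is not None:
            return rec
    return None
```

With this shape, supporting a new format means writing one more function and decorating it; the aggregation code only ever calls `parse_any`.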
Creating a server-log-style file
I had this file generated for me.
04_server.log
203.0.113.5 - - [01/Mar/2025:09:00:12 +0900] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.168.0.10 - - [01/Mar/2025:09:01:03 +0900] "GET /static/style.css HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
198.51.100.23 - - [01/Mar/2025:09:01:25 +0900] "GET /static/app.js HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.8 - - [01/Mar/2025:09:02:47 +0900] "GET /favicon.ico HTTP/1.1" 200 543 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.99 - - [01/Mar/2025:09:03:11 +0900] "GET /login HTTP/1.1" 200 1536 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X)"
10.0.0.5 - - [01/Mar/2025:09:03:58 +0900] "POST /api/login HTTP/1.1" 302 321 "https://example.com/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.5 - - [01/Mar/2025:09:04:03 +0900] "GET /dashboard HTTP/1.1" 200 4096 "https://example.com/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.20 - - [01/Mar/2025:09:05:44 +0900] "GET /api/items?page=1 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
203.0.113.21 - - [01/Mar/2025:09:06:02 +0900] "GET /api/items?page=2 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
198.51.100.23 - - [01/Mar/2025:09:10:15 +0900] "GET /images/logo.png HTTP/1.1" 200 8192 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.10 - - [01/Mar/2025:09:12:01 +0900] "GET /unknown HTTP/1.1" 404 256 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.0.2.10 - - [01/Mar/2025:09:12:18 +0900] "GET /docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.23 - - [01/Mar/2025:09:15:45 +0900] "GET /api/items?page=3 HTTP/1.1" 500 128 "https://example.com/dashboard" "curl/7.81.0"
198.51.100.23 - - [01/Mar/2025:09:16:02 +0900] "GET /api/items?page=3 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
203.0.113.5 - - [01/Mar/2025:09:20:33 +0900] "GET /dashboard HTTP/1.1" 200 4096 "https://example.com/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.5 - - [01/Mar/2025:09:22:47 +0900] "GET /logout HTTP/1.1" 302 321 "https://example.com/dashboard" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.23 - - [01/Mar/2025:10:00:01 +0900] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.50 - - [01/Mar/2025:10:01:11 +0900] "GET /status HTTP/1.1" 200 128 "-" "curl/8.0.1"
192.168.0.10 - - [01/Mar/2025:10:03:21 +0900] "GET /static/style.css HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
192.168.0.10 - - [01/Mar/2025:10:03:22 +0900] "GET /static/app.js HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
203.0.113.8 - - [01/Mar/2025:10:05:40 +0900] "GET /api/items?page=1 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
203.0.113.8 - - [01/Mar/2025:10:05:48 +0900] "GET /api/items?page=2 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
203.0.113.22 - - [01/Mar/2025:10:06:03 +0900] "GET /api/items?page=3 HTTP/1.1" 500 128 "https://example.com/dashboard" "curl/7.81.0"
203.0.113.22 - - [01/Mar/2025:10:06:20 +0900] "GET /api/items?page=3 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
203.0.113.30 - - [01/Mar/2025:10:10:00 +0900] "GET /login HTTP/1.1" 200 1536 "-" "Mozilla/5.0 (Linux; Android 14)"
203.0.113.30 - - [01/Mar/2025:10:10:10 +0900] "POST /api/login HTTP/1.1" 401 256 "https://example.com/login" "Mozilla/5.0 (Linux; Android 14)"
203.0.113.30 - - [01/Mar/2025:10:10:32 +0900] "POST /api/login HTTP/1.1" 302 321 "https://example.com/login" "Mozilla/5.0 (Linux; Android 14)"
203.0.113.30 - - [01/Mar/2025:10:10:40 +0900] "GET /dashboard HTTP/1.1" 200 4096 "https://example.com/login" "Mozilla/5.0 (Linux; Android 14)"
192.0.2.50 - - [01/Mar/2025:10:15:09 +0900] "GET /images/banner.jpg HTTP/1.1" 200 16384 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.0.2.51 - - [01/Mar/2025:10:20:20 +0900] "GET /docs/api HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
203.0.113.10 - - [01/Mar/2025:11:00:01 +0900] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.10 - - [01/Mar/2025:11:00:05 +0900] "GET /static/style.css HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.10 - - [01/Mar/2025:11:00:06 +0900] "GET /static/app.js HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.77 - - [01/Mar/2025:11:05:32 +0900] "GET /status HTTP/1.1" 200 128 "-" "curl/7.68.0"
198.51.100.77 - - [01/Mar/2025:11:05:33 +0900] "GET /metrics HTTP/1.1" 200 4096 "-" "curl/7.68.0"
192.0.2.60 - - [01/Mar/2025:11:10:00 +0900] "GET /login HTTP/1.1" 200 1536 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X)"
192.0.2.60 - - [01/Mar/2025:11:10:09 +0900] "POST /api/login HTTP/1.1" 302 321 "https://example.com/login" "Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X)"
192.0.2.60 - - [01/Mar/2025:11:10:15 +0900] "GET /dashboard HTTP/1.1" 200 4096 "https://example.com/login" "Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X)"
192.0.2.61 - - [01/Mar/2025:11:15:45 +0900] "GET /api/items?page=1 HTTP/1.1" 200 2048 "https://example.com/dashboard" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.61 - - [01/Mar/2025:11:15:55 +0900] "GET /api/items?page=2 HTTP/1.1" 200 2048 "https://example.com/dashboard" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.61 - - [01/Mar/2025:11:16:05 +0900] "GET /api/items?page=99 HTTP/1.1" 404 256 "https://example.com/dashboard" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.23 - - [01/Mar/2025:11:20:00 +0900] "GET /docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.200 - - [01/Mar/2025:12:00:01 +0900] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
203.0.113.200 - - [01/Mar/2025:12:00:04 +0900] "GET /static/style.css HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
203.0.113.200 - - [01/Mar/2025:12:00:05 +0900] "GET /static/app.js HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
198.51.100.5 - - [01/Mar/2025:12:02:30 +0900] "GET /login HTTP/1.1" 200 1536 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.5 - - [01/Mar/2025:12:02:41 +0900] "POST /api/login HTTP/1.1" 500 128 "https://example.com/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.5 - - [01/Mar/2025:12:03:01 +0900] "POST /api/login HTTP/1.1" 302 321 "https://example.com/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.5 - - [01/Mar/2025:12:03:10 +0900] "GET /dashboard HTTP/1.1" 200 4096 "https://example.com/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.0.2.70 - - [01/Mar/2025:12:05:55 +0900] "GET /api/items?page=1 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
192.0.2.70 - - [01/Mar/2025:12:06:05 +0900] "GET /api/items?page=2 HTTP/1.1" 200 2048 "https://example.com/dashboard" "curl/7.81.0"
192.0.2.71 - - [01/Mar/2025:12:10:10 +0900] "GET /docs/api HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.0.2.72 - - [01/Mar/2025:12:20:20 +0900] "GET /healthz HTTP/1.1" 200 64 "-" "kube-probe/1.30"
203.0.113.5 - - [02/Mar/2025:09:00:09 +0900] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.5 - - [02/Mar/2025:09:00:12 +0900] "GET /static/style.css HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.5 - - [02/Mar/2025:09:00:13 +0900] "GET /static/app.js HTTP/1.1" 200 2048 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
198.51.100.23 - - [02/Mar/2025:09:02:30 +0900] "GET /login HTTP/1.1" 200 1536 "-" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.23 - - [02/Mar/2025:09:02:40 +0900] "POST /api/login HTTP/1.1" 302 321 "https://example.com/login" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.23 - - [02/Mar/2025:09:02:48 +0900] "GET /dashboard HTTP/1.1" 200 4096 "https://example.com/login" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.10 - - [02/Mar/2025:09:05:05 +0900] "GET /unknown HTTP/1.1" 404 256 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
192.0.2.10 - - [02/Mar/2025:09:05:15 +0900] "GET /docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
203.0.113.8 - - [02/Mar/2025:10:00:01 +0900] "GET /status HTTP/1.1" 200 128 "-" "curl/7.81.0"
203.0.113.8 - - [02/Mar/2025:10:00:02 +0900] "GET /metrics HTTP/1.1" 200 4096 "-" "curl/7.81.0"
192.0.2.90 - - [02/Mar/2025:10:05:30 +0900] "GET /healthz HTTP/1.1" 200 64 "-" "kube-probe/1.30"
192.0.2.91 - - [02/Mar/2025:10:10:10 +0900] "GET /docs/api HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
Answer
Code
04_log_collect.py
import sys
import re
from pathlib import Path
from datetime import datetime
from collections import defaultdict, Counter

# Regular expression that parses one line of an Apache/Nginx-style access.log
LOG_LINE_RE = re.compile(
    r'^'
    r'(?P<ip>\S+)\s+'                # IP address
    r'\S+\s+\S+\s+'                  # - - (unused, so skipped together)
    r'\[(?P<datetime>[^\]]+)\]\s+'   # [01/Mar/2025:09:00:12 +0900]
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<protocol>[^"]+)"\s+'  # "GET /path HTTP/1.1"
    r'(?P<status>\d{3})\s+'          # status code
    r'(?P<size>\d+)\s+'              # response size
    r'"(?P<referer>[^"]*)"\s+'       # "Referer"
    r'"(?P<user_agent>[^"]*)"'       # "User-Agent"
    r'$'
)


def parse_log_line(line: str):
    """Parse one log line and return a dict, or None if it does not match."""
    m = LOG_LINE_RE.match(line)
    if not m:
        return None
    d = m.groupdict()
    # Timestamp string -> datetime object
    # Example: 01/Mar/2025:09:00:12 +0900
    dt_str = d["datetime"]
    dt = datetime.strptime(dt_str, "%d/%b/%Y:%H:%M:%S %z")
    d["dt"] = dt
    # Convert numeric fields to int
    d["status"] = int(d["status"])
    d["size"] = int(d["size"])
    # Drop the query part (?page=1 etc.) to get the path only
    path = d["path"]
    if "?" in path:
        path = path.split("?", 1)[0]
    d["path_no_query"] = path
    return d


def analyze_log(path: Path):
    """Main function: read the log file and aggregate it."""
    if not path.exists():
        print(f"[ERROR] File not found: {path}")
        return
    # Total number of requests per hour
    requests_per_hour = Counter()
    # Count per status code, per hour
    status_per_hour = defaultdict(Counter)
    # Number of requests per URL (query string excluded)
    path_counter = Counter()
    # Number of lines that failed to parse (for debugging)
    parse_error_count = 0
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            rec = parse_log_line(line)
            if rec is None:
                parse_error_count += 1
                continue
            dt = rec["dt"]
            # Per-hour key, e.g. 2025-03-01 09:00
            hour_key = dt.strftime("%Y-%m-%d %H:00")
            requests_per_hour[hour_key] += 1
            status_per_hour[hour_key][rec["status"]] += 1
            path_counter[rec["path_no_query"]] += 1
    # Show the results
    print("=== 1時間ごとのリクエスト数 ===")  # heading: "Requests per hour"
    for hour in sorted(requests_per_hour.keys()):
        total = requests_per_hour[hour]
        print(f"{hour} total={total:3d}", end="")
        # Also show the per-status-code breakdown
        for status, cnt in sorted(status_per_hour[hour].items()):
            print(f" {status}={cnt}", end="")
        print()
    print()
    # Heading: "Most accessed paths, TOP 5 (query excluded)"
    print("=== よくアクセスされているパス TOP 5 (クエリ除く) ===")
    for path, cnt in path_counter.most_common(5):
        print(f"{cnt:3d} {path}")
    if parse_error_count:
        print()
        # Message: "Number of lines that failed to parse"
        print(f"[INFO] パースに失敗した行数: {parse_error_count}")


def main():
    if len(sys.argv) < 2:
        print("Usage: python 04_log_collect.py 04_server.log")
        sys.exit(1)
    log_path = Path(sys.argv[1])
    analyze_log(log_path)


if __name__ == "__main__":
    main()
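To look at the hourly bucketing on its own, the following sketch applies the same `strptime` and `strftime` calls to timestamps from the sample log. The helper name `hour_key` is made up for this illustration, and `%b` depends on the current locale, so this assumes the default English month abbreviations.

```python
from datetime import datetime


def hour_key(timestamp: str) -> str:
    """Convert an access-log timestamp into a per-hour bucket key."""
    # %b matches the month abbreviation ("Mar"), %z the UTC offset ("+0900").
    dt = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z")
    # Fixing the minutes to "00" makes every request in the same hour share one key.
    return dt.strftime("%Y-%m-%d %H:00")


print(hour_key("01/Mar/2025:09:00:12 +0900"))  # 2025-03-01 09:00
print(hour_key("01/Mar/2025:09:22:47 +0900"))  # 2025-03-01 09:00
```

The key uses the wall-clock time recorded in the log (here with a +0900 offset); nothing is converted to UTC.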
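Example run
(The script prints its headings in Japanese: "1時間ごとのリクエスト数" means "requests per hour", and "よくアクセスされているパス TOP 5 (クエリ除く)" means "most accessed paths, TOP 5 (query excluded)".)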
$ python 04_log_collect.py 04_server.log
=== 1時間ごとのリクエスト数 ===
2025-03-01 09:00 total= 16 200=12 302=2 404=1 500=1
2025-03-01 10:00 total= 14 200=11 302=1 401=1 500=1
2025-03-01 11:00 total= 12 200=10 302=1 404=1
2025-03-01 12:00 total= 11 200=9 302=1 500=1
2025-03-02 09:00 total=  8 200=6 302=1 404=1
2025-03-02 10:00 total=  4 200=4

=== よくアクセスされているパス TOP 5 (クエリ除く) ===
 13 /api/items
  7 /api/login
  6 /dashboard
  5 /
  5 /static/style.css
$ python 04_log_collect.py 04_server.lo
[ERROR] File not found: 04_server.lo
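One detail in the TOP 5 above: in the sample log, `/`, `/static/style.css`, `/static/app.js`, and `/login` are each requested 5 times, yet only the first two are listed. `Counter.most_common` orders elements with equal counts by the order in which they were first encountered, so which tied paths make the cut depends on log order. A small sketch with made-up paths, including one possible way to make ties deterministic:

```python
from collections import Counter

# Made-up request paths; "/c", "/a", and "/b" each occur twice.
paths = ["/c", "/a", "/b", "/a", "/c", "/b"]
counter = Counter(paths)

# Ties keep first-seen order, so "/b" is cut from the top 2.
print(counter.most_common(2))  # [('/c', 2), ('/a', 2)]

# Explicit sort: count descending, then path ascending,
# so the result no longer depends on the order of the log lines.
ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked[:2])  # [('/a', 2), ('/b', 2)]
```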
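This time I had the code written by the forbidden machine as well...
What happened to the teachings, the teachings?!
Thoughts
- It was my first time naming the groups in a regular expression and getting the matched values back as a dict, so I learned something
- I also found out for the first time that Counter exists in collections, which was a nice surprise
- Having code generated surfaces things I didn't know and leads to learning, which I think is good (on the other hand, it doesn't build the ability to write from scratch, and can I really take responsibility for the quality of the code?)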
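The two things noted above, named groups and `Counter`, can be tried together in a few lines. This is only an illustrative sketch: the shortened pattern and the three request fragments (cut down from lines in the sample log) are assumptions made for the example, not the script's actual regular expression.

```python
import re
from collections import Counter

# (?P<name>...) names a group; groupdict() returns {name: matched text}.
pattern = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<status>\d{3})')

# Request fragments shortened from the sample log.
lines = [
    '"GET / HTTP/1.1" 200',
    '"GET /unknown HTTP/1.1" 404',
    '"GET /docs HTTP/1.1" 200',
]

status_counts = Counter()
for line in lines:
    m = pattern.search(line)
    if m:
        fields = m.groupdict()  # e.g. {'method': 'GET', 'path': '/', 'status': '200'}
        status_counts[fields["status"]] += 1  # a missing key counts as 0

print(status_counts)  # Counter({'200': 2, '404': 1})
```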
I've become a little wiser once again (or so I imagine)