Visualizing logs with JupyterLab

Posted at 2019-12-15

Goals

  • Plot response times from the WebSphere Liberty access log
  • Read the access log even when it is split across multiple files
  • Compute percentiles of the response time
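
As a small illustration of the percentile goal, the sketch below computes several percentiles in one call with pandas. The values are made up for the example, and it assumes the elapsed time is logged in microseconds, which is what the division by 1000 in the notebook suggests.

```python
import pandas as pd

# Hypothetical elapsed times in microseconds (illustrative values, not real log data)
elapsed_us = pd.Series(
    [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 100000]
)

# Convert to milliseconds, then request the 50th, 90th and 99th percentiles together
elapsed_ms = elapsed_us / 1000
percentiles = elapsed_ms.quantile([0.5, 0.9, 0.99])
print(percentiles)
```

Passing a list to `quantile` returns a Series indexed by the requested fractions, so the median and the tail percentiles can be compared side by side. A single slow request (the last value here) barely moves the median but dominates the 99th percentile.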

Result built in JupyterLab

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import glob
%matplotlib inline

# Read the multiple log files found in the specified directory
path = r'.'
all_files = glob.glob(path + "/access*.log")
li = []

# Build a DataFrame from each file
for filename in all_files:
    df = pd.read_csv(filename, names=['start_time','remote_host','method','url','user_agent','status_code','response_byte','elapsed_time'])
    li.append(df)

# Concatenate the DataFrames
df = pd.concat(li, axis=0, ignore_index=True)

# Strip the time zone part and convert the string to a Timestamp
def to_timestamp(text):
    # The time zone part (+0900) could not be parsed with %z, so drop it for now
    text = text.replace(' +0900', '')
    return pd.Timestamp(datetime.strptime(text, '[%d/%b/%Y:%H:%M:%S]'))

# Convert the column to Timestamp
df['start_time']=df['start_time'].map(to_timestamp)


# Summary statistics for the response time, plus the 90th percentile
print(df['elapsed_time'].describe())
print(df['elapsed_time'].quantile(0.9))

# Plot
df = df.set_index('start_time')
plt.scatter(df.index, df['elapsed_time']/1000)

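# The x-axis range should be computed automatically; set it explicitly if that does not work
# Get the maximum and minimum (start_time is the index at this point, so read them from df.index)
start_time_max = df.index.max()
start_time_min = df.index.min()
# plt.xlim(start_time_min, start_time_max)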

plt.xticks(rotation=70)
plt.xlabel('start time')
plt.ylabel('elapsed time[ms]')
plt.show()
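
The notebook above drops the `+0900` offset before parsing. As a possible alternative, the sketch below keeps the offset by including `%z` in the format string. It assumes the timestamp field looks like `[15/Dec/2019:10:00:00 +0900]`, a Python 3 runtime, and a locale with English month abbreviations. The name `to_aware_datetime` and the sample value are hypothetical, and this may not behave the same in the environment where `%z` originally failed.

```python
from datetime import datetime

# Assumed timestamp layout, including the brackets and the UTC offset
LOG_TIME_FORMAT = '[%d/%b/%Y:%H:%M:%S %z]'

def to_aware_datetime(text):
    """Parse an access-log timestamp and keep its UTC offset."""
    return datetime.strptime(text, LOG_TIME_FORMAT)

# Hypothetical sample value, not taken from a real log
sample = to_aware_datetime('[15/Dec/2019:10:00:00 +0900]')
print(sample.isoformat())
print(sample.utcoffset())
```

A function like this could be passed to `df['start_time'].map(...)` in place of `to_timestamp`. Keeping the offset matters mainly when logs from servers in different time zones are combined.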