python(25) Runtime Error(118) 言語処理100本ノック (NLP 100 Exercises): 79, unresolved

Posted at 2019-01-23

言語処理100本ノック 2015 (NLP 100 Exercises 2015)

79. Plotting a precision-recall graph

http://www.cl.ecei.tohoku.ac.jp/nlp100/
"By varying the classification threshold of the logistic regression model, plot a precision-recall graph."
素人の言語処理100本ノック:79 (An Amateur's NLP 100 Exercises: 79)
https://qiita.com/segavvy/items/8f93187ec89f4831d863

# ./p79.py
Traceback (most recent call last):
  File "./p79.py", line 105, in <module>
    plt.plot(thresholds, accuracys, color='green', linestyle='--', label='正解率')
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 3352, in plot
    ax = gca()
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 969, in gca
    return gcf().gca(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 586, in gcf
    return figure()
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py", line 533, in figure
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 161, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 167, in new_figure_manager_given_figure
    canvas = cls.FigureCanvas(figure)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5agg.py", line 24, in __init__
    super(FigureCanvasQTAgg, self).__init__(figure=figure)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5.py", line 234, in __init__
    _create_qApp()
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_qt5.py", line 125, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable
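
The last frames of the traceback are in backend_qt5.py, which suggests that matplotlib selected the Qt5 GUI backend even though no X display is reachable inside the container. The following is a minimal sketch, not part of the original script, of choosing a non-interactive backend only when `DISPLAY` looks unusable. The helper name `choose_backend` and the regular expression are assumptions made for illustration; `MPLBACKEND` is the environment variable matplotlib consults when it is imported.

```python
import os
import re


def choose_backend(environ):
    """Return 'Agg' when no usable X display is advertised, else None."""
    display = environ.get('DISPLAY')
    # Assumption: a usable value looks like ':0' or 'host:10.0'.
    if display is None or not re.search(r':\d', display):
        return 'Agg'
    return None


backend = choose_backend(os.environ)
if backend is not None:
    # This must run before `import matplotlib.pyplot`; an MPLBACKEND value
    # that is already set is left untouched.
    os.environ.setdefault('MPLBACKEND', backend)
print('DISPLAY =', os.environ.get('DISPLAY'),
      '/ MPLBACKEND =', os.environ.get('MPLBACKEND'))
```

With the Agg backend, `plt.show()` does not open a window, so the figure has to be written to a file with `savefig()` instead.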

The source is below (I added the first line so that it can be run as a command).

#!/usr/bin/env python
# coding: utf-8

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

fname_result = 'result.txt'
fname_work = 'work.txt'


def score(fname):
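    '''Compute scores from a result file
    Read the result file and return accuracy, precision, recall and F1 score

    Returns:
    accuracy, precision, recall, F1 score
    '''
    # Read the results and tally them
    TP = 0      # True-Positive     predicted +1, actual +1
    FP = 0      # False-Positive    predicted +1, actual -1
    FN = 0      # False-Negative    predicted -1, actual +1
    TN = 0      # True-Negative     predicted -1, actual -1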

    with open(fname) as data_file:
        for line in data_file:
            cols = line.split('\t')

            if len(cols) < 3:
                continue

            if cols[0] == '+1':         # actual label
                if cols[1] == '+1':     # prediction
                    TP += 1
                else:
                    FN += 1
            else:
                if cols[1] == '+1':
                    FP += 1
                else:
                    TN += 1

    # Compute
    accuracy = (TP + TN) / (TP + FP + FN + TN)      # accuracy
    precision = TP / (TP + FP)      # precision
    recall = TP / (TP + FN)     # recall
    f1 = (2 * recall * precision) / (recall + precision)    # F1 score

    return accuracy, precision, recall, f1


# Read the results and restore the predicted probability to its original value (the value of the hypothesis function hypothesis())
results = []
with open(fname_result) as data_file:
    for line in data_file:

        cols = line.split('\t')
        if len(cols) < 3:
            continue

        # Actual label
        label = cols[0]

        # Value of the classifier function predict()
        if cols[1] == '-1':
            predict = 1.0 - float(cols[2])      # restore the probability
        else:
            predict = float(cols[2])

        results.append((label, predict))

# Compute scores while varying the threshold and store them in lists for plotting
thresholds = []
accuracys = []
precisions = []
recalls = []
f1s = []
for threshold in np.arange(0.02, 1.0, 0.02):

    # Save the results to a temporary file so that score() can be used
    with open(fname_work, 'w') as file_out:
        for label, predict in results:
            if predict > threshold:
                file_out.write('{}\t{}\t{}\n'.format(label, '+1', predict))
            else:
                file_out.write('{}\t{}\t{}\n'.format(label, '-1', 1 - predict))

    # Compute scores
    accuracy, precision, recall, f1 = score(fname_work)

    # Append the results
    thresholds.append(threshold)
    accuracys.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1s.append(f1)


# Font used in the graph (Japanese cannot be displayed with the default font)
fp = FontProperties(
    fname='/Library/Fonts/Times New Roman Bold Italic.ttf'
)

# Set the values of the line plots
plt.plot(thresholds, accuracys, color='green', linestyle='--', label='正解率')
plt.plot(thresholds, precisions, color='red', linewidth=3, label='適合率')
plt.plot(thresholds, recalls, color='blue', linewidth=3, label='再現率')
plt.plot(thresholds, f1s, color='magenta', linestyle='--', label='F1スコア')

# Adjust the range of the axes
plt.xlim(
    xmin=0, xmax=1.0
)
plt.ylim(
    ymin=0, ymax=1.0
)

# Set the graph title and axis labels
plt.title(
    '79. 適合率-再現率グラフの描画',    # title
    fontproperties=fp   # font to use
)
plt.xlabel(
    'ロジスティック回帰モデルの分類の閾値',       # x-axis label
    fontproperties=fp   # font to use
)
plt.ylabel(
    '精度',         # y-axis label
    fontproperties=fp   # font to use
)

# Show the grid
plt.grid(axis='both')

# Show the legend
plt.legend(loc='lower left', prop=fp)

# Display
plt.show()
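
The script above writes a temporary file for every threshold so that `score()` can be reused, and the divisions in `score()` would raise ZeroDivisionError if a threshold left no positive predictions. As a hedged alternative, here is a minimal in-memory sketch. The function name `score_at` and the choice of returning 0.0 for an undefined ratio are assumptions for illustration, not the original author's code.

```python
def score_at(results, threshold):
    """Return (accuracy, precision, recall, f1) for (label, probability) pairs.

    A pair is predicted '+1' when its probability exceeds `threshold`.
    """
    tp = fp = fn = tn = 0
    for label, prob in results:
        predicted_positive = prob > threshold
        if label == '+1':
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Assumption: report 0.0 instead of raising when the ratio is undefined.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * recall * precision, recall + precision)
    return accuracy, precision, recall, f1


# Tiny made-up sample, only to show the call shape.
sample = [('+1', 0.9), ('+1', 0.4), ('-1', 0.6), ('-1', 0.1)]
print(score_at(sample, 0.5))
print(score_at(sample, 0.95))
```

The tuples have the same order as the return value of `score()`, so the loop over thresholds could append them to the same lists without touching `work.txt`.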

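I added lines at the beginning, in the middle, and at the end: mpl.use('Agg') before importing pyplot, fig = plt.figure() before the threshold loop, and fig.savefig('p79.png') in place of plt.show().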

#!/usr/bin/env python
# coding: utf-8

import numpy as np
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

fname_result = 'result.txt'
fname_work = 'work.txt'


def score(fname):
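    '''Compute scores from a result file
    Read the result file and return accuracy, precision, recall and F1 score

    Returns:
    accuracy, precision, recall, F1 score
    '''
    # Read the results and tally them
    TP = 0      # True-Positive     predicted +1, actual +1
    FP = 0      # False-Positive    predicted +1, actual -1
    FN = 0      # False-Negative    predicted -1, actual +1
    TN = 0      # True-Negative     predicted -1, actual -1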

    with open(fname) as data_file:
        for line in data_file:
            cols = line.split('\t')

            if len(cols) < 3:
                continue

            if cols[0] == '+1':         # actual label
                if cols[1] == '+1':     # prediction
                    TP += 1
                else:
                    FN += 1
            else:
                if cols[1] == '+1':
                    FP += 1
                else:
                    TN += 1

    # Compute
    accuracy = (TP + TN) / (TP + FP + FN + TN)      # accuracy
    precision = TP / (TP + FP)      # precision
    recall = TP / (TP + FN)     # recall
    f1 = (2 * recall * precision) / (recall + precision)    # F1 score

    return accuracy, precision, recall, f1


# Read the results and restore the predicted probability to its original value (the value of the hypothesis function hypothesis())
results = []
with open(fname_result) as data_file:
    for line in data_file:

        cols = line.split('\t')
        if len(cols) < 3:
            continue

        # Actual label
        label = cols[0]

        # Value of the classifier function predict()
        if cols[1] == '-1':
            predict = 1.0 - float(cols[2])      # restore the probability
        else:
            predict = float(cols[2])

        results.append((label, predict))

# Compute scores while varying the threshold and store them in lists for plotting
fig = plt.figure()

thresholds = []
accuracys = []
precisions = []
recalls = []
f1s = []
for threshold in np.arange(0.02, 1.0, 0.02):

    # Save the results to a temporary file so that score() can be used
    with open(fname_work, 'w') as file_out:
        for label, predict in results:
            if predict > threshold:
                file_out.write('{}\t{}\t{}\n'.format(label, '+1', predict))
            else:
                file_out.write('{}\t{}\t{}\n'.format(label, '-1', 1 - predict))

    # Compute scores
    accuracy, precision, recall, f1 = score(fname_work)

    # Append the results
    thresholds.append(threshold)
    accuracys.append(accuracy)
    precisions.append(precision)
    recalls.append(recall)
    f1s.append(f1)


# Font used in the graph (Japanese cannot be displayed with the default font)
fp = FontProperties(
    fname='/Library/Fonts/Times New Roman Bold Italic.ttf'
)

# Set the values of the line plots
plt.plot(thresholds, accuracys, color='green', linestyle='--', label='正解率')
plt.plot(thresholds, precisions, color='red', linewidth=3, label='適合率')
plt.plot(thresholds, recalls, color='blue', linewidth=3, label='再現率')
plt.plot(thresholds, f1s, color='magenta', linestyle='--', label='F1スコア')

# Adjust the range of the axes
plt.xlim(
    xmin=0, xmax=1.0
)
plt.ylim(
    ymin=0, ymax=1.0
)

# Set the graph title and axis labels
plt.title(
    '79. 適合率-再現率グラフの描画',    # title
    fontproperties=fp   # font to use
)
plt.xlabel(
    'ロジスティック回帰モデルの分類の閾値',       # x-axis label
    fontproperties=fp   # font to use
)
plt.ylabel(
    '精度',         # y-axis label
    fontproperties=fp   # font to use
)

# Show the grid
plt.grid(axis='both')

# Show the legend
plt.legend(loc='lower left', prop=fp)

# Display
#plt.show()
fig.savefig('p79.png')

# ./p79.py
Traceback (most recent call last):
  File "./p79.py", line 144, in <module>
    fig.savefig('p79.png')
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py", line 2062, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 2263, in print_figure
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 517, in print_png
    FigureCanvasAgg.draw(self)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 437, in draw
    self.figure.draw(self.renderer)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py", line 1493, in draw
    renderer, self, artists, self.suppressComposite)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/image.py", line 141, in _draw_list_compositing_images
    a.draw(renderer)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 2635, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/image.py", line 141, in _draw_list_compositing_images
    a.draw(renderer)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/axis.py", line 1204, in draw
    self.label.draw(renderer)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/text.py", line 706, in draw
    bbox, info, descent = textobj._get_layout(renderer)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/text.py", line 300, in _get_layout
    ismath=False)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 245, in get_text_width_height_descent
    font = self._get_agg_font(prop)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py", line 280, in _get_agg_font
    font = get_font(fname)
  File "/opt/conda/lib/python3.7/site-packages/matplotlib/font_manager.py", line 1389, in get_font
    return _get_font(filename, hinting_factor)
FileNotFoundError: [Errno 2] No such file or directory: '/Library/Fonts/Times New Roman Bold Italic.ttf'

When I comment out the font definition:

# ./p79.py
Traceback (most recent call last):
  File "./p79.py", line 125, in <module>
    fontproperties=fp   # 使うフォント情報
NameError: name 'fp' is not defined

Perhaps I need to specify a font that exists on the Docker side.
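
The FileNotFoundError above comes from a macOS font path that does not exist inside the Docker container, and the NameError appears because `fp` is still referenced after its definition has been commented out. Below is a minimal sketch of one way around both. The candidate list is hypothetical: the first entry is the path from the script, and the second is only an assumed location that may not exist in a given image.

```python
import os

# Hypothetical candidates: the first is the path from the script, the second
# is an assumed location for a Linux image; adjust to the fonts installed.
CANDIDATE_FONTS = [
    '/Library/Fonts/Times New Roman Bold Italic.ttf',
    '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
]


def first_existing_font(candidates):
    """Return the first candidate path that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def font_kwargs(font_path):
    """Build keyword arguments for FontProperties, or {} to keep the default font."""
    return {'fname': font_path} if font_path else {}


font_path = first_existing_font(CANDIDATE_FONTS)
print(font_path, font_kwargs(font_path))
```

In the script, `fp = FontProperties(**font_kwargs(font_path))` would then always be defined. With an empty dict it falls back to matplotlib's default font, which may still be unable to display the Japanese labels.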


This article reflects my personal impressions based on my own past experience. It has nothing to do with the organization or business to which I currently belong.

Document history

ver. 0.01 First draft 20240609

Thank you very much for reading to the end.

Please press the like icon 💚 and follow me.
