
[Python] Counting how many times each letter from 'a' to 'z' appears in a string

Last updated at 2021-12-11. Posted at 2021-12-11.

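This comes up fairly often in coding challenges such as those on Paiza.
The key requirement is that letters that never appear are still reported, with a count of 0.
A for loop handles this easily, but it can also be written concisely by using a pandas.Series in place of a dictionary, together with collections.Counter from the standard library.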
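For comparison, the loop-based approach mentioned above might look like the following minimal sketch. It uses only the standard library; the names `text` and `atoz_counts` and the sample string are chosen here for illustration.

```python
from collections import Counter
import string

text = "alice was beginning to get very tired"

# Counter returns 0 for letters it has never seen, so every
# letter from 'a' to 'z' gets an entry even when it is absent.
counts = Counter(text)
atoz_counts = {letter: counts[letter] for letter in string.ascii_lowercase}

print(atoz_counts["a"])  # 2
print(atoz_counts["z"])  # 0
```

Because the loop runs over the 26 letters rather than over the text, characters such as spaces are simply never looked up.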

Conclusion

Here is the code.

labels = list(string.ascii_lowercase)  # list of characters to count
atoz_dict = pd.Series(  # initialize the Series with zeros
    np.repeat(0, len(labels)),
    index=labels
)
counts = Counter(text)  # maps each unique element of the string/list to its count
atoz_dict[counts] = counts  # this assignment works with a pd.Series, but not with a dict

print(atoz_dict)
# a     5
# b     3
# c     1
# 　︙
# x     0
# y     2
# z     0

A few additional notes

What to import

numpy, pandas, collections.Counter, and string are required.

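The variable `labels`: the list of characters to count

These correspond to the keys of a dictionary, or to the index of a pandas.Series.

If the string being examined contains a character that is not in this list, an error is raised.

labels = list(string.ascii_lowercase)  # list of characters to count
# ['a', 'b', 'c', ..., 'x', 'y', 'z']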
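One way to avoid that error is to drop every character that is not a label before counting. The sketch below is an illustration only: the names `allowed` and `cleaned` are introduced here, and lowercasing the input first is an assumption, since the article's sample text is already lowercase.

```python
from collections import Counter
import string

text = "Alice was beginning, to get very tired!"

allowed = set(string.ascii_lowercase)
# Lowercase first, then keep only characters that are valid labels.
cleaned = [ch for ch in text.lower() if ch in allowed]
counts = Counter(cleaned)

print(counts["a"])                    # 2
print(sorted(set(counts) - allowed))  # []
```

After this filtering step, every key in `counts` is guaranteed to be one of the 26 labels.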

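The `Counter` class

Initializing it with a string or a list yields a dictionary-like object that maps each unique element to its number of occurrences.

Unlike a regular dictionary, however, looking up a key that is not present returns 0.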

counts = Counter("abcab")
counts['a']  # 2
counts['z']  # 0
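The difference from a plain dictionary can be seen side by side in the short sketch below; the variable `plain` is introduced here purely for comparison.

```python
from collections import Counter

counts = Counter("abcab")
plain = dict(counts)  # the same data as an ordinary dict

print(counts["z"])        # 0
print("z" in counts)      # False: the lookup did not add a key
print(plain.get("z", 0))  # 0, the plain-dict equivalent

try:
    plain["z"]
except KeyError:
    print("a plain dict raises KeyError for a missing key")
```

Note that reading a missing key from a `Counter` does not insert it, so the set of keys still reflects only the elements that actually occurred.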

Assigning to a `pandas.Series`

It can be written in a single line, as follows.

atoz_dict[counts] = counts

This form works because the target is a pandas.Series; a dictionary does not accept this kind of assignment.

If you find a Series awkward to work with, convert it to a dictionary at the end.

atoz_dict = dict(atoz_dict)
print(atoz_dict)
# {'a': 5, 'b': 3, ...,'x': 0, 'y': 2, 'z': 0}
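As an alternative sketch, not the method used in this article, the same result can be reached with `Series.reindex` and its `fill_value` argument, which avoids indexing a Series with a `Counter`. The names `atoz` and `atoz_plain` are introduced here for illustration. One behavioral difference to be aware of: characters outside `labels` (such as the space) are silently dropped instead of causing an error.

```python
from collections import Counter
import string

import pandas as pd

text = "alice was beginning to get very tired of sitting by her sister on the bank and of having nothing to do"
labels = list(string.ascii_lowercase)

# Build a Series from the Counter, then align it to the 26 labels.
# Labels missing from the Counter are filled with 0.
atoz = pd.Series(Counter(text), dtype="int64").reindex(labels, fill_value=0)

# Series.to_dict() is another way to get a dictionary back.
atoz_plain = atoz.to_dict()

print(atoz_plain["a"])  # 5
print(atoz_plain["z"])  # 0
```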

Full code

import numpy as np
import pandas as pd
from collections import Counter
import string

text = "alice was beginning to get very tired of sitting by her sister on the bank and of having nothing to do"
text = text.replace(' ', '')  # remove spaces

labels = list(string.ascii_lowercase)  # list of characters to count
# ['a', 'b', 'c', ..., 'x', 'y', 'z']

atoz_dict = pd.Series(  # initialize the Series with zeros
    np.repeat(0, len(labels)),  # np.zeros() defaults to a float dtype, so np.repeat() is used
    index=labels  # plays the role of the keys in a dictionary
)
counts = Counter(text)  # maps each unique element of the string/list to its count
atoz_dict[counts] = counts

print(atoz_dict)
# a     5
# b     3
# c     1
# 　︙
# x     0
# y     2
# z     0

# convert to a dictionary if a pandas.Series is awkward to work with
atoz_dict = dict(atoz_dict)
print(atoz_dict)
# {'a': 5, 'b': 3, ...,'x': 0, 'y': 2, 'z': 0}
