
Python [List/Dict] - Slicing, Striding, Catch-All Unpacking, and Sorting

Posted at 2021-02-10

Slicing sequences

Python has a syntax for slicing sequences. Slicing gives access to a subset of a sequence's items with minimal effort. The built-in types list, str, and bytes can all be sliced easily.

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print('Middle two: ', a[3:5])
print('All but ends: ', a[1:7])
>>>
Middle two:  ['d', 'e']
All but ends:  ['b', 'c', 'd', 'e', 'f', 'g']

When slicing to the end of a list, the end index is redundant, so leave it out.

print('To end: ', a[5:])
>>>
To end:  ['f', 'g', 'h']
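
The same idea applies at the other end: the zero index can be left out when slicing from the start, and negative indexes count back from the end of the list. A minimal sketch, reusing the list a from above (the variable names are only for illustration):

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

first_five = a[:5]      # same as a[0:5]
all_but_last = a[:-1]   # a negative end counts back from the end
last_three = a[-3:]     # a negative start, running to the end
middle = a[2:-1]        # positive and negative indexes can be mixed

print(first_five)    # ['a', 'b', 'c', 'd', 'e']
print(all_but_last)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(last_three)    # ['f', 'g', 'h']
print(middle)        # ['c', 'd', 'e', 'f', 'g']
```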

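Slicing handles start and end indexes beyond the bounds of the list gracefully by silently ignoring the missing items.
Accessing such an index directly, on the other hand, raises an exception.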

print('first_twenty_items: ', a[:20])
print('last_twenty_items: ', a[-20:])
print(a[20])
>>>
first_twenty_items:  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
last_twenty_items:  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Traceback (most recent call last):
  File "index.py", line 8, in <module>
    print(a[20])
IndexError: list index out of range
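
One related surprise is worth a sketch. A slice such as a[-n:] behaves as expected for n greater than zero, but when n is 0 the expression a[-0:] is the same as a[0:] and returns a copy of the whole list. The helpers last_n and last_n_safe below are hypothetical names used only for this illustration:

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']


def last_n(items, n):
    """Naive version: returns the whole list when n is 0, because -0 == 0."""
    return items[-n:]


def last_n_safe(items, n):
    """Guarded version: returns an empty list when n is 0."""
    return items[-n:] if n > 0 else []


print(last_n(a, 3))       # ['f', 'g', 'h']
print(last_n(a, 0))       # the whole list, which is probably not intended
print(last_n_safe(a, 0))  # []
```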

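The result of slicing a list is a whole new list. Its items are still references to the same objects as in the original list, but modifying the sliced result does not change the original list.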

b = a[:3]
print('Before: ', b)
b[1] = 99
print('After: ', b)
print('No change: ', a)
>>>
Before:  ['a', 'b', 'c']
After:  ['a', 99, 'c']
No change:  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
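
The distinction between "a new list" and "the same objects" only becomes visible when the items are themselves mutable. The sketch below uses a hypothetical nested list to show that replacing a slot in the slice leaves the original alone, while mutating a shared inner list is visible from both:

```python
nested = [[1, 2], [3, 4], [5, 6]]
part = nested[:2]           # new outer list, same inner list objects

part[0] = 'replaced'        # rebinds a slot in the new list only
print(nested[0])            # [1, 2]

part[1].append(99)          # mutates an inner list shared by both
print(nested)               # [[1, 2], [3, 4, 99], [5, 6]]
print(part[1] is nested[1]) # True
```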

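When used in an assignment, a slice replaces the specified range in the original list. The slice and the assigned values do not need to have the same length. The items before and after the assigned slice are preserved. In the first case below, the list shrinks because the replacement values are shorter than the specified slice.
Conversely, if the assigned list is longer, the list grows.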

print('Before: ', a)
a[2:7] = [99, 22, 14]
print('After: ', a)
# When the assigned list is longer
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']  # reset to the original list
print('Before: ', a)
a[2:3] = [99, 22, 14]
print('After: ', a)
>>>
Before:  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
After:  ['a', 'b', 99, 22, 14, 'h']
Before:  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
After:  ['a', 'b', 99, 22, 14, 'd', 'e', 'f', 'g', 'h']
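
Two edge cases follow from the same rule and may be worth keeping in mind: assigning an empty list to a slice deletes that range, and assigning to an empty slice inserts items without removing anything. A small sketch on a fresh copy of the list:

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

a[2:7] = []            # replace five items with nothing: deletion
print(a)               # ['a', 'b', 'h']

a[1:1] = ['x', 'y']    # empty slice at index 1: pure insertion
print(a)               # ['a', 'x', 'y', 'b', 'h']
```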

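Assigning to a slice with no start or end index replaces the entire contents of the list with a copy of what is referenced on the right-hand side, instead of allocating a new list.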

b = a
print('Before a', a)
print('Before b', b)

a[:] = [101, 102, 103]
assert a is b
print('After a', a)
print('After b', b)
>>>
Before a ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Before b ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
After a [101, 102, 103]
After b [101, 102, 103]
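
Used on the right-hand side instead, a slice with no start or end produces a copy of the list. The following sketch contrasts an alias with such a copy; the names alias and snapshot are illustrative only:

```python
a = ['a', 'b', 'c']
alias = a          # second name for the same list object
snapshot = a[:]    # new list with the same contents

print(snapshot == a, snapshot is a)  # True False

a[:] = [101, 102, 103]   # in-place replacement of the contents
print(alias)             # [101, 102, 103]
print(snapshot)          # ['a', 'b', 'c']
```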

Striding and slicing

In addition to basic slicing, Python has a syntax for specifying the stride of a slice, in the form somelist[start:end:stride]. This lets you take every nth item when slicing a sequence.

x = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
odds = x[::2]
evens = x[1::2]
print(odds)
print(evens)
>>>
['red', 'yellow', 'blue']
['orange', 'green', 'purple']

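There is a catch, however: the stride syntax can behave in unexpected ways that introduce bugs.
A common trick for reversing a byte string or a Unicode string is to slice with a stride of -1.
This fails, however, for Unicode data encoded as a UTF-8 byte string: reversing the bytes breaks up multibyte characters, so decoding raises an error.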

x = b'Effective Python'
y = x[::-1]
print(y)
w = 'テスト'
x = w.encode('utf-8')
y = x[::-1]
z = y.decode('utf-8')
print(z)
>>>
b'nohtyP evitceffE'
Traceback (most recent call last):
  File "index.py", line 15, in <module>
    z = y.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte
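
One way around the error above is to reverse the text while it is still a str and encode afterwards, so that multibyte characters are never split. This is only a sketch: it reverses code points, which can still reorder combining characters in some text.

```python
w = 'テスト'

reversed_text = w[::-1]                  # reverse characters, not bytes
encoded = reversed_text.encode('utf-8')  # encode after reversing
round_trip = encoded.decode('utf-8')     # decodes without error

print(round_trip)  # トステ
```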

x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
x[::2]      # ['a', 'c', 'e', 'g']
x[::-2]     # ['h', 'f', 'd', 'b']
x[2::2]     # ['c', 'e', 'g']
x[-2::-2]   # ['g', 'e', 'c', 'a']
x[-2:2:-2]  # ['g', 'e']
x[2:2:-2]   # []

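Using start, end, and stride together quickly becomes complicated and very confusing, so avoid combining all three in a single slice. If you must use a stride, prefer a positive value, which is easier to follow.
When you do need both striding and slicing, also consider islice from the itertools module.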
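
The advice above can be sketched in two ways: split the operation into one striding step and one slicing step, or use itertools.islice, which takes start, stop, and step as plain arguments and does not accept negative values. The variable names are illustrative.

```python
from itertools import islice

x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Option 1: stride first, then slice the (smaller) result.
y = x[::2]      # ['a', 'c', 'e', 'g']
z = y[1:-1]     # ['c', 'e']

# Option 2: islice(iterable, start, stop, step) yields indexes 2, 4, 6.
w = list(islice(x, 2, 7, 2))

print(z)  # ['c', 'e']
print(w)  # ['c', 'e', 'g']
```

Striding first creates an intermediate copy, so when memory matters it may be better to slice first or to use islice, which works lazily on any iterable.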

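Catch-all unpacking

One limitation of basic unpacking is that you must know the length of the sequence in advance. Trying to unpack just the first two items of a longer list, for example, fails with a runtime error.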

car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest, second_oldest = car_ages_descending
print(oldest, second_oldest)
>>>
Traceback (most recent call last):
  File "index.py", line 3, in <module>
    oldest, second_oldest = car_ages_descending
ValueError: too many values to unpack (expected 2)

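For situations like this, Python supports catch-all unpacking through a starred expression. This syntax lets one part of the unpacking assignment receive all the values that did not match any other part of the unpacking pattern.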

car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest, second_oldest, *others = car_ages_descending
print(oldest, second_oldest)
print(others)
>>>
20 19
[15, 9, 8, 7, 6, 4, 1, 0]
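
The starred expression does not have to come last. As a short sketch, it can also sit in the middle or at the start of the pattern, which makes it easy to pick items off both ends of the sorted list:

```python
car_ages_descending = [20, 19, 15, 9, 8, 7, 6, 4, 1, 0]

# Starred expression in the middle: first and last items are named.
oldest, *others, youngest = car_ages_descending
print(oldest, youngest, others)   # 20 0 [19, 15, 9, 8, 7, 6, 4, 1]

# Starred expression at the start: the last two items are named.
*rest, second_youngest, youngest = car_ages_descending
print(youngest, second_youngest, rest)  # 0 1 [20, 19, 15, 9, 8, 7, 6, 4]
```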

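However, an unpacking assignment that contains a starred expression must have at least one required part; otherwise you get a SyntaxError. In other words, a catch-all expression cannot be used on its own to receive everything.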

car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
*others = car_ages_descending
print(others)
>>>
SyntaxError: starred assignment target must be in a list or tuple

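A starred expression always becomes a list instance. If no items are left over from the sequence being unpacked, the catch-all part is an empty list. This is especially useful when processing a sequence that you know has at least N items.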

short_list = [1, 2]
first, second, *rest = short_list
print(first, second, rest)
>>>
1 2 []

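Starred expressions make the value of unpacking iterators clearer. For example, the generator below yields the rows of a CSV file; processing its results with indexes and slices works, but takes several lines.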

def generate_csv():
    yield ('No.', 'Program language')
    yield (1, 'Java')
    yield (2, 'JavaScript')
    yield (3, 'TypeScript')
    yield (4, 'Python')
    yield (5, 'PHP')

all_csv_rows = list(generate_csv())
header = all_csv_rows[0]
rows = all_csv_rows[1:]
print('CSV Header: ', header)
print('ROW count: ', len(rows))
>>>
CSV Header:  ('No.', 'Program language')
ROW count:  5

Unpacking with a starred expression makes it easy to process the first row, the header, separately from the rest of the iterator's contents.

it = generate_csv()
header, *rows = it
print('CSV Header: ', header)
print('ROW count: ', len(rows))
>>>
CSV Header:  ('No.', 'Program language')
ROW count:  5

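Because a starred expression always returns a list, unpacking an iterator risks using up all of the computer's memory and crashing the program. Use catch-all unpacking on an iterator only when you are confident that the resulting data fits in memory.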
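
When the data might not fit in memory, one alternative is to take the header with next and then loop over the remaining rows one at a time, so that no list of all rows is ever built. A minimal sketch, repeating the generator from above so that it runs on its own:

```python
def generate_csv():
    yield ('No.', 'Program language')
    yield (1, 'Java')
    yield (2, 'JavaScript')
    yield (3, 'TypeScript')
    yield (4, 'Python')
    yield (5, 'PHP')


it = generate_csv()
header = next(it)      # consumes only the first row

row_count = 0
for row in it:         # remaining rows are produced one at a time
    row_count += 1

print('CSV Header: ', header)
print('ROW count: ', row_count)
```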

Sorting

The built-in list type has a sort method. By default, sort orders the contents of a list in ascending order according to the natural ordering of its items.

numbers = [92, 32, 23, 11, 4, 59]
numbers.sort()
print(numbers)
>>>
[4, 11, 23, 32, 59, 92]

Sorting objects of your own class fails, because the sort method tries to call comparison special methods that the class does not define.

class Tool:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
    
    def __repr__(self):
        return f'Tool({self.name!r}, {self.weight})'
    
tools = [
    Tool('level', 3.5),
    Tool('hammer', 1.25),
    Tool('screwdriver', 0.5),
    Tool('chisel', 0.25),
]

tools.sort()
>>>
TypeError: '<' not supported between instances of 'Tool' and 'Tool'

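If your class should have a natural ordering, as integers do, you can define the necessary special methods so that sort works without extra parameters. In the more common case, however, objects need to support several orderings, and defining a single natural ordering makes little sense.
Often you want to sort objects by the value of an attribute. To support this use case, the sort method accepts a key parameter that is expected to be a function. The key function is passed a single argument, an item from the list being sorted, and its return value must be a comparable value to use for sorting.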

print('Unsorted: ', repr(tools))
tools.sort(key=lambda x: x.name)
print('\nSorted: ', tools)
>>>
Unsorted:  [Tool('level', 3.5), Tool('hammer', 1.25), Tool('screwdriver', 0.5), Tool('chisel', 0.25)]

Sorted:  [Tool('chisel', 0.25), Tool('hammer', 1.25), Tool('level', 3.5), Tool('screwdriver', 0.5)]

Sorting by weight is just as easy with another lambda function.

tools.sort(key=lambda x : x.weight)
print('By weight: ', tools)
>>>
By weight:  [Tool('chisel', 0.25), Tool('screwdriver', 0.5), Tool('hammer', 1.25), Tool('level', 3.5)]
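
For the less common case mentioned earlier, where a class really does have one natural ordering, defining the less-than special method is enough for sort to work without a key, since list.sort compares items using only <. The class WeightedTool below is a hypothetical variant that assumes weight is that natural order; it is a sketch, not a recommendation over key functions.

```python
class WeightedTool:
    """Hypothetical variant of Tool whose natural order is by weight."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def __repr__(self):
        return f'WeightedTool({self.name!r}, {self.weight})'

    def __lt__(self, other):
        # Returning NotImplemented lets Python raise TypeError for other types.
        if not isinstance(other, WeightedTool):
            return NotImplemented
        return self.weight < other.weight


weighted = [
    WeightedTool('level', 3.5),
    WeightedTool('hammer', 1.25),
    WeightedTool('chisel', 0.25),
]
weighted.sort()    # no key needed; uses __lt__
names = [t.name for t in weighted]
print(names)       # ['chisel', 'hammer', 'level']
```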

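Sorting strings is just as simple. The key function can also transform the values first, for example with lower to make the sort case-insensitive.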

place = ['Tokyo', 'work', 'house', 'New York']
place.sort()
print('Case sensitive: ', place)
place.sort(key=lambda x : x.lower())
print('Case insensitive: ', place)
>>>
Case sensitive:  ['New York', 'Tokyo', 'house', 'work']
Case insensitive:  ['house', 'New York', 'Tokyo', 'work']

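To sort by multiple criteria, have the key function return a tuple. Tuples are compared item by item, so the list below is sorted by weight first and then by name.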

tools2 = [
    Tool('drill', 4),
    Tool('circular', 5),
    Tool('jackhammer', 50),
    Tool('sander', 4),
]

tools2.sort(key=lambda x : (x.weight, x.name))
print(tools2)
>>>
[Tool('drill', 4), Tool('sander', 4), Tool('circular', 5), Tool('jackhammer', 50)]
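
A tuple key sorts every criterion in the same direction. When the directions differ, two approaches are commonly used: negate a numeric field inside the tuple, or, for values such as strings that cannot be negated, sort in several passes from the lowest-priority criterion to the highest, relying on sort being stable. The sketch below redefines a minimal Tool so that it runs on its own:

```python
class Tool:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight


tools2 = [
    Tool('drill', 4),
    Tool('circular', 5),
    Tool('jackhammer', 50),
    Tool('sander', 4),
]

# Weight descending, name ascending: negate the numeric field.
heavy_first = sorted(tools2, key=lambda x: (-x.weight, x.name))
heavy_first_names = [t.name for t in heavy_first]
print(heavy_first_names)  # ['jackhammer', 'circular', 'drill', 'sander']

# Weight ascending, name descending: two stable passes,
# lowest-priority criterion first.
tools2.sort(key=lambda x: x.name, reverse=True)  # secondary criterion
tools2.sort(key=lambda x: x.weight)              # primary criterion
two_pass_names = [t.name for t in tools2]
print(two_pass_names)     # ['sander', 'drill', 'circular', 'jackhammer']
```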
