More than 5 years have passed since last update.

Pythonでbcryptを使ってパスワードをゆっくりハッシュ化

Last updated at 2016-09-22Posted at 2016-09-22

ここで使うコードはgithubに置いてます。
https://github.com/matsulib/bcrypt-dictionary-attack

導入

インストール方法
bcryptの特徴
bcryptの基本的な使い方
bcrypt.gensalt()の引数について

実験1. work-factorの影響

実行環境
コードと結果

実験2. 辞書攻撃っぽいの（SHA256との比較）

実行環境
SHA256（＋ソルト）のコード
bcryptのコード
辞書攻撃のコード
結果

まとめ

導入

パスワード保管に適したハッシュ関数bcryptをPythonから利用する場合は、bcryptモジュール (https://pypi.python.org/pypi/bcrypt/3.1.1)を使います。

インストール方法:

pip install bcrypt

bcryptの特徴

bcryptはBlowfish暗号ベースのハッシュ関数です。

MD5やSHAファミリなどのハッシュ関数とbcryptとの決定的な違いは、前者がfast hashであるのに対して、後者はslow hashであることです。

fast hashは高速であるため、大きなファイルから固定長のダイジェストを得るのには便利ですが、パスワード管理で使ってしまうとハッシュ値が漏れてしまった場合にオフライン攻撃によってクラックされやすいという問題があります。そのため、意図的に処理を遅くさせるストレッチ（ハッシュ化を繰り返す）という手法があります。

一方、bcryptはもともと演算を繰り返すように設計されたslow hashであり、単に遅いだけでなく高速なハードウェア実装を難しくする設計にもなっています。

This system hashes passwords using a version of Bruce Schneier's Blowfish block cipher with modifications designed to raise the cost of off-line password cracking and frustrate fast hardware implementation.

http://www.mindrot.org/projects/py-bcrypt/

Basically, computational power can be parallelized cheaply and easily, but memory cannot. This is the cornerstone of bcrypt and scrypt. Obviously, they can still be broken by sheer brute force, and you could just use hardware with integrated memory units to circumvent the problem, but it's much harder and much, much more expensive to do so.

http://crypto.stackexchange.com/questions/3491/is-stretching-hash-several-times-basically-the-same-as-bcrypt

正直、理解不足で上記の説明が正しいのか、今でも有効なのかよく分かってません。

2015年7月に発生したAshley Madisonの登録者情報流出事件でもbcryptは使われていたそうです。ただ、致命的だったのはMD5(しかも小文字のアルファベット26文字であることもバレていた)だったようで、攻撃者にとってbcrypt自体はまだ骨の折れる存在であるように思えます。

Instead of cracking the slow bcrypt hashes directly, which is the hot topic at the moment, we took a more efficient approach and simply attacked the md5(lc($username).”::”.lc($pass)) and md5(lc($username).”::”.lc($pass).”:”.lc($email).”:73@^bhhs&#@&^@8@*$”) tokens instead. Having cracked the token, we simply then had to case correct it against its bcrypt counterpart.

http://cynosureprime.blogspot.jp/2015/09/how-we-cracked-millions-of-ashley.html

bcryptの基本的な使い方

bcrypt.gensalt(rounds=12, prefix=b'2b') # ソルトを生成
bcrypt.hashpw(password, salt) # パスワードをハッシュ化
bcrypt.checkpw(password, hashed_password) # パスワードを検証

以下のようにパスワードをハッシュ化します。

>>> import bcrypt
>>> salt = bcrypt.gensalt(rounds=10, prefix=b'2a')
>>> salt
b'$2a$10$VpsqBArIfdhGzJY1YO/xyO'
>>> password = b'password'
>>> bcrypt.hashpw(password, salt)
b'$2a$10$VpsqBArIfdhGzJY1YO/xyOiOsCrQc9BZAaonPt3QDsL0HWzoaHXgG'

これは次のような構造になっています。

$(２文字のバージョン)$(２文字のwork-factor)$(22文字のソルト)(31文字のハッシュ)

bcrypt.gensalt()の引数について

bcrypt.gensalt()は呼び出しごとに異なるソルトを生成します。また、呼び出し時に２つの引数を指定できます。

rounds：ハッシュ値の計算を遅くするためのwork-factorと呼ばれるパラメータで、4から31の値が指定できます（default=12）。2^rounds回演算が繰り返されます。
prefix：b'2a'かb'2b'を指定できます（default=b'2b'）。

現在は"2a"が主流になっているようです。

https://yuskamiya.tumblr.com/post/100503173956

実験1. work-factorの影響

以下では、bcrypt.gensalt()のwork-factorの値を変えたときの計算時間を見てみます。

実行環境:

Python 3
Windows 10 Pro 64bit Intel Core i5-4210U @ 1.70GHz, 8.0GB RAM
IPython or Jupyter

コードと結果

In [1]: import bcrypt
In [2]: password = b'password'
In [3]: for i in range(5, 18):
   ...:     print('rounds: {:02d}'.format(i), end=' => ')
   ...:     %timeit -r1 bcrypt.hashpw(password, bcrypt.gensalt(rounds=i))
   ...:
rounds: 05 => 100 loops, best of 1: 2.73 ms per loop
rounds: 06 => 100 loops, best of 1: 5.4 ms per loop
rounds: 07 => 100 loops, best of 1: 10.7 ms per loop
rounds: 08 => 10 loops, best of 1: 21 ms per loop
rounds: 09 => 10 loops, best of 1: 42.1 ms per loop
rounds: 10 => 10 loops, best of 1: 84.4 ms per loop
rounds: 11 => 10 loops, best of 1: 172 ms per loop
rounds: 12 => 1 loop, best of 1: 369 ms per loop
rounds: 13 => 1 loop, best of 1: 703 ms per loop
rounds: 14 => 1 loop, best of 1: 1.4 s per loop
rounds: 15 => 1 loop, best of 1: 2.71 s per loop
rounds: 16 => 1 loop, best of 1: 5.42 s per loop
rounds: 17 => 1 loop, best of 1: 10.8 s per loop

当然ですが、work-factorを1増やすごとに計算時間が2倍になりました。

実験2. 辞書攻撃っぽいの（SHA256との比較）

実行環境

Python 3
Windows 10 Pro 64bit Intel Core i5-4210U @ 1.70GHz, 8.0GB RAM

SHA256（＋ソルト）のコード

hashing_sha256.py

import uuid
import hashlib

# Reference: http://pythoncentral.io/hashing-strings-with-python/

def hash_password(password):
    # uuid is used to generate a random number
    salt = uuid.uuid4().hex
    return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt
    
def check_password(hashed_password, user_password):
    password, salt = hashed_password.split(':')
    return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

if __name__ == '__main__':
    new_pass = input('Please enter a password: ')
    hashed_password = hash_password(new_pass)
    print('The string to store in the db is: ' + hashed_password)
    old_pass = input('Now please enter the password again to check: ')
    if check_password(hashed_password, old_pass):
        print('You entered the right password')
    else:
        print('I am sorry but the password does not match')

実行結果

> python hashing_sha256.py
Please enter a password: piyo
The string to store in the db is: 25bcf883839511cdb493b3ba25bc0ad3fc809a3688076d198ef13128f5883a66:f0ac11a13a314336951392ff52c9c2d6
Now please enter the password again to check: piyo
You entered the right password

bcryptのコード
実行結果はSHA256のコードと同じになるようにしています。

hashing_bcrypt.py

import bcrypt

def hash_password(password, rounds=12):
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds)).decode()

def check_password(hashed_password, user_password):
    return bcrypt.checkpw(user_password.encode(), hashed_password.encode())

辞書攻撃のコード

設定

ユーザー数：10人
一人だけ簡単なパスワードを設定
利用する辞書：簡単なパスワード25個 https://www.teamsid.com/worst-passwords-2015/

dictionary_attack.py

import time

import hashing_sha256 as sha256
import hashing_bcrypt as bcrypt

def attack(leaked_hashed_password, hashing):
    # https://www.teamsid.com/worst-passwords-2015/
    dictionary = ['123456', 'password', '12345678', 'qwerty', '12345', 
                '123456789', 'football', '1234', '1234567', 'baseball', 
                'welcome', '1234567890', 'abc123', '111111', '1qaz2wsx', 
                'dragon', 'master', 'monkey', 'letmein', 'login', 'princess', 
                'qwertyuiop', 'solo', 'passw0rd', 'starwars']
            
    for p in dictionary:
        if hashing.check_password(leaked_hashed_password, p):
            return '当たり！パスワードは {} です。'.format(p)
    else:
        return 'はずれ(´・ω・｀)'


passwords = ['complex.password'] * 9 + ['passw0rd']

print('sha256')
leaked_sha256 = [sha256.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_sha256):
    result = attack(v, sha256)
    print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

print('bcrypt-5')
leaked_bcrypt = [bcrypt.hash_password(p, 5) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

print('bcrypt-12')
leaked_bcrypt = [bcrypt.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

結果

SHA256

sha256
user00: はずれ(´・ω・｀)
user01: はずれ(´・ω・｀)
user02: はずれ(´・ω・｀)
user03: はずれ(´・ω・｀)
user04: はずれ(´・ω・｀)
user05: はずれ(´・ω・｀)
user06: はずれ(´・ω・｀)
user07: はずれ(´・ω・｀)
user08: はずれ(´・ω・｀)
user09: 当たり！パスワードは passw0rd です。
Total time: 0.005 s

bcrytp(work-factor=5)

bcrypt-5
Total time: 0.715 s

bcrypt(work-factor=12)

bcrypt-12
Total time: 90.040 s

bcryptへの辞書攻撃は（もちろんブルートフォース攻撃も）、SHA256に比べてとても時間がかかることがわかります。実際の辞書やユーザー数はもっと多いので、攻撃者への嫌がらせに効果を発揮するはずです。

まとめ

1. Pythonからbcryptを利用する場合は、bcryptモジュールを使う

2. 適切なwork-factorを選ぶ

work-factorに大きな値を設定することで、攻撃者のオフライン攻撃に対する時間稼ぎが可能ですが、これは通常運用時のパフォーマンスにも影響を与えます。セキュリティと利便性のトレードオフです。

望ましい時間（10、200msなど）内にサーバがパスワードをチェックする反復の回数を決定し、それを使用する。

http://blog.f-secure.jp/archives/50564743.html

I don’t believe that there is a “correct” work factor; it depends on how strong you want your hashes to be and how much computational power you want to reserve for the hashing process.

https://wildlyinaccurate.com/bcrypt-choosing-a-work-factor/

おわり。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up