More than 5 years have passed since last update.

Cryptopals Crypto Challenges Set1 の解答(※ネタバレ注意)

Last updated at 2019-03-19Posted at 2019-03-19

the cryptopals crypto challengesという面白そうな暗号解読の問題を見つけたのでやってみました。それの問題文と解答をアップしておきます、ネタバレ注意です
https://cryptopals.com/
https://github.com/kamata1729/the-cryptopals-crypto-challenges/tree/master

Challenge 1

hex文字列をbase64文字列に変換する

hex文字列

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

は、次のように変換される

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

解答

c1.py

import codecs
INPUT = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"
ANS = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"

if __name__ == "__main__":
    output = codecs.encode(codecs.decode(INPUT, 'hex'), 'base64')
    print("out", output.decode())
    print("ANS", ANS)

Challenge 2

Fixed XOR

二つの同じ長さのbufferのXORをとるプログラムを書け
例)
以下の二つのhex形式の文字列が得られた場合、

1c0111001f010100061a024b53535009181c

686974207468652062756c6c277320657965

次の文字列が得られる

746865206b696420646f6e277420706c6179

解答

pythonでXORをとるのはa^bのようにして行うことができます
よって、2文字ずつ区切ってそれらのXORを取っていくことにより、以下のように書くことができます。

c2.py


INPUT = "1c0111001f010100061a024b53535009181c"
AGAINST = "686974207468652062756c6c277320657965"
ANS = "746865206b696420646f6e277420706c6179"

if __name__ == "__main__":
    input_separated = [int(INPUT[2*i: 2*(i+1)], 16)
                       for i in range(len(INPUT)//2)]
    against_separated = [int(AGAINST[2*i: 2*(i+1)], 16)
                         for i in range(len(AGAINST)//2)]
    output = ''.join([format(a ^ b, 'x') for a, b in zip(input_separated, against_separated)])
    print("ANS: ", ANS)
    print("output: ", output)
    print(ANS == output)

Challenge 3

Single-byte XOR cipher

以下のhex文字列は、一つの文字でXORを取ったあとのものである。元の文字列を見つけなさい。

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

どのようにして？英語の文字を評価するいくつかの手法を試してみてください。
英文字の出現頻度は良い評価基準です。
色々試してみて、もっともふさわしいアウトプットを選んでください

解答

シャーロックホームズの踊る人形みたいな問題です。
https://en.wikipedia.org/wiki/Letter_frequency
から英文字の使用頻度を調べて、その使用頻度分のスコアを与えることにしました。
スペースの使用頻度はだいたいの目安です。

c3.py


import codecs
import numpy as np

INPUT = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"

english_letter_frequency = {
    'e': 12.70,
    't': 9.06,
    'a': 8.17,
    'o': 7.51,
    'i': 6.97,
    'n': 6.75,
    's': 6.33,
    'h': 6.09,
    'r': 5.99,
    'd': 4.25,
    'l': 4.03,
    'c': 2.78,
    'u': 2.76,
    'm': 2.41,
    'w': 2.36,
    'f': 2.23,
    'g': 2.02,
    'y': 1.97,
    'p': 1.93,
    'b': 1.29,
    'v': 0.98,
    'k': 0.77,
    'j': 0.15,
    'x': 0.15,
    'q': 0.10,
    'z': 0.07,
    ' ': 13.00
}


def english_score(input):
    """
    e. g.) input "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
    """
    separated = [int(input[2*i: 2*(i+1)], 16)
                 for i in range(len(input)//2)]
    score = 0
    for s in separated:
        if s > 0x7f:
            score = 0
            break
        char = codecs.decode(format(s, '02x').encode(), 'hex').decode().lower()
        if char in english_letter_frequency.keys():
            score += english_letter_frequency[char]
    return score

def single_xor_cipher(input, show_score=False, show_key=False):
    separated = [int(input[2*i: 2*(i+1)], 16)
                 for i in range(len(input)//2)]

    scores = np.zeros(255)
    for i in range(255):
        out = ''.join([format(x ^ i, '02x') for x in separated])
        scores[i] = english_score(out)
    
    result_dict = {}
    if max(scores) == 0:
        result_dict['output'] = ""
        if show_score:
            result_dict['score'] = 0
        if show_key:
            result_dict['key'] = ""
    else:
        key = scores.argmax()
        output = codecs.decode(
            ''.join([format(x ^ key, '02x') for x in separated]), 'hex').decode()
        result_dict['output'] = output
        if show_score:
            result_dict['score'] = english_score(
                ''.join([format(x ^ key, '02x') for x in separated]))
        if show_key:
            result_dict['key'] = codecs.decode(format(key, '02x'), 'hex').decode()
    return result_dict

if __name__ == "__main__":
    result_dict = single_xor_cipher(INPUT, show_score=True)
    print("output: {}, score: {}".format(result_dict['output'], result_dict['score']))

Challenge 4

Detect single-character XOR

このファイルは一行ごとに60文字の文字列になっている。この中の一列が一つの文字でXORを取って暗号化されている。それを見つけ出しなさい

解答

challenge 3の関数がそのまま使えるので、がっつり使ってます。
もっともスコアが大きくなった文字列が正しい文字列として採用しています。

c4.py

import codecs
from c3 import *

if __name__ == "__main__":
    lines = None
    with open("files/4.txt") as f:
        lines = f.read()
        lines = lines.split('\n')
    
    outputs = []
    scores = []
    for i, string in enumerate(lines):
        string = string.replace('\n', '')
        result_dict = single_xor_cipher(string, show_score=True)
        scores.append(result_dict['score'])
    likely_line_index = np.array(scores).argmax()
    print(single_xor_cipher(lines[likely_line_index]))

Challenge 5

Implement repeating-key XOR

以下は英語の詩の一説です

Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal

これを、'ICE'というキーの素でrepeating-key XORを使って暗号化しなさい。
repeating-key XORとは、一文字目はIで、二文字目はCで、さん文字目はEで、というように、順に暗号化するキーを変えて暗号化することである。

解答

順番に暗号化していけばよい

c5.py

import codecs
import collections

INPUT = "Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"
KEY = "ICE"
ANS = "0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f"

def repeating_key_xor(input, key):
    encoded = codecs.encode(input.encode("utf-8"), 'hex')
    encoded = [int(encoded[2*i: 2*(i+1)], 16) for i in range(len(encoded)//2)]
    encoded_key = codecs.encode(key.encode("utf-8"), 'hex')
    encoded_key = [int(encoded_key[2*i: 2*(i+1)], 16)
                   for i in range(len(encoded_key)//2)]
    return ''.join([format(x ^ encoded_key[i % len(encoded_key)], '02x') for i, x in enumerate(encoded)])

if __name__ == "__main__":
    out = repeating_key_xor(INPUT, KEY)
    print("output: ", out)
    print("answer: ", ANS)
    print(out == ANS)

Challenge 6

Break repeating-key XOR

ファイルはここにあり、これはrepeating-key XORで暗号化された後にbase64に変換されたものである。これを解読しなさい。

1. KEYSIZEは予測されるkeyの長さである。2から41の値を試してみると良い。
2. ハミング距離を計算するスクリプトを書きなさい。ハミング距離とは、二つのビット列の間で異なるものの数である。例えば、

this is a test

という文字列と、

wokka wokka!!!

という文字列の間のハミング距離は37になる

3. それぞれのKEYSIZEについて、文字列の最初のKEYSIZE文字目までと、KEYSIZE文字目からKEYSIZE*2文字目までのハミング距離を計算する。これをKEYSIZEで割って正規化する。これを全ての文字列で行い、平均をとることもできる。

4. もっとも小さいハミング距離のものがKEYSIZEである。

5. ここまででKEYSIZEが判明するだろう。暗号文をKEYSIZEの長さのブロックに分割する。

6. 各ブロックの一文字目が一つ目のブロックに、２文字目が二つ目のブロックに...となるように並び替える。

7. 各ブロックをsingle-character XORとして解読する。

8. 各ブロックのkeyをつなぎ合わせて、全体のkeyが完成する。

解答

これまでのスクリプトを組み合わせて実装する。

c6.py

import codecs
import base64
import numpy as np
from c1_3 import single_xor_cipher

def hamming_distance(word1, word2):
    word1_int = int(codecs.encode(word1.encode("utf-8"), 'hex').decode(), 16)
    word2_int = int(codecs.encode(word2.encode("utf-8"), 'hex').decode(), 16)
    return format(word1_int ^ word2_int, 'b').count('1')


def get_keysize():
    scores = []
    for keysize in range(2, 41):
        score = 0
        for i in range(0, len(lines)-keysize, keysize):
            score += hamming_distance(lines[i:i+keysize].decode(),
                                      lines[i+keysize:i+keysize*2].decode()) / keysize
        score = score/len(range(0, len(lines)-keysize, keysize))
        scores.append((keysize, score))
    return sorted(scores, key=lambda tuple: tuple[1])[0][0]

def repeating_xor_cipher(lines, keysize):
    key = ""
    output = []
    line_blocks = ["" for _ in range(keysize)]
    for i, char in enumerate(lines):
        line_blocks[i % keysize] += format(lines[i], '02x')

    for i in range(keysize):
        result_dict = single_xor_cipher(
            line_blocks[i], show_score=True, show_key=True)
        key += result_dict['key']
        output.append(result_dict['output'])

    res = ""
    for i in range(len(lines)):
        res += output[i % keysize][i//keysize]
    return key, res
    

if __name__ == '__main__':
    lines = None
    with open("files/6.txt") as f:
        lines = base64.b64decode(f.read())

    key, result = repeating_xor_cipher(lines, get_keysize())
    print("key: ")
    print(key)
    print("result: ")
    print(result)

Challenge 7

AES in ECB mode

このファイルは、AES-128のECBモードで暗号化した後base64でエンコードしたものである。
この時の暗号化キーは、

YELLOW SUBMARINE

である、（大文字小文字はそのまま、全部で16文字）
これを解読しなさい。

解答

AES暗号化方式は若干複雑で自分で実装するのは手間なのでライブラリを使った。

c7.py

import base64
from Crypto.Cipher import AES

if __name__ == "__main__":
    with open("files/7.txt") as f:
        enc = f.read()
    key = "YELLOW SUBMARINE"
    decipher = AES.new(key, AES.MODE_ECB)

    dec = decipher.decrypt(base64.b64decode(enc)).decode('utf-8').rstrip(chr(0x04))
    print(dec)

Challenge 8

Detect AES in ECB mode

このファイルはhexでエンコードされた暗号文たちである。これらのうち一つがECBで暗号化されている。それを見つけなさい。
覚えておくべきことは、ECBの欠陥は決定的であること。同じ16バイトの文字列を暗号化した時は同じ暗号文が得られる。

解答

ヒントとして、同じ暗号文が何度か現れるということが書いてあるため、それを見つけることを考える。
暗号文をblock_sizeごとのchunkに区切って、それのうち重複する数をlen(chunks) - len(set(chunks))で調べている。

c8.py


def discover(enc):
    lines = [bytes.fromhex(line.strip()) for line in enc]
    result = []
    for i in range(len(lines)):
        ciphertext = lines[i]
        for block_size in range(3, 40):
            chunks = [ciphertext[i:i+block_size]
                      for i in range(0, len(ciphertext), block_size)]
            same_num = len(chunks) - len(set(chunks))
            if same_num > 0:
                result.append(
                    {'enc_index': i, 'block_size': block_size, 'same_num': same_num})
    return sorted(result, reverse=True, key=lambda dict: dict['same_num'])[0]

if __name__ == '__main__':
    with open("files/8.txt") as f:
        enc = f.read()
        enc = enc.split('\n')
    
    result = discover(enc)

    print("cipher text: ", enc[result['enc_index']])
    print("block size: ", result['block_size'])

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up