3
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

ハフマン圧縮を学ぼう

Last updated at Posted at 2024-12-01

この記事では、データ圧縮の基本アルゴリズムである**ハフマン符号(Huffman Coding)**について、Pythonを使った実装を通して学んでいきます。

ハフマン符号とは?

ハフマン符号は、データ圧縮においてよく使われる手法で、次の特徴を持っています:

  • 文字の出現頻度に基づいて符号を生成します
  • 頻度が高い文字には短い符号、頻度が低い文字には長い符号を割り当てます
  • 可変長の符号を使用するため、全体のデータ量を削減できます

ハフマン符号を理解するための手順

  1. 文字の出現頻度を計算する
  2. ハフマンツリーを構築する
  3. 各文字に符号を割り当てる
  4. 入力文字列をハフマン符号で圧縮する
  5. 圧縮データを復元する

以下で、それぞれの手順をPythonコードとともに解説します!
ここでは、簡単に「This is a stone.」を圧縮します。

1. 文字の頻度を計算する

テキストデータから各文字の出現頻度を計算します。

from collections import defaultdict

def calculate_frequency(text):
    frequency = defaultdict(int)
    for char in text:
        frequency[char] += 1
    return frequency

結果

's': ###
' ': ###
'i': ##
'T': #
'h': #
'a': #
't': #
'o': #
'n': #
'e': #
'.': #

2. ハフマンツリーを構築する

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(frequency):
    heap = [HuffmanNode(char, freq) for char, freq in frequency.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        node1 = heapq.heappop(heap)
        node2 = heapq.heappop(heap)
        merged = HuffmanNode(None, node1.freq + node2.freq)
        merged.left = node1
        merged.right = node2
        heapq.heappush(heap, merged)

    return heap[0]

# ハフマンツリーを構築
huffman_tree = build_huffman_tree(frequency)

結果
出頻度の多い「s」「 」が上の階層にあることが分かります。

Huffman Tree:
Freq: 16
├── Freq: 7
   ├── s: 3
   └── Freq: 4
       ├── Freq: 2
          ├── n: 1
          └── o: 1
       └── Freq: 2
           ├── T: 1
           └── h: 1
└── Freq: 9
    ├── Freq: 4
       ├── i: 2
       └── Freq: 2
           ├── t: 1
           └── a: 1
    └── Freq: 5
        ├── Freq: 2
           ├── .: 1
           └── e: 1
        └──  : 3

3. 各文字に符号を割り当てる

def generate_huffman_codes(node, prefix="", code_map=None):
    if code_map is None:
        code_map = {}
    if node.char is not None:
        code_map[node.char] = prefix
    else:
        generate_huffman_codes(node.left, prefix + "0", code_map)
        generate_huffman_codes(node.right, prefix + "1", code_map)
    return code_map
    
# ハフマンコードを生成
huffman_codes = generate_huffman_codes(huffman_tree)

結果
出頻度の多い「s」「 」に少ないバイナリデータが割り当てられていることが分かります。

's': 00
' ': 111
'i': 100
'n': 0100
'o': 0101
'T': 0110
'h': 0111
't': 1010
'a': 1011
'.': 1100
'e': 1101

4. 入力文字列をハフマン符号で圧縮する

def huffman_encode(text, code_map):
    return ''.join(code_map[char] for char in text)
    
encoded_data = huffman_encode(text, huffman_codes)

5. 圧縮データを復元する

def huffman_decode(encoded_data, root):
    decoded_text = []
    current_node = root
    for bit in encoded_data:
        if bit == "0":
            current_node = current_node.left
        else:
            current_node = current_node.right

        if current_node.char is not None:  # リーフノードに到達
            decoded_text.append(current_node.char)
            current_node = root

    return ''.join(decoded_text)
    
# 復元を実行
decoded_text = huffman_decode(encoded_data, huffman_tree)

全体のコード例

import heapq
from collections import defaultdict

# 1. Calculate frequency
def calculate_frequency(text):
    frequency = defaultdict(int)
    for char in text:
        frequency[char] += 1
    return frequency

# 2. Build Huffman Tree
class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(frequency):
    heap = [HuffmanNode(char, freq) for char, freq in frequency.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        node1 = heapq.heappop(heap)
        node2 = heapq.heappop(heap)
        merged = HuffmanNode(None, node1.freq + node2.freq)
        merged.left = node1
        merged.right = node2
        heapq.heappush(heap, merged)

    return heap[0]

# 3. Generate Huffman Codes
def generate_huffman_codes(node, prefix="", code_map=None):
    if code_map is None:
        code_map = {}
    if node.char is not None:
        code_map[node.char] = prefix
    else:
        generate_huffman_codes(node.left, prefix + "0", code_map)
        generate_huffman_codes(node.right, prefix + "1", code_map)
    return code_map

# 4. Encode the text
def huffman_encode(text, code_map):
    return ''.join(code_map[char] for char in text)

# 5. Decode the encoded data
def huffman_decode(encoded_data, root):
    decoded_text = []
    current_node = root
    for bit in encoded_data:
        if bit == "0":
            current_node = current_node.left
        else:
            current_node = current_node.right

        if current_node.char is not None:  # Reached a leaf node
            decoded_text.append(current_node.char)
            current_node = root

    return ''.join(decoded_text)

# Main Execution
text = "This is a stone."

# Step 1: Calculate frequency
frequency = calculate_frequency(text)

# Step 2: Build Huffman Tree
huffman_tree = build_huffman_tree(frequency)

# Step 3: Generate Huffman Codes
huffman_codes = generate_huffman_codes(huffman_tree)

# Step 4: Encode the text
encoded_data = huffman_encode(text, huffman_codes)

# Step 5: Decode the encoded data
decoded_text = huffman_decode(encoded_data, huffman_tree)

# Display results
print("Character frequencies:")
for char, freq in sorted(frequency.items(), key=lambda x: x[1], reverse=True):
    print(f"'{char}': {freq}")

print("\nHuffman Codes:")
for char, code in sorted(huffman_codes.items(), key=lambda x: frequency[x[0]], reverse=True):
    print(f"'{char}': {code}")

print("\nEncoded Data:")
print(encoded_data)

print("\nDecoded text matches original:", decoded_text == text)
3
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
3
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?