
Iterating over a string with for-range in Go gave me rune values

Last updated at 2022-06-09, posted at 2022-06-05

My impression: so even in Go, strings can be handled more or less like arrays, huh
— BitterBamboo (@bitter_bamboo) June 4, 2022

As the tweet above shows, I had assumed I could treat strings intuitively, just like arrays.
Then for-range unexpectedly handed me rune values and knocked me flat.

The problematic code

This program removes every "pien" (🥺) from a string.
This time, instead of using strings.ReplaceAll from the standard library,
I try to walk through the string one character at a time with for-range.
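For reference, here is a minimal sketch of the standard-library route that this article deliberately avoids. The helper name removeAllPiensStd is hypothetical, made up for this example; it only delegates to strings.ReplaceAll.

```go
package main

import (
	"fmt"
	"strings"
)

// removeAllPiensStd is a hypothetical helper name for this sketch.
// It replaces every occurrence of "🥺" with the empty string.
func removeAllPiensStd(word string) string {
	return strings.ReplaceAll(word, "🥺", "")
}

func main() {
	fmt.Println(removeAllPiensStd("🥺fizz🥺buzz🥺")) // fizzbuzz
}
```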

This fails to compile

package main

import (
	"fmt"
)

func removeAllPiens(word string) (noPienWord string) {
	for _, char := range word {
		if char == "🥺" { // Error!!
			continue
		}
		noPienWord += char // Error!!
	}
	return
}

func main() {
	result := removeAllPiens("🥺fizz🥺buzz🥺")
	fmt.Println(result)
}

Trying to run this produces the following compile errors.

./prog.go:9:14: invalid operation: char == "🥺" (mismatched types rune and untyped string)
./prog.go:12:3: invalid operation: noPienWord += char (mismatched types string and rune)

Apparently the errors occur because char, which corresponds to each character in the string, is of type rune.
Wasn't that supposed to be a string...?

This works

Convert char to the string type with string(char).

package main

import (
	"fmt"
)

func removeAllPiens(word string) (noPienWord string) {
	for _, char := range word {
		if string(char) == "🥺" {
			continue
		}
		noPienWord += string(char)
	}
	return
}

func main() {
	result := removeAllPiens("🥺fizz🥺buzz🥺")
	fmt.Println(result)
}

It ended up a little off from my intuition,
but the program now behaves as originally intended.
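As a variation, the comparison can probably stay in rune land by using a rune literal (single quotes) instead of converting every character to a string. The sketch below also swaps the += concatenation for strings.Builder; both changes are my own assumptions about what a more idiomatic version might look like, and the function name removeAllPiensRune is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// removeAllPiensRune is a hypothetical variant of removeAllPiens.
// It compares each rune against the rune literal '🥺' directly,
// so no string conversion is needed for the comparison.
func removeAllPiensRune(word string) string {
	var b strings.Builder
	for _, char := range word {
		if char == '🥺' { // rune == rune, so the types match
			continue
		}
		b.WriteRune(char) // appends the UTF-8 encoding of the rune
	}
	return b.String()
}

func main() {
	fmt.Println(removeAllPiensRune("🥺fizz🥺buzz🥺")) // fizzbuzz
}
```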

Iterating over a string with `for-range` yields each character as a `rune`

Let's print what each element looks like when a string is iterated with for-range.

package main

import (
	"fmt"
)

func main() {
	word := "🥺fizz🥺buzz🥺"
	for index, char := range word {
		fmt.Println(index, string(char), char)
	}
}

Result:

0 🥺 129402
4 f 102
5 i 105
6 z 122
7 z 122
8 🥺 129402
12 b 98
13 u 117
14 z 122
15 z 122
16 🥺 129402

Oh... these are Unicode code points written in decimal... (now I get it)

This reminds me of the explanation under Basic Types in A Tour of Go.

rune // alias for int32
     // represents a Unicode code point

My brain is basically wired for dynamic typing, so I had breezed right past this, but now it makes sense.

(Translator's note: "rune" is a word for ancient characters (runes), but Go uses the word rune to mean a character itself.)
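To convince myself that rune really is just another name for int32, here is a small sketch. The helper describeRune is a hypothetical name; it assigns a rune to an int32 variable without any conversion and formats the value with the fmt verbs %c (character), %d (decimal), and %U (U+ notation).

```go
package main

import "fmt"

// describeRune is a hypothetical helper for this sketch.
// It returns the character, its decimal value, and its U+ notation.
func describeRune(r rune) string {
	var n int32 = r // no conversion needed: rune is an alias for int32
	return fmt.Sprintf("%c %d %U", r, n, r)
}

func main() {
	fmt.Println(describeRune('🥺')) // 🥺 129402 U+1F97A
	fmt.Println(describeRune('f'))  // f 102 U+0066
}
```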

Also, the index within the string appears to be a byte offset, advancing by one per byte.

🥺 takes 4 bytes in UTF-8 (a variable-length encoding),
so the index of the character following each 🥺 jumps ahead by 4.
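The byte-versus-character difference can be checked with a sketch like the following. It assumes the standard unicode/utf8 package, which the article itself does not use; the helper name byteAndRuneCounts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper for this sketch.
// len counts bytes, while utf8.RuneCountInString counts code points.
func byteAndRuneCounts(s string) (bytes int, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := byteAndRuneCounts("🥺fizz🥺buzz🥺")
	fmt.Println(b, r)              // 20 11
	fmt.Println(utf8.RuneLen('🥺')) // 4
	fmt.Println(utf8.RuneLen('f')) // 1
}
```

Three 4-byte 🥺 plus eight 1-byte ASCII letters give 20 bytes but only 11 code points, which matches the last index of 16 in the output above.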

The language specification for `for-range`

I was curious, so I looked it up.
The specification clearly states that when for-range is applied to a string,
the second value of each iteration is of type rune.

For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
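The last sentence of the quote, about invalid UTF-8, can be tried out with a sketch like this one. The input string is my own made-up example: "a", then the two bytes 0xF0 0x9F (a truncated start of a 4-byte sequence), then "b". The names step and rangeSteps are hypothetical.

```go
package main

import "fmt"

// step records one iteration of for-range over a string.
type step struct {
	index int  // byte index where this iteration started
	char  rune // code point reported by for-range
}

// rangeSteps is a hypothetical helper that collects every
// (index, rune) pair produced by ranging over s.
func rangeSteps(s string) []step {
	var steps []step
	for i, r := range s {
		steps = append(steps, step{i, r})
	}
	return steps
}

func main() {
	// 0xF0 0x9F is an incomplete UTF-8 sequence.
	for _, st := range rangeSteps("a\xf0\x9fb") {
		fmt.Printf("%d %U\n", st.index, st.char)
	}
	// Output:
	// 0 U+0061
	// 1 U+FFFD
	// 2 U+FFFD
	// 3 U+0062
}
```

Each invalid byte shows up as its own U+FFFD and the index advances by only one byte, as the specification describes.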

Next time

I will review the basic behavior of the rune type.
