

Counting Thai and Arabic characters properly in Go

Posted at 2015-08-12

I wanted to redo in Go what I did in an earlier article of mine, "Counting Thai and Arabic characters properly in Python".

I counted the string in three units: bytes, Unicode code points (runes), and graphemes. I also printed it split into individual runes.

The standard library did not seem to offer a suitable way to count graphemes, so I wrote my own function, GraphemeLength.

package main

import (
	"bufio"
	"fmt"
	"os"
	"unicode"
	"unicode/utf8"
)

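// printer prints the length of targetString in bytes, runes, and
// graphemes, then lists each rune on its own line.
func printer(targetString string) {
	bytelength := len(targetString)
	runelength := utf8.RuneCountInString(targetString)
	graphemelength := GraphemeLength(targetString)
	tmpl := "Bytes:\t%d\nRunes:\t%d\nGrapheme:\t%d\n"
	fmt.Printf(tmpl, bytelength, runelength, graphemelength)
	for _, ca := range targetString {
		// %#U formats a rune as, for example, U+0E01 'ก'.
		// fmt is used instead of the builtin println, which writes to
		// stderr and could interleave badly with the counts above.
		fmt.Printf("%#U\n", ca)
	}
}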

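// GraphemeLength approximates the number of graphemes in targetString
// by counting every rune that is not a combining mark (Unicode category M).
func GraphemeLength(targetString string) int {
	length := 0
	for _, ca := range targetString {
		// A combining mark attaches to the preceding base character,
		// so it does not start a new grapheme.
		if unicode.Is(unicode.M, ca) {
			continue
		}
		length++
	}
	return length
}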

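func main() {
	// Read standard input line by line and report on each line.
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		printer(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

$ go run main.go <<< 'กินข้าวเย็น'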
Bytes:	33
Runes:	11
Grapheme:	8
U+0E01 'ก'
U+0E34 'ิ'
U+0E19 'น'
U+0E02 'ข'
U+0E49 '้'
U+0E32 'า'
U+0E27 'ว'
U+0E40 'เ'
U+0E22 'ย'
U+0E47 '็'
U+0E19 'น'
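
The title also mentions Arabic, so here is a minimal sketch of the same idea applied to an Arabic word with vowel marks. The sample string and the helper name countGraphemes are my own assumptions for illustration. The helper mirrors the mark-skipping logic of GraphemeLength. Keep in mind that skipping combining marks is only an approximation of the grapheme cluster rules in Unicode Standard Annex #29, so other scripts or emoji sequences may be counted differently.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// countGraphemes is a hypothetical helper that mirrors GraphemeLength:
// it counts every rune that is not a combining mark (category M).
func countGraphemes(s string) int {
	n := 0
	for _, r := range s {
		if !unicode.Is(unicode.M, r) {
			n++
		}
	}
	return n
}

func main() {
	// An Arabic word with vowel marks, written with escapes so that the
	// code points are unambiguous (the sample string is an assumption):
	// meem, fatha, reh, sukun, hah, fatha, beh, fathatan, alef.
	s := "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"
	fmt.Printf("Bytes:\t%d\nRunes:\t%d\nGrapheme:\t%d\n",
		len(s), utf8.RuneCountInString(s), countGraphemes(s))
}
```

The string has nine code points of two bytes each, and four of them are combining marks, so this should report 18 bytes, 9 runes, and 5 graphemes.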