
More than 5 years have passed since last update.

Replacing control characters and surrogate-pair characters


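A rather niche tip, but I'm noting it down here.
This strips the following characters (char) from a String in a single pass: ISO control characters (line feed, carriage return, and so on) and surrogate-pair characters. The examples below replace each matching char with a space.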
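To see what falls into each category, here is a minimal sketch that uses only `java.lang.Character`; the helper name `describe` is hypothetical. ISO control characters are U+0000..U+001F and U+007F..U+009F, and surrogates are the code units U+D800..U+DFFF. A supplementary character such as U+2000B is stored in a String as a high surrogate followed by a low surrogate, so it shows up as two `char`s.

```kotlin
// Hypothetical helper: classifies a single UTF-16 code unit (char).
fun describe(c: Char): String = when {
    Character.isISOControl(c) -> "control"              // U+0000..U+001F, U+007F..U+009F
    Character.isHighSurrogate(c) -> "high surrogate"    // U+D800..U+DBFF
    Character.isLowSurrogate(c) -> "low surrogate"      // U+DC00..U+DFFF
    else -> "other"
}

// Usage: U+2000B is written here as its surrogate pair "\uD840\uDC0B".
// for (c in "a\n\uD840\uDC0B") println(describe(c))
```

Iterating over the usage string yields one line per `char`, which is why a single supplementary character is reported as two separate surrogates.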

Code

It uses the standard Java 8 Character class and CharMatcher from Google's Guava library.

import org.junit.Test
import kotlin.test.assertEquals
import com.google.common.base.CharMatcher

// A CharMatcher that matches control characters and surrogate characters in one go.
val BAD_CHARS: CharMatcher = CharMatcher.javaIsoControl()  // ISO control chars (U+0000..U+001F, U+007F..U+009F)
    .or(CharMatcher.forPredicate { Character.isSurrogate(it) })  // surrogate code units (U+D800..U+DFFF)

class CharMatcherTest {
    // 0x0a = LF (line feed)
    @Test
    fun test_isoControl_u000a() {
        assertEquals(
            "こんにちは 世界",
            BAD_CHARS.replaceFrom("こんにちは\n世界", " ")
        )
    }

    // 0x0d = CR (carriage return) & 0x0a = LF (line feed): each char becomes its own space
    @Test
    fun test_isoControl_u000d_u000a() {
        assertEquals(
            "こんにちは  世界",
            BAD_CHARS.replaceFrom("こんにちは\r\n世界", " ")
        )
    }

    // Surrogate-pair characters: each is two chars, so each becomes two spaces
    @Test
    fun test_surrogatePair() {
        assertEquals(
            "  -  -  ",
            BAD_CHARS.replaceFrom("𠀋-𡈽-𡌛", " ")
        )
    }
}
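The surrogate test expects two spaces per character because each of those characters is one Unicode code point but two `char`s, and CharMatcher tests every `char` on its own. If one replacement per character is preferred, a minimal sketch is to walk the string by code point instead. `replaceBadCodePoints` is a hypothetical helper, not part of Guava:

```kotlin
// Hypothetical helper: like BAD_CHARS.replaceFrom(s, replacement), but a well-formed
// surrogate pair (one supplementary character) is replaced once instead of once per char.
fun replaceBadCodePoints(s: String, replacement: String): String {
    val sb = StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)  // joins a valid high+low pair into one code point
        // s[i] is a surrogate both for the high half of a valid pair and for a lone surrogate
        val isBad = Character.isISOControl(cp) || Character.isSurrogate(s[i])
        if (isBad) sb.append(replacement) else sb.appendCodePoint(cp)
        i += Character.charCount(cp)  // 2 for a supplementary code point, otherwise 1
    }
    return sb.toString()
}
```

Control characters are still replaced one char at a time, so CR LF still turns into two spaces; only surrogate pairs collapse to a single replacement.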

Sorry this is in Kotlin; please translate it to Java 8 yourself~
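For projects without Guava, here is a minimal sketch that calls only `java.lang.Character`, so it maps line-for-line onto Java 8. It is assumed, not verified against Guava's source, to behave the same as `BAD_CHARS.replaceFrom` on these inputs; `isBadChar` and `replaceBadChars` are hypothetical names:

```kotlin
// Hypothetical Guava-free counterpart of BAD_CHARS: an ISO control char or a surrogate code unit.
fun isBadChar(c: Char): Boolean = Character.isISOControl(c) || Character.isSurrogate(c)

// Replaces every bad char with `replacement` (pass "" to delete them); other chars are copied.
fun replaceBadChars(s: CharSequence, replacement: CharSequence): String =
    buildString(s.length) {
        for (c in s) {
            if (isBadChar(c)) append(replacement) else append(c)
        }
    }
```

Passing an empty replacement gives the "just delete them" behavior mentioned at the start.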
