This post shows how to search for a target string, marked up with custom CSS, in text that contains special strings such as emoji and kaomoji (emoticons).

Examples of special strings

🌟　　🐶　　　🐱　　☆　;ﾟ　か゚

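Text that uses kaomoji in particular tends to contain characters with unusual encodings, where one visible character is made up of several code units. This throws off character counts, so it needs special handling.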
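To make the counting mismatch concrete, here is a minimal sketch in current Swift syntax (newer than the listings in this post). It assumes that "か゚" is か (U+304B) followed by the combining mark U+309A; `lengths(of:)` is a hypothetical helper introduced only for illustration.

```swift
import Foundation

// "か" followed by U+309A (combining semi-voiced sound mark), written with
// escapes so that the two code points are explicit. This is an assumption
// about how the sample character is encoded.
let ka = "\u{304B}\u{309A}"
let star = "\u{1F31F}" // 🌟

// Hypothetical helper: three ways of counting the same string.
func lengths(of text: String) -> (characters: Int, scalars: Int, utf16: Int) {
    return (text.count, text.unicodeScalars.count, text.utf16.count)
}

for sample in [ka, star, "かき"] {
    let n = lengths(of: sample)
    print("\(sample): Character=\(n.characters) scalars=\(n.scalars) UTF-16=\(n.utf16)")
}
```

The Character count is what String indices measure, while the UTF-16 count is what NSString, NSRange, and NSAttributedString measure. Whenever the two differ, offsets taken from one side cannot be used directly on the other.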

Searching for and decorating the target string "かき" in text that contains the special character "か゚"

"あいか゚うえお<span class="word">かき</span>くけこ"

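I tried three ways of finding the "かき" marked by the custom CSS, and I am recording all of them here as notes, failures included.
・String Range
・NSString's enumerateSubstringsInRange
・Regular expressions with NSRegularExpression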

Using String's Range

Using String's startIndex and endIndex, get the location and range of the opening and closing tags specified by the custom CSS. From those, locate the target string "かき", make it bold, and then delete the tags.

String Range


let str = "あいか゚うえお<span class=\"word\">かき</span>くけこ"
let headString = "<span class=\"word\">"
let backString = "</span>"
// Must be mutable so that attributes can be added and characters deleted
let attributedText = NSMutableAttributedString(string: str)

// Offsets from the start of the string, measured in Characters
var headHitLocation = 0     // start of the opening tag
var backHitLocation = 0     // end of the opening tag
var headBehindLocation = 0  // start of the closing tag

if let range = str.rangeOfString(headString) {
    headHitLocation = str.startIndex.distanceTo(range.startIndex)
    backHitLocation = str.startIndex.distanceTo(range.endIndex)
}
if let range = str.rangeOfString(backString) {
    headBehindLocation = str.startIndex.distanceTo(range.startIndex)
}
// Bold the text between the tags, then delete the opening and closing tags
attributedText.addAttributes([NSFontAttributeName: UIFont.boldSystemFontOfSize(UIFont.labelFontSize())], range: NSMakeRange(headHitLocation + headString.characters.count, headBehindLocation - backHitLocation))
attributedText.deleteCharactersInRange(NSRange(location: headHitLocation, length: headString.characters.count))
attributedText.deleteCharactersInRange(NSRange(location: headHitLocation + (headBehindLocation - backHitLocation), length: backString.characters.count))

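For text without special characters, this adds the decoration to the target string correctly. When a special character is present, however, "か゚" is a single Character in the String index distance but two UTF-16 code units in the NSRange that NSAttributedString uses. The start and end locations of the tags are therefore off by one, and ">か" ends up bold instead.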

[Result]
あいか゚うえ>か>くけこ
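The off-by-one can be reproduced without UIKit. The following sketch uses current Swift syntax rather than the Swift 2-era calls in the listing above, and again assumes that "か゚" is か followed by U+309A. `offsets(of:in:)` is a hypothetical helper; `NSRange(_:in:)` is the Foundation initializer that converts a String range into UTF-16 units.

```swift
import Foundation

let str = "あい\u{304B}\u{309A}うえお<span class=\"word\">かき</span>くけこ"

// Hypothetical helper: returns the position of `tag` counted in Characters,
// and the same position expressed as an NSRange (UTF-16 code units).
func offsets(of tag: String, in text: String) -> (characters: Int, utf16: NSRange)? {
    guard let range = text.range(of: tag) else { return nil }
    let characterOffset = text.distance(from: text.startIndex, to: range.lowerBound)
    return (characterOffset, NSRange(range, in: text))
}

if let open = offsets(of: "<span class=\"word\">", in: str) {
    // The two numbers differ by one because of the combining mark.
    print("Character offset: \(open.characters), NSRange location: \(open.utf16.location)")
}
```

Passing the NSRange value rather than the Character offset to NSAttributedString would be one way to keep this approach, although I did not go down that route in this post.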

NSString's enumerateSubstringsInRange

Use the .ByWords option of enumerateSubstringsInRange to get the range and location of each substring.

enumerateSubstringsInRange

let text = "あいか゚うえお<span class=\"word\">かき</span>くけこ"
let keyword = "かき"
let nsText = text as NSString
let textRange = NSMakeRange(0, nsText.length)
let attributedString = NSMutableAttributedString(string: text)

// Walk the text word by word and decorate every word equal to the keyword
nsText.enumerateSubstringsInRange(textRange, options: .ByWords, usingBlock: {
    (substring, substringRange, enclosingRange, _) in
    if (substring == keyword) {
        attributedString.addAttributes([NSFontAttributeName: UIFont.boldSystemFontOfSize(UIFont.labelFontSize())], range: substringRange)
        attributedString.addAttribute(NSForegroundColorAttributeName, value: UIColor.redColor(), range: substringRange)
    }
})
print(attributedString)

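With ByWords the text is split into words, so the search can miss the target when the text inside the custom CSS tag is not recognized as a word.
With ByComposedCharacterSequences you can search one character at a time, but deleting the tags becomes complicated.
Moreover, in my tests both options ran into the same problem as the String startIndex/endIndex approach: "か゚" counts as two characters on the NSString side, the locations drifted, and I could not get the result I wanted.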
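To see what the enumeration reports around "か゚", here is a small sketch in current Swift syntax (not the Swift 2-era call used above). It assumes that "か゚" is か followed by U+309A, and the `pieces` array is introduced only for illustration.

```swift
import Foundation

let sample = "あい\u{304B}\u{309A}う"

// Collect each composed character sequence together with its UTF-16 range.
var pieces: [(text: String, range: NSRange)] = []
sample.enumerateSubstrings(in: sample.startIndex..<sample.endIndex,
                           options: .byComposedCharacterSequences) { substring, range, _, _ in
    guard let substring = substring else { return }
    pieces.append((substring, NSRange(range, in: sample)))
}

for piece in pieces {
    print(piece.text, piece.range.location, piece.range.length)
}
```

Each visible character comes back as one piece, but the piece for "か゚" has a UTF-16 length of 2, so every location after it is one larger than a Character count would suggest.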

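Regular expressions with NSRegularExpression

Create a regular expression object for extracting the string, and use matchesInString to get one result for every match of the pattern in the text being searched. Then, for each match, make the target string inside the tags bold and delete the tags placed before and after it.

Regular expression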

func searchMatchingStr(str: String) -> NSAttributedString {
    // Group 1 captures the text between the tags, including line breaks
    let pattern = "<span class=\"word\">((?:.|\n)+?)</span>"
    let headString = "<span class=\"word\">"
    let backString = "</span>"
    var rangeLength = 0  // number of tag characters deleted so far
    let attributedText = NSMutableAttributedString(string: str)
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .CaseInsensitive) else {
        return attributedText
    }
    // NSRange is measured in UTF-16 code units, so use utf16.count rather than characters.count
    let matches = regex.matchesInString(str, options: [], range: NSMakeRange(0, str.utf16.count))

    for match in matches {
        // Bold the captured text, then delete the closing tag followed by the opening tag
        attributedText.addAttributes([NSFontAttributeName: UIFont.boldSystemFontOfSize(UIFont.labelFontSize())], range: NSRange(location: match.rangeAtIndex(1).location - rangeLength, length: match.rangeAtIndex(1).length))
        attributedText.deleteCharactersInRange(NSRange(location: match.rangeAtIndex(1).location + match.rangeAtIndex(1).length - rangeLength, length: backString.utf16.count))
        attributedText.deleteCharactersInRange(NSRange(location: match.rangeAtIndex(0).location - rangeLength, length: headString.utf16.count))
        // Later matches shift left by the length of the tags just removed
        rangeLength += match.rangeAtIndex(0).length - match.rangeAtIndex(1).length
    }
    return attributedText
}

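NSRegularExpression reports match ranges in the same UTF-16 units that NSAttributedString uses, so the search is accurate even when the text contains special characters. Line breaks in the text are also easy to handle.
Because matchesInString returns a result for every match, it is also easy to handle text in which a marked-up target string such as "かき" appears several times.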
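For reference, the same idea can be sketched in current Swift syntax with Foundation only. This is not the code I ran in the app: `stripWordTags(from:)` is a hypothetical helper that returns the text with the tags removed, plus the UTF-16 ranges of the text that had been wrapped, which could then be passed to an attributed string. It assumes the same pattern as above and that "か゚" is か followed by U+309A.

```swift
import Foundation

// Hypothetical helper: removes <span class="word">...</span> tags and reports
// where the wrapped text ends up in the cleaned string (UTF-16 ranges).
func stripWordTags(from text: String) -> (text: String, ranges: [NSRange]) {
    let pattern = "<span class=\"word\">((?:.|\n)+?)</span>"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return (text, [])
    }
    // The search range must be given in UTF-16 code units.
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    var output = ""
    var ranges: [NSRange] = []
    var cursor = text.startIndex

    for match in regex.matches(in: text, options: [], range: fullRange) {
        guard let whole = Range(match.range(at: 0), in: text),
              let inner = Range(match.range(at: 1), in: text) else { continue }
        // Copy the text before the opening tag, then the wrapped text itself.
        output.append(contentsOf: text[cursor..<whole.lowerBound])
        let location = output.utf16.count
        output.append(contentsOf: text[inner])
        ranges.append(NSRange(location: location, length: output.utf16.count - location))
        cursor = whole.upperBound
    }
    // Copy whatever follows the last closing tag.
    output.append(contentsOf: text[cursor...])
    return (output, ranges)
}

let sampleText = "あい\u{304B}\u{309A}うえお<span class=\"word\">かき</span>くけこ"
let stripped = stripWordTags(from: sampleText)
print(stripped.text, stripped.ranges)
```

Building a new string instead of deleting in place avoids the running offset (rangeLength) bookkeeping, at the cost of applying the attributes in a separate step.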

[Result]
あいか゚うえおかきくけこ

After several failed attempts, I was able to decorate the string by using regular expressions.
Comments are welcome!

[Swift] How to search for and decorate (bold, etc.) strings in text that contains special strings such as kaomoji
