結論

#include "../include/string_split.hpp"
#include <iostream>
int main()
{
    std::string s = "arikitari na world!";
    const auto s_1 = s | split(' ')[1];//"na"
    std::string s2 = "123,421,113";
    const auto s_2 = s2 | split(',') >> [](const std::string& s) {
        return std::stoi(s);
    };//[123,421,113]
    s2 | split(',') >> [](std::string&& s) {
        std::cout << s << std::endl;
    };
    /*stdout:
    123
    421
    113
    */
    return 0;
}

これ使ってみてください！single header only libraryです！
PStade.Ovenというかboost.range風味のpipeline記法が使える上に遅延評価のおかげで高速です！

初めに

Pythonとかにはstringの分割があるのに、C++では見つからなかったので、作ってみた！！

・・・という記事は割とよく見かける

ex.)

しかし、どれも見た目がダサく、かつ、速度を追求できていない。C++なんだから、速度を追求するべきだ。

みんな、splitした後どうしてる？

みんな、文字列をsplitした後、どうします？

何が言いたいかといいますと、どれもこれもstd::vector<std::basic_string<CharType>>に格納して結果を返しているけれど、本当にいつもそれが最適か？ということです。

C++において、プログラムを高速化する際、アルゴリズムの次に大事なのはメモリー確保数の削減です。これを意識してないプログラムは、意識したプログラムに対して当社比5倍遅いです。

例1

例えばこんなコード。

using namespace std::literals;
const auto s = "123,456,789"s;
const std::vector<std::string> splitted = split(s, ',');
std::vector<int> re;
re.reserve(splitted.size());
for(const auto& s_n : splitted) re.push_back(std::stoi(s_n));

splittedという無駄なvectorが生成されています。たかが一回じゃないかだって？それ、減らせるでしょ！

例2

例えばこんなコード。

using namespace std::literals;
const auto s = "123,456,789"s;
const auto n = std::stoi(split(s, ',')[1]);//456

この場合vector一つとstring2つの計3回は最低でも無駄なallocateが起こります。std::vectorのメモリー確保戦略が倍々ゲームじゃない場合、push_backするたびにメモリー再確保とstringのmoveが発生するので更に重いです(Visual Studioとかな！)。

じゃあどうするか？operator overloadで遅延評価でしょ！

さてみなさん、突然ですが、クイズです。

添字演算子(operator [])、右シフト演算子(operator >>)、ビットOR演算子(operator |)は優先順位が高い順に並べるとどうなるでしょうか？

まあC言語の初心者教本にも載っていますが、

operator[] > operator>> > operator|

ですね。これを有効活用します。

operator|はpipelineというかchain-method-likeというかに持って来い

Boost.Rangeをパイプライン記法のままコンテナに変換
でも紹介がありますが、簡単に触れておきます。

C++は強力なoperator overloadのある言語として知られていますが、残念ながら、operator.はオーバーロードできません。

いや、まあC++標準化委員会に提案はされているんですが
[PDF] P0252R1: Operator Dot Wording(解説:C++標準化委員会の文書: P0250R0-P0259R0 | 本の虫)
どうもrejectされたっぽいので・・・。

そこでoperator|の出番です。これをうまく活用すると

Scala

val vec = (1 to 10).
  filter(i => i % 2 != 0).
  map(i => i * i).
  toVector

print(vec.mkString(",")) // 1,9,25,49,81

が

C++

auto range = boost::irange(1, 10)
  | boost::adaptors::filtered([](int i){ return i % 2 != 0; })
  | boost::adaptors::transformed([](int i){ return i * i; })

for (int i : range) std::cout << i << ","; //  1,9,25,49,81,

のように書けます。かっこいいですね、丸パクしましょう。

実装していく

目指すもの

#include "../include/string_split.hpp"
#include <iostream>
int main()
{
    std::string s = "arikitari na world!";
    const auto s_1 = s | split(' ')[1];//"na"
    std::string s2 = "123,421,113";
    const auto s_2 = s2 | split(',') >> [](const std::string& s) {
        return std::stoi(s);
    };//[123,421,113]
    s2 | split(',') >> [](std::string&& s) {
        std::cout << s << std::endl;
    };
    /*stdout:
    123
    421
    113
    */
    return 0;
}

こうかけるようにしてみましょう。さっきのoperatorの優先順位の話を思い出せば自明ですが、operator|が実際のsplit処理を担います。

実装方法はboost.rangeとそんなに変わんないので

Boost.Rangeってなに

この辺も参考に。

仕様を固めよう

char/wchar_t/char16_t/char32_t型をまとめてCharTypeをします。

区切り文字(列)(delimiter)に使えるのは

CharType
const CharType*
std::basic_string<CharType>

とする。

operator[]に渡す数字の型はstd::size_tとし、この場合operator|の戻り値の型はstd::basic_string<CharType>>とする。

operator>>に渡せる変換関数は、任意の型(void型含む)をSomeTypeとすると

SomeType (*)(/*std::basic_string<CharType>(lvalue/rvalue両方の可能性あり)を受け取れる型*/)
lambda式で上記条件に合致するもの
operator()を持つclass(関数オブジェクト)で上記条件に合致するもの

とし、SomeTypeがvoidの場合はstd::vectorに格納しない、それ以外の場合はoperator|の戻り値の型はstd::vector<SomeType>とする。

上記に当てはまらないかぎり、operator|の戻り値の型はstd::vector<std::basic_string<CharType>>>とする。

splitに必要な情報を格納するclassを作る

長いので

https://github.com/yumetodo/string_split/blob/58518f0d4d07ab799e84077a953419e2f28af0b0/include/string_split.hpp#L67-L132
67行目から132行目を見てください。

delimiterが3パターンある関係で3種類の特殊化を行なっています。また、operator[]を呼んでいる時、operator>>を呼んでいる時、いずれでもない時の3つがあるので3種類のclass(順にsplit_helper_index,split_helper_conv_func, split_helper)があります。

splitに必要な情報を格納するclassを構築するfactory関数を作る

いわゆるmake系関数ですね。

template<typename CharType, typename std::enable_if<detail::type_traits::is_char_type<CharType>::value, std::nullptr_t>::type = nullptr>
constexpr detail::split_helper<CharType, true, false, false> split(CharType delim) noexcept { return{ delim }; }
template<typename CStr, typename std::enable_if<detail::type_traits::is_c_str<CStr>::value, std::nullptr_t>::type = nullptr>
constexpr detail::split_helper<CStr, false, true, false> split(CStr delim) noexcept { return{ delim }; }
template<typename StlString, typename std::enable_if<detail::type_traits::is_stl_string<StlString>::value, std::nullptr_t>::type = nullptr>
detail::split_helper<StlString, false, false, true> split(StlString delim) noexcept { return{ std::move(delim) }; }

std::enable_ifを使ってオーバーロードする時、enablerを使う？でお馴染みSFINAEでdelimiter3パターンを分けて先ほどのclassを構築します。

operator>>を作る

C++のつまずきポイント解説#テンプレート型推論その２（パラメータタイプがユニヴァーサル参照）
に解説のある通り、ユニヴァーサル参照の性質を利用して完全転送すればいいだけです。

抜粋

template<typename CharType>
    struct split_helper<CharType, true, false, false> {
    //略
    template<typename FuncType>
    constexpr split_helper_conv_func<CharType, FuncType, true, false, false>
    operator >> (FuncType&& f) const { 
        return{ delim, std::forward<FuncType>(f) }; 
    }
};

operator[]を作る

何も悩むことはないですね

template<typename CharType>
struct split_helper<CharType, true, false, false> {
    //略
    constexpr split_helper_index<CharType, true, false, false>
    operator[](size_t n) const noexcept { return{ delim, n }; }
};

operator|を作る

いよいよ本番です。

delimiterは3パターンありますが、std::basic_string<CharType>::find_xxx系関数に指定するだけなので、

CharTypeの時
const CharType*またはstd::basic_string<CharType>のとき

の2パターンです。

operator|の左辺の型によっても場合分けします。これは、std::basic_string<CharType>::sub_strよりもstd::basic_string<CharType>::eraseのほうが高速な可能性が高いからです(新規メモリー確保をしない)

lvalue
rvalue

splitのモードによっても場合分けが必要です。

operator[]でindexを指定している場合
operator>>で変換関数を指定していて、その戻り値がvoidの場合
operator>>で変換関数を指定していて、その戻り値がvoidではない場合
operator[],operator>>共に呼んでいない場合

つまり、2*2*4=16通りのoperator|を書くわけですね。

内部的にはstd::string::find_first_ofとstd::string::find_first_not_ofをforで回しているだけです。

もうすこしざっと説明してくれ

まあ高速化のための場合分けをすっ飛ばすと、

operator[]でindexを指定している場合
operator>>で変換関数を指定していて、その戻り値がvoidの場合
operator>>で変換関数を指定していて、その戻り値がvoidではない場合
operator[],operator>>共に呼んでいない場合

の4パターンがあります。いずれも実際のsplit処理はoperator|が担います。

で、operator|の右辺(第二引数)にsplitのための情報を渡さなければなりません。その情報とは

4パターンのどれなのか
=> クラス名
区切り文字(列)とその型
=> メンバー変数delimとtenplate引数DelimType
ほしいのは何番目か(operator[]を呼んでいる時のみ)
=> メンバー変数index(split_helper_indexクラスのみ)
変換関数(operator>>を呼んでいる時のみ)とその型
=> メンバー変数fとtemplate引数FuncType,メンバー型定義result_type(split_helper_conv_funcクラスのみ)

です。そのために3種類の情報格納クラスを作ったわけです(変換関数の戻り値の型は一つのクラスでやりくりしてます)。

あとは適当にstd::basic_string<CharType>::find_first_ofとかそのへんで分割してそのまま返すなり、vectorにぶち込むなり、変換関数に渡すなり、変換関数に渡してからvectorにぶち込んで終わりです。簡単でしょ？(どこが)

ベンチマーク

で、上の方で速度とかなんかほざいていましたが、作ってる時はそこまで意識していたわけではありません。というわけで計測してみましょう。上でリンクはって紹介したところからベンチマークのためにプログラムをパクっています。

環境
Windwos7 x64 Intel Core i5-6600 RAM:8GB DDR4 Visual Studio 2015 Update 3

結果_vs2015_x64_Debug

split_sstream: 19633[ms] (19633506811[ns]) : string.length:1000 split count:21 roop count:20000
split_foreach: 20273[ms] (20273212873[ns]) : string.length:1000 split count:21 roop count:20000
split_find: 8219[ms] (8219093174[ns]) : string.length:1000 split count:21 roop count:20000
split_find_first_of: 6169[ms] (6169374878[ns]) : string.length:1000 split count:21 roop count:20000
split_our_library: 3831[ms] (3831719290[ns]) : string.length:1000 split count:21 roop count:20000

split_sstream: 29398[ms] (29398418203[ns]) : string.length:20 extract result:73roop count:400000
split_foreach: 28892[ms] (28892307531[ns]) : string.length:20 extract result:73roop count:400000
split_find: 37655[ms] (37655197851[ns]) : string.length:20 extract result:73 roop count:400000
split_find_first_of: 28587[ms] (28587630313[ns]) : string.length:20 extract result:73 roop count:400000
split_our_library: 1568[ms] (1568165636[ns]) : string.length:20 extract result:73 roop count:400000

結果_vs2015_x64_Release

split_sstream: 221[ms] (221553891[ns]) : string.length:1000 split count:21 roop count:20000
split_foreach: 105[ms] (105126015[ns]) : string.length:1000 split count:21 roop count:20000
split_find: 108[ms] (108013702[ns]) : string.length:1000 split count:21 roop count:20000
split_find_first_of: 60[ms] (60907927[ns]) : string.length:1000 split count:21 roop count:20000
split_our_library: 53[ms] (53194032[ns]) : string.length:1000 split count:21 roop count:20000

split_sstream: 365[ms] (365028322[ns]) : string.length:20 extract result:73 roop count:400000
split_foreach: 213[ms] (213769813[ns]) : string.length:20 extract result:73 roop count:400000
split_find: 234[ms] (234030923[ns]) : string.length:20 extract result:73 roop count:400000
split_find_first_of: 164[ms] (164239805[ns]) : string.length:20 extract result:73 roop count:400000
split_our_library: 18[ms] (18209121[ns]) : string.length:20 extract result:73 roop count:400000

結果_vs2015_clang_with_microsoft_codegem-O0

split_sstream: 3321[ms] (3321390863[ns]) : string.length:1000 split count:20 roop count:20000
split_foreach: 2373[ms] (2373877504[ns]) : string.length:1000 split count:20 roop count:20000
split_find: 923[ms] (923088600[ns]) : string.length:1000 split count:21 roop count:20000
split_find_first_of: 642[ms] (642154103[ns]) : string.length:1000 split count:21 roop count:20000
split_our_library: 420[ms] (420755417[ns]) : string.length:1000 split count:21 roop count:20000

split_sstream: 3665[ms] (3665194379[ns]) : string.length:20 extract result:73 roop count:400000
split_foreach: 3054[ms] (3054931591[ns]) : string.length:20 extract result:73 roop count:400000
split_find: 3991[ms] (3991762364[ns]) : string.length:20 extract result:73 roop count:400000
split_find_first_of: 2951[ms] (2951393803[ns]) : string.length:20 extract result:73 roop count:400000
split_our_library: 211[ms] (211476359[ns]) : string.length:20 extract result:73 roop count:400000

結果_vs2015_clang_with_microsoft_codegem-O3

split_sstream: 214[ms] (214364355[ns]) : string.length:1000 split count:21 roop count:20000
split_foreach: 100[ms] (100348038[ns]) : string.length:1000 split count:21 roop count:20000
split_find: 105[ms] (105813618[ns]) : string.length:1000 split count:21 roop count:20000
split_find_first_of: 53[ms] (53963566[ns]) : string.length:1000 split count:21 roop count:20000
split_our_library: 46[ms] (46783306[ns]) : string.length:1000 split count:21 roop count:20000

split_sstream: 438[ms] (438771305[ns]) : string.length:20 extract result:73 roop count:400000
split_foreach: 250[ms] (250181562[ns]) : string.length:20 extract result:73 roop count:400000
split_find: 375[ms] (375356284[ns]) : string.length:20 extract result:73 roop count:400000
split_find_first_of: 266[ms] (266208841[ns]) : string.length:20 extract result:73 roop count:400000
split_our_library: 21[ms] (21786575[ns]) : string.length:20 extract result:73 roop count:400000

遅延評価と、memory allocation削減はやはり効果が高いようです。operator[]を呼ぶケースでは勝負になっていませんし、同じ土俵のはずの純粋なsplitもmemory allocation削減のおかげで高速になっていますね。

これでもう

@yumetodo そう言えば、あなたのsplit library、実際の速度が測ったことないでしょう…
— 心愛@ときめきポポ論 (@MaverickTse) 2016年7月20日

とか言われずにすみますね！

ついでなのでCIでもベンチマークするようにしました。

https://app.shippable.com/projects/577cc6213be4f4faa56be97c/status/
Build番号41からです。

追記

どうもCIでベンチは途中で~~タイムアウト~~謎の504 Server Error: Gateway Time-outになっている・・・？失敗するなぁ・・・。

~~LinuxのGCCだと下手にreserveでむりやり倍々ゲームのメモリー確保戦略にしないほうが速くなっていますね。~~(追記：デュアルブート環境作ってUbuntu入れて試したらとんとんだったぞい！)手元のmingw-gccでは僅差で勝っていたんですが。いずれにせよVSのメモリー確保戦略はガバのプーさんと改めて証明された。

反省&感想

operator overloadを悪用しすぎた
テストフレームワークにiutestを使ったんだけど、ものすごい便利、wandboxでテストできるなんて！
速度を追求するのはいいけど、思ったより長くなって、テストケース書くのが大変だった
travis-ciのymlわかんねぇな！
なんでtravis-ciのlcov失敗するん？
Makefileワガンネ
さいきん、http://llvm.org が落ち過ぎな気がする
operator overloadしまくるとdoxygenコメントが書きにくい(今回は書くの諦めた)

追記:msys2 mingw32 clang3.8.0でだけコンパイラがバグる

詳しいことは
Clangのバグと戦うために173210氏に助けを求めた話 - Togetterまとめ
を見てほしいが、コンパイラがバグる。

#0 0x05e6779d
#1 0x0028dfd0
#2 0x00e054a1 (D:\msys64\mingw32\bin\clang++.exe+0xa054a1)
#3 0x779fe40c RtlInitUnicodeString (C:\Windows\SysWOW64\ntdll.dll+0x2e40c)
#4 0x779fe172 RtlAllocateHeap (C:\Windows\SysWOW64\ntdll.dll+0x2e172)
#5 0x779fe40c RtlInitUnicodeString (C:\Windows\SysWOW64\ntdll.dll+0x2e40c)
#6 0x779fe172 RtlAllocateHeap (C:\Windows\SysWOW64\ntdll.dll+0x2e172)
#7 0x761d9d45 malloc (C:\Windows\syswow64\msvcrt.dll+0x9d45)
#8 0x6fef6c7a (D:\msys64\mingw32\bin\libstdc++-6.dll+0xb6c7a)
#9 0x021d0c1d (D:\msys64\mingw32\bin\clang++.exe+0x1dd0c1d)
#10 0x6fef6c7a (D:\msys64\mingw32\bin\libstdc++-6.dll+0xb6c7a)

現在議論は
https://github.com/Alexpux/MINGW-packages/issues/1669
https://github.com/yumetodo/string_split/issues/1
https://github.com/Alexpux/MINGW-packages/issues/1195
およびTwitterに分散しているが、興味のある人は見てくれ。

C++でPStade.Oven(pipeline)風味なstringのsplitを作ってみた

結論

初めに