
Why Is C++ Slow to Compile?

Last updated at 2019-03-05, posted at 2018-10-10

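Introduction

To be clear up front, this article is not a criticism of the C++ language.
I write C++ code every day myself.

The aim of this article is to investigate the cases and causes that make C++ compilation slow, so that the findings can be used to improve C++ development efficiency.

For example, when developing a library whose declarations and implementation are separated and which is built ahead of time as a shared library, you cannot cut compile times effectively unless you understand what inflates them in the first place.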

C vs C++

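In general, code written in C++ takes longer to compile than code written in C. There are five main reasons.

The grammar of C++ is complex, so parsing the source code (lexical, syntactic, and semantic analysis) takes time
Instantiating templates takes time
Complex optimizations have to be applied
The standard library is so feature-rich that merely including a header makes the amount of code explode
Implementations are often written in headers

In addition, single-file libraries have become more common recently, and bundling everything into one file instead of splitting headers properly is another factor that increases compile times.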

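Experiments

Note: these results are based on g++ 7.3.0; other compilers are not guaranteed to produce similar results.

Time spent in each phase

gcc supports an option called -ftime-report. Passing it at compile time makes the compiler report how much time each stage of compilation took.

Example: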

$ g++ -c -ftime-report main.cpp

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.02 ( 2%) wall    1495 kB ( 4%) ggc
 phase parsing           :   0.45 (80%) usr   0.25 (83%) sys   0.70 (81%) wall   34118 kB (80%) ggc
 phase lang. deferred    :   0.05 ( 9%) usr   0.03 (10%) sys   0.07 ( 8%) wall    4314 kB (10%) ggc
 phase opt and generate  :   0.06 (11%) usr   0.01 ( 3%) sys   0.07 ( 8%) wall    2533 kB ( 6%) ggc
 |name lookup            :   0.10 (18%) usr   0.08 (27%) sys   0.12 (14%) wall    2471 kB ( 6%) ggc
 |overload resolution    :   0.05 ( 9%) usr   0.01 ( 3%) sys   0.06 ( 7%) wall    3611 kB ( 9%) ggc
 dump files              :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     224 kB ( 1%) ggc
 df live regs            :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.07 (13%) usr   0.06 (20%) sys   0.14 (16%) wall    1265 kB ( 3%) ggc
 parser (global)         :   0.14 (25%) usr   0.06 (20%) sys   0.22 (26%) wall   12504 kB (29%) ggc
 parser struct body      :   0.06 (11%) usr   0.02 ( 7%) sys   0.09 (10%) wall    6126 kB (14%) ggc
 parser enumerator list  :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      67 kB ( 0%) ggc
 parser function body    :   0.03 ( 5%) usr   0.02 ( 7%) sys   0.05 ( 6%) wall    2107 kB ( 5%) ggc
 parser inl. func. body  :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.03 ( 3%) wall    1142 kB ( 3%) ggc
 parser inl. meth. body  :   0.06 (11%) usr   0.04 (13%) sys   0.05 ( 6%) wall    3062 kB ( 7%) ggc
 template instantiation  :   0.13 (23%) usr   0.07 (23%) sys   0.19 (22%) wall   12155 kB (29%) ggc
 inline parameters       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      34 kB ( 0%) ggc
 tree CFG construction   :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      69 kB ( 0%) ggc
 expand vars             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall      19 kB ( 0%) ggc
 integrated RA           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1263 kB ( 3%) ggc
 LRA non-specific        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       9 kB ( 0%) ggc
 LRA virtuals elimination:   0.01 ( 2%) usr   0.01 ( 3%) sys   0.00 ( 0%) wall      14 kB ( 0%) ggc
 final                   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      92 kB ( 0%) ggc
 initialize rtl          :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      12 kB ( 0%) ggc
 rest of compilation     :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     125 kB ( 0%) ggc
 TOTAL                 :   0.56             0.30             0.86              42472 kB
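
Each row of the report has the same shape: a timer name, then user, system, and wall-clock seconds, each followed by a percentage. As a minimal sketch of reading such a row programmatically, the hypothetical helper parse_report_row below (not part of gcc) extracts the three times from one line. It assumes the g++ 7 layout shown above and rejects the TOTAL row, which has no percentages.

```cpp
#include <limits>
#include <sstream>
#include <string>

// One row of -ftime-report output (hypothetical type, for illustration only).
struct ReportRow {
  std::string name;   // e.g. "phase parsing"
  double usr = 0.0;   // user CPU seconds
  double sys = 0.0;   // system CPU seconds
  double wall = 0.0;  // wall-clock seconds
  bool ok = false;    // true if the row matched the expected layout
};

// Parses a row such as
//   " phase parsing : 0.45 (80%) usr 0.25 (83%) sys 0.70 (81%) wall ..."
// Assumes the layout printed by g++ 7; rows without "(NN%)" fields are rejected.
ReportRow parse_report_row(const std::string& line) {
  ReportRow row;
  const std::string::size_type colon = line.find(':');
  if (colon == std::string::npos || colon == 0) return row;

  // The timer name is the text before the colon, with surrounding blanks trimmed.
  const std::string::size_type first = line.find_first_not_of(" \t|");
  const std::string::size_type last = line.find_last_not_of(" \t", colon - 1);
  if (first == std::string::npos || first >= colon) return row;

  std::istringstream in(line.substr(colon + 1));
  double values[3] = {0.0, 0.0, 0.0};
  const char* const labels[3] = {"usr", "sys", "wall"};
  for (int i = 0; i < 3; ++i) {
    std::string label;
    in >> values[i];                                              // seconds
    in.ignore(std::numeric_limits<std::streamsize>::max(), ')');  // skip "(NN%)"
    in >> label;                                                  // usr, sys or wall
    if (!in || label != labels[i]) return row;
  }

  row.name = line.substr(first, last - first + 1);
  row.usr = values[0];
  row.sys = values[1];
  row.wall = values[2];
  row.ok = true;
  return row;
}
```

The leading "|" that marks sub-timers such as name lookup is stripped from the name, so those rows parse the same way as the phase rows.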

Let's use this report to find out where the time actually goes.

First, an experiment with the following code, which performs a large number of template instantiations.

example1.cpp

#include <cstdint>
#include <vector>
#include <array>
#include <list>
#include <set>
#include <unordered_set>

template <typename Tp>
void instantiation() {
  std::array<Tp, 5> a = {1,3,2,4,5};
  std::vector<Tp> v(a.cbegin(), a.cend());
  std::list<Tp> l(a.cbegin(), a.cend());
  std::set<Tp> s(a.cbegin(), a.cend());
}

int main() {
  instantiation<std::uint8_t>();
  instantiation<std::uint16_t>();
  instantiation<std::uint32_t>();
  instantiation<std::uint64_t>();
  instantiation<std::int8_t>();
  instantiation<std::int16_t>();
  instantiation<std::int32_t>();
  instantiation<std::int64_t>();
  instantiation<float>();
  instantiation<double>();
}

The results were as follows.

Phase	elapsed time [s]
setup	0.01
parsing	0.77
lang. deferred	0.68
opt and generate	1.63

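Here, lang. deferred is the phase in which the front end handles work it postponed to the end of the translation unit, including the name lookup and overload resolution that work requires, and opt and generate covers optimization and code generation.
Looking at the finer-grained breakdown, template instantiation took 0.8 seconds.

In other words, the time spent on instantiation is about the same as the time spent on parsing (0.77 s).
In fact, because the standard library is so large, the time spent in parsing and lang. deferred grows considerably, and this too contributes to the longer compile times.

Next, an experiment with everyone's favorite: template metafunctions.
Let's call the metafunctions provided by the standard library over and over.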

example2.cpp

#include <cstdint>
#include <type_traits>

template <typename Tp>
void instantiation() {
  std::is_integral<Tp>::value;
  std::is_signed<Tp>::value;
  std::is_floating_point<Tp>::value;
  std::is_pod<Tp>::value;
  std::is_literal_type<Tp>::value;
  std::is_empty<Tp>::value;
  std::is_polymorphic<Tp>::value;
  std::is_abstract<Tp>::value;
  std::is_constructible<Tp>::value;
  std::is_copy_constructible<Tp>::value;
  std::is_move_constructible<Tp>::value;
  std::is_destructible<Tp>::value;
  std::is_copy_assignable<Tp>::value;
  std::is_move_assignable<Tp>::value;
}

int main() {
  instantiation<std::uint8_t>();
  instantiation<std::uint16_t>();
  instantiation<std::uint32_t>();
  instantiation<std::uint64_t>();
  instantiation<std::int8_t>();
  instantiation<std::int16_t>();
  instantiation<std::int32_t>();
  instantiation<std::int64_t>();
  instantiation<float>();
  instantiation<double>();
}

The results were as follows.

Phase	elapsed time [s]
setup	0.01
parsing	0.06
lang. deferred	0.07
opt and generate	0.01

This result was a surprise. I had assumed that the type_traits family would affect compile time, but in practice it appears to have almost no impact.
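
For context, each of these traits is a class template whose static member value is a constant evaluated during compilation, so nothing is left to compute at run time. The sketch below illustrates this with static_assert. The helper count_numeric_traits is a hypothetical name introduced only for this example.

```cpp
#include <cstdint>
#include <type_traits>

// Each trait yields a compile-time constant, so it can be checked with static_assert.
static_assert(std::is_integral<std::uint8_t>::value, "uint8_t is an integral type");
static_assert(!std::is_integral<float>::value, "float is not an integral type");
static_assert(std::is_signed<std::int32_t>::value, "int32_t is signed");
static_assert(!std::is_signed<std::uint32_t>::value, "uint32_t is unsigned");
static_assert(std::is_floating_point<double>::value, "double is a floating-point type");

// Hypothetical helper: counts how many of three traits hold for Tp.
// The result is itself a constant expression.
template <typename Tp>
constexpr int count_numeric_traits() {
  return int(std::is_integral<Tp>::value) + int(std::is_signed<Tp>::value) +
         int(std::is_floating_point<Tp>::value);
}

static_assert(count_numeric_traits<std::int16_t>() == 2, "integral and signed");
```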

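Parsing the standard library

The experiments showed that parsing the standard library takes time, so let's investigate a little further.

We create source files that each include a single C++ standard library header. For iostream, for example, the generated code looks like this.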

#include <iostream>
int main(){return 0;}

I measured the compile time of each of these source files to see how much parsing time each header costs.
The results are shown in the table below.

Rank	Header	Compile time [s]
1	regex	0.963
2	filesystem	0.877
3	future	0.758
4	complex	0.639
5	functional	0.614
6	random	0.613
7	iomanip	0.589
8	iostream	0.532
9	locale	0.527
10	fstream	0.523
11	shared_mutex	0.522
12	condition_variable	0.517
13	thread	0.514
14	unordered_set	0.503
15	unordered_map	0.503
16	sstream	0.488
17	iterator	0.478
18	istream	0.474
19	memory	0.456
20	ostream	0.456
21	mutex	0.454
22	map	0.453
23	ios	0.439
24	valarray	0.414
25	set	0.411
26	streambuf	0.399
27	scoped_allocator	0.387
28	tuple	0.380
29	optional	0.375
30	system_error	0.367
31	bitset	0.359
32	array	0.355
33	stdexcept	0.354
34	string	0.351
35	cmath	0.274
36	queue	0.240
37	vector	0.207
38	algorithm	0.204
39	stack	0.188
40	deque	0.180
41	string_view	0.178
42	variant	0.177
43	forward_list	0.176
44	list	0.164
45	chrono	0.158
46	atomic	0.136
47	any	0.128
48	charconv	0.123
49	utility	0.113
50	new	0.108
51	ratio	0.108
52	exception	0.106
53	numeric	0.103
54	type_traits	0.098
55	limits	0.090
56	cstdlib	0.085
57	cstdio	0.083
58	csignal	0.081
59	iosfwd	0.081
60	ctime	0.081
61	cuchar	0.081
62	cstddef	0.080
63	cwctype	0.080
64	cstring	0.079
65	cassert	0.079
66	typeinfo	0.079
67	cstdarg	0.079
68	clocale	0.079
69	typeindex	0.079
70	cctype	0.077
71	climits	0.077
72	cerrno	0.077
73	cinttypes	0.077
74	cwchar	0.076
75	cstdint	0.076
76	cfenv	0.075
77	cfloat	0.074
78	initializer_list	0.073
79	csetjmp	0.073
80	(no header)	0.072

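The table shows that headers such as regex and functional, which rely on complex techniques and contain a lot of code, do indeed rank near the top. Conversely, headers such as any and initializer_list, whose implementations are simple and small, tend to rank lower.

It also shows that iostream takes considerable time (0.532 s), whereas iosfwd, which only provides forward declarations, costs almost nothing (0.081 s against a baseline of 0.072 s with no header).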
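
To illustrate how this difference can be exploited, here is a minimal sketch in which a header that only declares a stream operator includes iosfwd, and the heavier stream headers appear only in the implementation file. The Point type, the file names, and format_point are hypothetical.

```cpp
// ---- point.hpp (hypothetical header) ----
#include <iosfwd>  // declares std::ostream without defining it

struct Point {
  int x;
  int y;
};

// A reference to an incomplete type is enough for a declaration.
std::ostream& operator<<(std::ostream& os, const Point& p);

// ---- point.cpp (hypothetical implementation file) ----
#include <ostream>  // the full definition is needed only here
#include <sstream>
#include <string>

std::ostream& operator<<(std::ostream& os, const Point& p) {
  return os << '(' << p.x << ", " << p.y << ')';
}

// Small helper used to demonstrate the operator.
std::string format_point(const Point& p) {
  std::ostringstream oss;
  oss << p;
  return oss.str();
}
```

Every file that includes point.hpp then pays only for iosfwd, while the cost of ostream and sstream is paid once, in point.cpp.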

By the way, parsing regex takes quite a while. Just how much code does it expand to?

g++ has the -E option, which runs only the preprocessor, so let's use it to check the size of the preprocessed code.

example3.cpp

#include <regex>

$ g++ -E example3.cpp |wc
  65801  143592 1627254

The total comes to 65,801 lines and 1,627,254 bytes (about 1.55 MiB).

Looking at the gcc implementation, the regex header itself pulls in many other headers.

/usr/include/c++/7/regex

#include <algorithm>
#include <bitset>
#ifdef _GLIBCXX_DEBUG
# include <iosfwd>
#endif
#include <iterator>
#include <locale>
#include <memory>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <map>
#include <cstring>

Reducing compile time

There are several ways to cut compile time, but the most important is arguably to avoid writing #include directives in headers.
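
One common way to keep an #include out of a header is to forward-declare a class that the header only refers to by pointer or reference. A minimal sketch, with hypothetical Logger and Widget classes and hypothetical file names:

```cpp
// ---- widget.hpp (hypothetical) ----
class Logger;  // forward declaration instead of #include "logger.hpp"

class Widget {
 public:
  explicit Widget(Logger* logger);
  void draw() const;

 private:
  Logger* logger_;  // a pointer to an incomplete type is allowed
};

// ---- logger.hpp (hypothetical) ----
class Logger {
 public:
  void log(const char* message) {
    last_ = message;
    ++count_;
  }
  int count() const { return count_; }

 private:
  const char* last_ = nullptr;
  int count_ = 0;
};

// ---- widget.cpp (hypothetical): the only file that needs logger.hpp ----
Widget::Widget(Logger* logger) : logger_(logger) {}

void Widget::draw() const { logger_->log("draw"); }
```

Files that include widget.hpp no longer have to parse logger.hpp, and a change to Logger only forces widget.cpp to be recompiled.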

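The standard library in particular takes a long time to parse, so it is a good idea to use boost, which is split into comparatively fine-grained headers, or, for something on the scale of algorithm, to simply implement what you need yourself.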
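
As a rough sketch of what "implementing it yourself" could mean, the following defines two small function templates without including any header. The namespace tiny is hypothetical, and these are simplified stand-ins rather than the standard versions.

```cpp
// Hypothetical namespace; simplified stand-ins, not the standard versions.
namespace tiny {

// Linear search: returns the first position equal to value, or last if none.
template <typename Iterator, typename T>
Iterator find(Iterator first, Iterator last, const T& value) {
  for (; first != last; ++first) {
    if (*first == value) return first;
  }
  return last;
}

// Returns the larger of two values (the first one when they compare equal).
template <typename T>
const T& max(const T& a, const T& b) {
  return a < b ? b : a;
}

}  // namespace tiny
```

Whether this pays off depends on the project, since hand-written replacements also have to be tested and maintained.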

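Alternatively, one option is to not provide an STL-facing interface at all.
For example, suppose you want to implement a function is_dir that reports whether a given path refers to a directory.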

bool is_dir(const std::string& s);

Including the <string> header just to write this declaration is somewhat wasteful, so the idea is to provide the following two interfaces instead.

#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen

bool is_dir(const char* str, std::size_t length);

inline bool is_dir(const char* str) { return is_dir(str, std::strlen(str)); }
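
To show how the pieces could fit together, here is a minimal sketch of the header and a possible implementation file. It assumes a POSIX system where stat() is available; the file names are hypothetical, and the implementation is only one of several ways to perform the check.

```cpp
// ---- is_dir.hpp (hypothetical): no <string> needed ----
#include <cstddef>  // std::size_t
#include <cstring>  // std::strlen

bool is_dir(const char* str, std::size_t length);

inline bool is_dir(const char* str) { return is_dir(str, std::strlen(str)); }

// ---- is_dir.cpp (hypothetical): heavier headers stay in the implementation ----
#include <string>
#include <sys/stat.h>  // POSIX stat(), assumed to be available

bool is_dir(const char* str, std::size_t length) {
  // stat() needs a null-terminated path, so copy exactly `length` characters.
  const std::string path(str, length);
  struct stat info;
  if (::stat(path.c_str(), &info) != 0) return false;  // missing or inaccessible
  return S_ISDIR(info.st_mode);
}
```

A caller that already holds a std::string s can still use the function through is_dir(s.data(), s.size()), so the header stays free of <string> without losing that use case.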
