Why Kotlin/JVM allows spaces in test method names

Posted at 2019-12-23 (last updated at 2019-12-23)

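Introduction

Kotlin has a language feature that lets you wrap an identifier in backticks so that a method name (or any other name) can contain reserved words or characters that are normally not allowed.
I suspect it is used most often in two situations: when a method or field defined in Java has a name that is a Kotlin keyword, and when naming methods in test code.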

// `in` is a keyword in Kotlin, so it has to be wrapped in backticks
val text = System.`in`.reader().readText()

// The name itself says what is being tested, so no comment is needed
@Test
fun `helloWorld should print "Hello World"`() {
    ...
}

The former aside, the latter is a method name that cannot be declared even in Java. So why can such a method name be defined at all?
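Before digging into the specification, reflection offers a quick way to confirm that the backticked name really is the name the JVM sees. The following is a minimal sketch; the `Greeter` class is a hypothetical example, not something from the original code.

```kotlin
// Hypothetical class used only for this demonstration.
class Greeter {
    fun `helloWorld should print "Hello World"`() {
        println("Hello World")
    }
}

fun main() {
    // declaredMethods reports the names exactly as they are stored in the class file.
    val names = Greeter::class.java.declaredMethods.map { it.name }
    println(names)

    // The method can be looked up and invoked by its raw name, spaces and quotes included.
    val method = Greeter::class.java.getDeclaredMethod("helloWorld should print \"Hello World\"")
    method.invoke(Greeter())
}
```

The backticks are only source-level syntax; they do not appear in the name that reflection returns.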

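The JVM specification

Let's start with the JVM specification.
According to the class file format specification, the structure of a class file is defined as follows. (The notation resembles C structs.)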

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
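As a rough illustration, the first few fields of this structure can be read with `java.io.DataInputStream`, which reads big-endian values just as the class file format stores them. This sketch assumes the class file is reachable as a classpath resource; `Sample`, `ClassFileHeader`, and `readHeader` are hypothetical names introduced here.

```kotlin
import java.io.DataInputStream

// Hypothetical class whose own class file is inspected.
class Sample

data class ClassFileHeader(
    val magic: Int,
    val minorVersion: Int,
    val majorVersion: Int,
    val constantPoolCount: Int
)

fun readHeader(clazz: Class<*>): ClassFileHeader {
    // Assumption: the class was loaded from a location that exposes its .class file as a resource.
    val resourceName = clazz.name.substringAfterLast('.') + ".class"
    val stream = clazz.getResourceAsStream(resourceName)
        ?: error("class file not found: $resourceName")
    return DataInputStream(stream).use { input ->
        val magic = input.readInt()                      // u4 magic
        val minor = input.readUnsignedShort()            // u2 minor_version
        val major = input.readUnsignedShort()            // u2 major_version
        val poolCount = input.readUnsignedShort()        // u2 constant_pool_count
        ClassFileHeader(magic, minor, major, poolCount)
    }
}

fun main() {
    val header = readHeader(Sample::class.java)
    println("magic = 0x%08X".format(header.magic))
    println("version = ${header.majorVersion}.${header.minorVersion}")
    println("constant_pool_count = ${header.constantPoolCount}")
}
```

For a valid class file the magic value is always `0xCAFEBABE`; the version and constant pool size depend on the compiler and target in use.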

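Within ClassFile, the method name information lives in constant_pool[]. At first glance the field_info and method_info types look like the place where names would be stored, but as shown below they only refer to strings defined in constant_pool[].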

field_info {
    u2             access_flags;
    u2             name_index; // index into constant_pool of the entry that holds the name string
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

method_info {
    u2             access_flags;
    u2             name_index; // index into constant_pool of the entry that holds the name string
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

constant_pool[] is an array of cp_info. cp_info is a generic structure whose layout is interpreted differently depending on the value of tag. Method names are stored in the structure named CONSTANT_Utf8_info.

cp_info {
    u1 tag;
    u1 info[];
}

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;        // the first 2 bytes of info[] are interpreted as length
    u1 bytes[length]; // the remaining bytes of info[] (from the 3rd byte on) are interpreted as bytes
}
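To make the layout concrete, here is a small sketch that builds the bytes of such an entry by hand. It relies on the fact that `DataOutputStream.writeUTF` writes a 2-byte length followed by modified UTF-8 bytes, which matches the `length` and `bytes` fields above; `encodeUtf8Info` is a hypothetical helper name.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.DataOutputStream

// Builds the raw bytes of a CONSTANT_Utf8_info entry for the given text.
fun encodeUtf8Info(text: String): ByteArray {
    val buffer = ByteArrayOutputStream()
    DataOutputStream(buffer).use { out ->
        out.writeByte(1)   // u1 tag: CONSTANT_Utf8 has the tag value 1
        out.writeUTF(text) // u2 length, then the modified UTF-8 bytes
    }
    return buffer.toByteArray()
}

fun main() {
    val bytes = encodeUtf8Info("hello world")
    println(bytes.joinToString(" ") { "%02x".format(it) })
    // prints: 01 00 0b 68 65 6c 6c 6f 20 77 6f 72 6c 64
}
```

The space is simply the byte `0x20` in the middle of the entry; nothing in this structure treats it specially.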

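As the name suggests, CONSTANT_Utf8_info holds a UTF-8 string (strictly speaking, it uses the JVM's modified UTF-8 encoding).
The important point is that the structure itself places almost no restriction on which characters bytes may contain. This is because CONSTANT_Utf8_info is referenced not only by method names and the like but also by String constants.
As a result, method and field names can in fact use almost any character.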
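Putting the structures together, the sketch below parses a compiled class just far enough to resolve each `method_info.name_index` against the constant pool. It is a simplified reader written for this article, not a complete class file parser; `SampleTest`, `readMemberNameIndexes`, and `readMethodNames` are hypothetical names, and the class file is assumed to be available as a classpath resource.

```kotlin
import java.io.ByteArrayInputStream
import java.io.DataInputStream

// Hypothetical class whose compiled form is inspected.
class SampleTest {
    fun `method name with spaces`() {}
}

// field_info and method_info share one layout, so a single reader handles both.
fun readMemberNameIndexes(input: DataInputStream): List<Int> {
    val memberCount = input.readUnsignedShort()
    return List(memberCount) {
        input.skipBytes(2)                       // access_flags
        val nameIndex = input.readUnsignedShort()
        input.skipBytes(2)                       // descriptor_index
        repeat(input.readUnsignedShort()) {      // attributes_count
            input.skipBytes(2)                   // attribute_name_index
            input.skipBytes(input.readInt())     // attribute_length, then info[]
        }
        nameIndex
    }
}

fun readMethodNames(classBytes: ByteArray): List<String> {
    val input = DataInputStream(ByteArrayInputStream(classBytes))
    input.skipBytes(8)                           // magic, minor_version, major_version
    val poolCount = input.readUnsignedShort()
    val utf8Entries = HashMap<Int, String>()
    var index = 1                                // constant pool indexes start at 1
    while (index < poolCount) {
        when (val tag = input.readUnsignedByte()) {
            1 -> utf8Entries[index] = input.readUTF()          // CONSTANT_Utf8
            3, 4, 9, 10, 11, 12, 17, 18 -> input.skipBytes(4)
            5, 6 -> { input.skipBytes(8); index++ }            // Long/Double occupy two slots
            7, 8, 16, 19, 20 -> input.skipBytes(2)
            15 -> input.skipBytes(3)                           // CONSTANT_MethodHandle
            else -> error("unknown constant pool tag $tag")
        }
        index++
    }
    input.skipBytes(6)                           // access_flags, this_class, super_class
    input.skipBytes(2 * input.readUnsignedShort()) // interfaces[]
    readMemberNameIndexes(input)                 // fields: read and discard
    return readMemberNameIndexes(input).map { utf8Entries.getValue(it) }
}

fun main() {
    val bytes = SampleTest::class.java.getResourceAsStream("SampleTest.class")
        ?.use { it.readBytes() } ?: error("class file not found")
    println(readMethodNames(bytes))
}
```

The printed list should contain `method name with spaces` alongside the constructor's special name `<init>`, each resolved from an ordinary CONSTANT_Utf8_info entry.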

However, a few characters are prohibited in method names and similar names even though CONSTANT_Utf8_info itself can represent them. The specification (§4.2.2) says:

Names of methods, fields, local variables, and formal parameters are stored as unqualified names. An unqualified name must contain at least one Unicode code point and must not contain any of the ASCII characters . ; [ / (that is, period or semicolon or left square bracket or forward slash).
Method names are further constrained so that, with the exception of the special method names <init> and <clinit> (§2.9), they must not contain the ASCII characters < or > (that is, left angle bracket or right angle bracket).

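In summary, the specification defines that:

names of methods, fields, local variables, and formal parameters must be at least one character long,
they must not contain . ; [ /,
and method names additionally must not contain < > (except for the special names <init> and <clinit>).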
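The rules quoted above are simple enough to express as a pair of predicates. This is only a sketch of the quoted text of §4.2.2; the function and constant names are hypothetical.

```kotlin
// Characters that no unqualified name may contain (JVMS §4.2.2).
private val FORBIDDEN_IN_UNQUALIFIED_NAME = setOf('.', ';', '[', '/')

// Fields, local variables, and formal parameters: non-empty and free of . ; [ /
fun isValidUnqualifiedName(name: String): Boolean =
    name.isNotEmpty() && name.none { it in FORBIDDEN_IN_UNQUALIFIED_NAME }

// Methods: the same rule, plus no < or > unless the name is <init> or <clinit>.
fun isValidMethodName(name: String): Boolean {
    if (name == "<init>" || name == "<clinit>") return true
    return isValidUnqualifiedName(name) && name.none { it == '<' || it == '>' }
}

fun main() {
    println(isValidMethodName("helloWorld should print \"Hello World\"")) // true
    println(isValidMethodName("a.b"))    // false
    println(isValidMethodName("<init>")) // true
    println(isValidMethodName("<T>"))    // false
}
```

Spaces and double quotes pass this check, which is why the test method name from the introduction is acceptable to the JVM.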

Kotlin's implementation

In Kotlin/JVM, the characters . ; [ ] / < > : \ cannot be used in a name even when it is wrapped in backticks.

// See The Java Virtual Machine Specification, section 4.7.9.1 https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.9.1
private val CHARS = setOf('.', ';', '[', ']', '/', '<', '>', ':', '\\')

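This rejects more characters than the JVM definition requires. The rule appears to apply to every kind of name, not just method names, so I assume the set was deliberately made broader to cover characters such as :, which cannot be used in a Signature.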
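To get a feel for what this set rejects, here is a minimal sketch that applies it to a few candidate names. Only the `CHARS` set comes from the quoted source; `findInvalidChars` is a hypothetical helper, and the real compiler reports a diagnostic rather than returning a set.

```kotlin
// Same character set as in the quoted Kotlin source.
private val CHARS = setOf('.', ';', '[', ']', '/', '<', '>', ':', '\\')

// Hypothetical helper: the characters in `name` that the set above disallows.
fun findInvalidChars(name: String): Set<Char> = name.filter { it in CHARS }.toSet()

fun main() {
    println(findInvalidChars("helloWorld should print \"Hello World\"")) // []
    println(findInvalidChars("parse a:b"))                               // [:]
    println(findInvalidChars("List<String> works"))                      // [<, >]
}
```

Note that `]`, `:` and `\` are rejected here even though the unqualified-name rule in the JVM specification does not mention them.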

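Summary

I hope this shows that Kotlin's technique of using backticks to allow otherwise unusable characters is perfectly fine as far as the JVM specification is concerned.
Personally, this was the first time I had read the JVM specification seriously, so I learned a lot.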
