More than 5 years have passed since last update.

Rust0.11で作るLLVMコンパイラ

Last updated at 2014-07-27Posted at 2014-07-27

はじめに

以前、Rust0.7でコンパイラを作成してみたのですが、Rust0.11までバージョンが上がり、動かなくなってしまいました。開発中の言語の変更なので仕方ないのですが、Rustを参考にしたいのに、使い方が良くわからないと困るのでRust0.11に対応してみました。

エラーワーニング対策

マクロ、use,EAddという大文字で始まる関数を許可する為に以下のアトリビュートを使います。

# ![feature(macro_rules)]
# ![feature(globs)]
# ![allow(non_snake_case_functions)]

メイン関数

以前、~aと書いていた、~はboxと書くようになりました。boxと書くとヒープに値を取り、ポインタを持つようになります。
println!は出力マクロですが、{}に引数を展開します。

interp::evalがインタプリタで、virtuallがASTから仮想LLVM命令へのコンパイラ、emitがファイル出力、execでllcを呼び出し、コンパイル実行しています。

fn main() {
  use ast::*;
  let ast = EBlock( Tv, vec![
    EPrint(Ti(32), box ELdc( Ti(32), 11)),
    EPrint(Ti(32),
      box EAdd(Ti(32), box ELdc( Ti(32), 11), box ELdc( Ti(32), 22)))
  ]);
  println!("ast={}", ast);
  interp::eval(&ast);

  let vs = virtuall(&ast);
  println!("vs={}",vs);

  emit("e.ll", &vs);

  println!("{}",exec("llc e.ll -o e.s"));
  println!("{}",exec("llvm-gcc -m64 e.s -o e"));
  println!("{}",exec("./e"));
}

抽象構文木AST

構文木はastモジュールでまとめて宣言しています。

pub mod ast {

  #[deriving(Clone,Show)]
  pub enum E {
    ELdc(T, int),
    EBin(T, String, Box<E>, Box<E>),
    EPrint(T, Box<E>),
    EBlock(T, Vec<E>),
  }

ELdc,EBin,EPrint,EBlockの4つの型を定義しています。Rustのenumは関数型言語の代数データ型に対応する事が出来ます。再帰的に型を参照する場合は、ポインタを参照するようにしないと、サイズが決まらないのでBoxを使う必要があります。しかし再帰的な参照でなければBoxを使わずに定義出来ます。Vecはベクターですが、ベクターにしている場合もBoxを使わずに定義出来ます。もちろん、Boxを使おうと思えば使えます。
#[deriving(Clone,Show)]を使って、cloneとフォーマットした文字列出力を自動生成しています。

  macro_rules! EOp(
    ($T:ident, $op:expr) => (
      pub fn $T(t: T, a: Box<E>, b: Box<E>) -> E {
        EBin(t, String::from_str($op), a, b)
      }
    )
  )

  EOp!(EAdd, "add")
  EOp!(ESub, "sub")
  EOp!(EMul, "mul")
  EOp!(EDiv, "div")

macro_ruesを使ってマクロを定義し、EAdd,ESub,EMul,EDiv関数を定義しています。

  #[deriving(Clone,Show,PartialEq)]
  pub enum T {
    Ti(int),
    Tv,
    TFun(Box<T>, Vec<T>),
  }

型を表す型Tの定義です。ParticalEqで比較命令を実装しています。

  #[deriving(Clone,Show)]
  pub enum R {
    RG(T, String),
    RL(T, String),
    RR(T, String),
    RN(T, String),
  }

LLVMのレジスタを表すRを定義しています。

  #[deriving(Clone,Show)]
  pub enum V {
    VCall(Option<R>, R, Vec<R>),
    VBin(Option<R>, String, R, R),
  }

LLVMの仮想命令のVを定義しています。

これ以降が、動作の定義です。

  impl R {
    pub fn t(&self) -> T {
      match *self {
        RG(ref t, _) => t.clone(),
        RL(ref t, _) => t.clone(),
        RR(ref t, _) => t.clone(),
        RN(ref t, _) => t.clone(),
      }
    }
  }

Rの型を取得する関数です。

  pub trait P {
    fn p(&self) -> String;
  }

r.p()として、文字列化するインターフェイスの定義です。

  impl P for T {
    fn p(&self) -> String {
      match *self {
        Ti(ref i) => format!("i{}",i),
        Tv => String::from_str("void"),
        TFun(ref t, ref ls) => {
          let mut stack = String::new();
          for l in ls.iter() {
            stack = stack + l.p();
            stack.push_str(", ");
          }
          format!("{}({})*",t.p(), stack.as_slice().trim_chars(&[',', ' ']))
        },
      }
    }
  }

Tに対する、pの実装です。

  impl P for R {
    fn p(&self) -> String {
      match *self {
        RG(_,ref id) => format!("@{}", *id),
        RL(_,ref id) => format!("%{}", *id),
        RR(_,ref id) => format!("%.{}", *id),
        RN(_,ref id) => format!("{}", *id),
      }
    }
  }
}

Rに対する、pの実装です。

インタプリタ

interpモジュールでインタプリタを定義します。

mod interp {
  use ast::*;

  pub fn eval(e:&E)->int {
    match e {
      &ELdc(_, i) => i,
      &EBin(_, ref op, ref a, ref b) if op.equiv(&"add") => eval(*a) + eval(*b),
      &EBin(_, ref op, _, _) => fail!("operator {}",*op),
      &EPrint(_, ref e) => {
        let e = eval(*e);
        println!("{}",e);
        e
      }
      &EBlock(_, ref ls) => {
        fn f(ls:&[E], r:int)-> int {
          match ls {
            [] => r,
            [ref a, ..rest] => f(rest,eval(a))
          }
        }
        f(ls.as_slice(), 0)
      }
    }
  }
}

構文木Eを受け取って、実行し値をintで返します。EPrintがあれば出力を行います。matchを使ってELdc,EBin,EPrint,EBlockについてそれぞれの処理が書いてあります。

仮想命令変換

構文木を受け取って、LLVMの仮想命令に変換します。

この関数は出力するVのリストの状態を持たせた構造を実現しています。
Haskellならステートモナドを使う所です。

モジュール内static変数はbox化されたデストラクタが必要な変数を持てない。
関数内関数は外部の変数をキャプチャできず、クロージャを生成しない。
クロージャは再帰呼び出しが出来ない。

このような制限があるため、ここではstructを作って状態を持たせ、implで関数を実装しました。selfに状態を持ち、メソッド呼び出しを再帰的に呼び出して変換しています。

とても、オブジェクト指向的ですが、構造体のデータ自体はimmutableで、関数内の変数だけがmutableで書き換えていくスタイルになっています。なので、構造体がHaskellのステートモナドのデータになっているような感じな訳です。


fn virtuall(a: &ast::E) -> Vec<ast::V> {
  use ast::*;

  struct Virutal {
    ls:Vec<V>
  }
  impl Virutal {

    fn new() -> Virutal {
      Virutal{ls:Vec::new()}
    }

    fn gid(&mut self,t:&T)-> R {
      RR(t.clone(), genid(""))
    }

    fn add(&mut self, v:V) {
      self.ls.push(v);
    }

    fn f(&mut self, e: &E) -> R {
      match e {
        &EBin(ref t,ref op, ref a, ref b) => {
          let a = self.f(*a);
          let b = self.f(*b);
          let id = self.gid(t);
          if *t != a.t() || *t != b.t() {
            fail!(format!("type mismatch {}",*t));
          }
          self.add(VBin(Some(id.clone()), op.clone(), a, b));
          id
        }
        &ELdc(ref t, ref i) => RN(t.clone(), format!("{}",i)),
        &EPrint(ref t, ref a) => {
          let a = self.f(*a);
          if *t != a.t() {
            fail!(format!("type mismatch t={} ta={}", t, a.t()))
          }
          self.add(VCall(None, RG(TFun(box Tv, vec![t.clone()]), format!("print_{}", t.p())), vec![a.clone()]));
          a
        }
        &EBlock(_,ref ls) =>
          self.loop_block(ls.as_slice(), &RN(Tv, String::new()))
      }
    }

    fn loop_block(&mut self, ls:&[E],r:&R)->R {
      match ls {
        [] => r.clone(),
        [ref e, ..rest] => {
          let r = self.f(e);
          self.loop_block(rest,&r)
        }
      }
    }

    fn apply(a: &ast::E) -> Vec<ast::V> {
      let mut env = Virutal::new();
      env.f(a);
      env.ls
    }
  }

  Virutal::apply(a)
}

出力

仮想命令リストからファイルに文字列出力します。
Asmクラスを使って呼び出しています。Asmが文脈を持っているモナドになっている感じです。

fn emit(file: &str, vs: &Vec<ast::V>) {
  use ast::*;
  use asm::*;

  fn emitV(asm:&mut Asm, v: &V) {
    match v {
      &VCall(ref id, ref op, ref prms) => {
        let mut ps:String = String::new();

        for a in prms.iter() {
          ps = if ps == String::new() {
            format!("{} {}", a.t().p(), a.p())
          } else {
            format!("{}, {} {}", ps, a.t().p(), a.p())
          }
        }
        asm.o(id, &format!("call {} {}({}) nounwind", op.t().p(), op.p(), ps))
      }
      &VBin(ref id, ref op, ref a, ref b) => {
        asm.o(id, &format!("{} {} {}, {}", *op, a.t().p(), a.p(), b.p()))
      }
    }
  }

  let asm = &mut Asm::open(file);

  asm.p(&format!("@.str = private constant [4 x i8] c\"%d\\0A\\00\""));
  asm.p(&format!("define void @print_i32(i32 %a) nounwind ssp {}","{"));
  asm.p(&format!("entry:"));
  asm.__(&format!("call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %a) nounwind"));
  asm.__(&format!("ret void"));
  asm.p(&format!("{}","}"));
  asm.p(&format!("define void @print_i8(i8 %a) nounwind ssp {}","{"));
  asm.p(&format!("entry:"));
  asm.__(&format!("call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i8 %a) nounwind"));
  asm.__(&format!("ret void"));
  asm.p(&format!("{}","}"));

  asm.p(&format!("declare i32 @printf(i8*, ...) nounwind"));

  asm.p(&format!("define i32 @main() nounwind ssp {}","{"));
  asm.p(&format!("entry:"));

  for v in vs.iter() {
    emitV(asm,v);
  }

  asm.__(&format!("ret i32 0"));
  asm.p(&format!("{}","}"));

}

アセンブラ出力

アセンブラ出力クラスというかモナドというか、そんな感じの物の定義です。


pub mod asm {
  use std::io::*;
  use ast::*;
  pub struct Asm {
    file:Result<File,IoError>
  }
  impl Asm {
    pub fn open(file:&str) -> Asm {
      Asm{file:File::create(&Path::new(file))}
    }
    #[allow(unused_must_use)]
    fn println(&mut self,s:&String) {
      let ss = s.as_bytes();
      self.file.write(ss);
      self.file.write("\n".as_bytes());
    }
    pub fn __(&mut self,s:&String) {
      self.println(&format!("  {}",s));
    }
    pub fn p(&mut self,s:&String) {
      self.println(s);
    }
    pub fn o(&mut self, id: &Option<R>, out: &String) {
      match id {
        &Some(ref id) =>self.__(&format!("{} = {}", id.p(), out)),
        &None => self.__(out),
      }
    }
  }
}

id生成

これはIDを生成するだけの関数です。staticなintならmutableでも持てるので使っています。

fn genid(s:&str) -> String {
  static mut id:int = 0;
  unsafe {
    id += 1;
    format!("{}{}",s, id)
  }
}

プロセス実行

文字列を渡すとプロセスを実行して多値で返すだけの関数です。

fn exec(cmd:&str) -> (int, String, String) {

  use std::io::process::*;

  let mut cmds:Vec<&str> = cmd.split_str(" ").collect();
  let mut cmd = Command::new(cmds.shift().unwrap());

  for arg in cmds.iter() {
    cmd.arg(*arg);
  }

  let mut process = match cmd.spawn() {
    Ok(p) => p,
    Err(e) => fail!("failed to execute process: {}", e),
  };
  process.set_timeout(Some(10_000));
  let code = match process.wait().ok().unwrap() {
    ExitStatus(i) => i,
    ExitSignal(i) => i
  };

  let output = process.stdout.get_mut_ref().read_to_end();
  let error = process.stderr.get_mut_ref().read_to_end();

    (code,
    String::from_utf8(output.ok().unwrap()).ok().unwrap(),
    String::from_utf8(error.ok().unwrap()).ok().unwrap()
    )
}

コンパイル & 実行

$ rustc comp.rs
$ ./comp

これで、e.ll,e.s,eが出力されます。

まとめ

最近のRustのオブジェクト指向はモナド則はどうなのかって話は置いて起きますがHaskellの型クラスのようであり、モナドのようでもあります。今までのオブジェクト指向言語とは大分、Haskell的になっていますが、Haskellよりはオブジェクト指向言語に近いものになっていました。Haskellは遅延評価ゆえに、メモリ管理が裏でどうなっているかが分かり辛いのですが、Rustではメモリ管理は手動で行うので、明確です。リファレンスカウンタやGCを使おうと思えば使えますが、使わなくてもプログラムは組めてリークもしません。GCなしのHaskellをより一般的なオブジェクト指向の言語に似せたRustはなかなか素晴らしい言語のように思います。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up