
Building an LLVM compiler with rustc 1.10.0 (cfcb716cf 2016-07-03) stable

Posted at 2016-07-28 (last updated at 2016-07-28)

/*

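Introduction

I previously built a compiler in Rust 0.7 [2] and in Rust 0.11 [1]. After a long break I wanted to use Rust again, but by the time I tried to run my old program Rust had reached 1.10.0, and, as expected, the program no longer compiled.
That is unavoidable for a language still in development, but I wanted to be able to refer back to my earlier program, so I ported it to Rust 1.10.0.
The prose is embedded as comments, so you can copy the whole article into comp.rs and compile it as is.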

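Dealing with errors and warnings

I was tired of fighting macros, so I stopped using them.
The stable release trades convenience for stability: unstable features are not available, so this version uses none of them. Some things that used to be enclosed in () now seem to be enclosed in {}.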

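Main function

What used to be written ~a became box, and is now written Box::new(). I have not followed the intermediate steps. Box::new() allocates the value on the heap and holds a pointer to it.
println! is the output macro; it substitutes its arguments into the {} placeholders. Debug output uses {:?} instead, so you have to pick the right one.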
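As a small illustration of both points, here is a sketch; the `Point` type and the `describe` function are made up for this example and are not part of comp.rs.

```rust
// Hypothetical type: it derives Debug only, so it can be printed with {:?} but not {}.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn describe() -> (String, String) {
    // Box::new moves the value to the heap; the Box owns it and frees it when dropped.
    let n = Box::new(42);
    let p = Box::new(Point { x: 1, y: 2 });
    // i32 implements Display, so {} works (Box forwards Display to its contents).
    let shown = format!("{}", n);
    // Point has no Display impl, so it must be formatted with {:?}.
    let debugged = format!("{:?}", p);
    (shown, debugged)
}
```

Here `describe()` returns `("42", "Point { x: 1, y: 2 }")`; writing `format!("{}", p)` would be a compile error because `Point` does not implement `Display`.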

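interp::eval is the interpreter, virtuall compiles the AST into virtual LLVM instructions, emit writes them to a file, and exec runs external commands: llc to compile the IR, llvm-gcc to link it, and finally the resulting executable.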

*/
fn main() {
  use ast::*;
  let ast = E::Block(T::V, vec![
    E::Print(T::I(32), Box::new(E::Ldc(T::I(32), 11))),
    E::Print(T::I(32),
      Box::new(E::Bin(T::I(32), "add".to_string(), Box::new(E::Ldc(T::I(32), 11)), Box::new(E::Ldc(T::I(32), 22)))))
  ]);
  println!("ast={:?}", ast);
  interp::eval(&ast);

  let vs = virtuall(&ast);
  println!("vs={:?}",vs);

  emit("e.ll", &vs);

  println!("{:?}",exec("llc e.ll -o e.s"));
  println!("{:?}",exec("llvm-gcc -m64 e.s -o e"));
  println!("{:?}",exec("./e"));
}
/*

Enum variants are now written with the enum name as a prefix, as in T::I, so I followed that.
Conversion to String is done with .to_string().
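A minimal sketch of both changes, using a hypothetical `Ty` enum rather than the article's `T`: variants are namespaced under their enum, an optional `use` brings them into scope, and `.to_string()` turns a `&str` literal into an owned `String`.

```rust
// Hypothetical enum used only for this illustration.
#[derive(Debug, PartialEq)]
enum Ty {
    I(i32),
    V,
}

fn variants() -> (Ty, Ty, String) {
    // Variants are written with the enum name as a prefix.
    let a = Ty::I(32);
    let b = {
        // Optional: import the variants so the prefix can be dropped locally.
        use self::Ty::*;
        V
    };
    // &'static str -> owned String, as in "add".to_string() in main.
    let op = "add".to_string();
    (a, b, op)
}
```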

Abstract syntax tree (AST)

All the syntax tree types are declared together in the ast module.

*/
pub mod ast {

  #[derive(Clone,Debug)]
  pub enum E {
    Ldc(T, i32),
    Bin(T, String, Box<E>, Box<E>),
    Print(T, Box<E>),
    Block(T, Vec<E>),
  }
/*

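This defines four variants: E::Ldc, E::Bin, E::Print, and E::Block. Rust enums correspond to the algebraic data types of functional languages. When a type refers to itself recursively, the reference must go through a pointer, otherwise the size of the type cannot be determined, so Box is required. Fields that are not recursive can be declared without Box, and so can a Vec (even a Vec of the type itself), because a Vec keeps its elements on the heap. Of course, you can still use Box if you want to.
The old #[deriving(Clone,Show)] is now #[derive(Clone,Debug)], which auto-generates clone and formatted string output. Debug output is printed with {:?}.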
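To make the Box rule concrete, here is a cut-down expression type in the same spirit as ast::E; the names `Expr` and `eval` are illustrative, not the article's. Removing the `Box` from `Add` makes the compiler reject the type as having infinite size (error E0072), while `Seq` needs no Box because the Vec already stores its elements on the heap.

```rust
// Illustrative cut-down AST; derive generates clone() and {:?} output.
#[derive(Clone, Debug)]
enum Expr {
    Num(i32),
    // Direct recursion: Box gives the variant a fixed, pointer-sized field.
    Add(Box<Expr>, Box<Expr>),
    // Vec<Expr> is fine without Box: the elements live on the heap.
    Seq(Vec<Expr>),
}

fn eval(e: &Expr) -> i32 {
    match *e {
        Expr::Num(n) => n,
        Expr::Add(ref a, ref b) => eval(a) + eval(b),
        // Evaluate every element in order; the value is the last one (0 if empty).
        Expr::Seq(ref es) => es.iter().fold(0, |_, x| eval(x)),
    }
}
```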

*/
  #[derive(Clone,Debug,PartialEq)]
  pub enum T {
    I(i32),
    V,
    Fun(Box<T>, Vec<T>),
  }
/*

This defines T, the type that represents types. Deriving PartialEq implements the comparison operators (== and !=) for it.
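The derived comparison is what lets virtuall check operand types with `!=`. A minimal sketch, with a hypothetical `Type` enum mirroring T and a made-up `check` helper that returns an error instead of panicking:

```rust
// Illustrative copy of T; PartialEq is derived, so == and != compare structurally.
#[derive(Clone, Debug, PartialEq)]
enum Type {
    I(i32),
    V,
    Fun(Box<Type>, Vec<Type>),
}

// Same shape as the type check in virtuall (hypothetical helper).
fn check(expected: &Type, actual: &Type) -> Result<(), String> {
    if *expected != *actual {
        return Err(format!("type mismatch {:?} vs {:?}", expected, actual));
    }
    Ok(())
}
```

Derived equality is structural, so two separately built `Fun` values with the same return and argument types compare equal (Box and Vec forward PartialEq to their contents).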

*/
  #[derive(Clone,Debug)]
  pub enum R {
    G(T, String),
    L(T, String),
    R(T, String),
    N(T, String),
  }
/*

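This defines R, which represents an LLVM register or operand: G is a global (@name), L a local (%name), R a generated temporary (%.n), and N an immediate value such as a constant.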

*/
  #[derive(Clone,Debug)]
  pub enum V {
    Call(Option<R>, R, Vec<R>),
    Bin(Option<R>, String, R, R),
  }
/*

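This defines V, the virtual LLVM instructions: a call and a binary operation, each with an optional result register.

Everything after this point defines behavior.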

*/
  impl R {
    pub fn t(&self) -> T {
      match *self {
        R::G(ref t, _) => t.clone(),
        R::L(ref t, _) => t.clone(),
        R::R(ref t, _) => t.clone(),
        R::N(ref t, _) => t.clone(),
      }
    }
  }
/*

This method returns the type of an R.
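Since all four variants store the type in the same position, the four arms can also be merged into one arm with `|`, which binds `t` in every alternative. A sketch with hypothetical `Ty` and `Reg` stand-ins for T and R:

```rust
// Hypothetical stand-ins for ast::T and ast::R.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    I(i32),
    V,
}

#[derive(Clone, Debug)]
enum Reg {
    G(Ty, String),
    L(Ty, String),
    R(Ty, String),
    N(Ty, String),
}

impl Reg {
    // One arm covers all variants; each alternative binds the same name `t`.
    fn t(&self) -> Ty {
        match *self {
            Reg::G(ref t, _) | Reg::L(ref t, _) | Reg::R(ref t, _) | Reg::N(ref t, _) => t.clone(),
        }
    }
}
```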

*/
  pub trait P {
    fn p(&self) -> String;
  }
/*

This defines the interface for converting a value to a string, called as r.p().
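A minimal sketch of the same idea: one trait, implemented for two unrelated types, and a generic function that accepts anything implementing it. `Ty`, `Reg`, and `render` are hypothetical names for illustration only.

```rust
// A trait in the style of ast::P: render a value as LLVM text.
trait P {
    fn p(&self) -> String;
}

// Illustrative stand-ins for the article's T and R.
enum Ty {
    I(i32),
    V,
}

enum Reg {
    G(String),
    L(String),
}

impl P for Ty {
    fn p(&self) -> String {
        match *self {
            Ty::I(n) => format!("i{}", n),
            Ty::V => "void".to_string(),
        }
    }
}

impl P for Reg {
    fn p(&self) -> String {
        match *self {
            Reg::G(ref id) => format!("@{}", id),
            Reg::L(ref id) => format!("%{}", id),
        }
    }
}

// Generic over the trait: works for a slice of any type implementing P.
fn render<X: P>(xs: &[X]) -> Vec<String> {
    xs.iter().map(|x| x.p()).collect()
}
```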

*/
  impl P for T {
    fn p(&self) -> String {
      match *self {
        T::I(ref i) => format!("i{}",i),
        T::V => "void".to_string(),
        T::Fun(ref t, ref ls) => {
          let mut stack = String::new();
          for l in ls {
            stack = stack + &l.p();
            stack.push_str(", ");
          }
          let x: &[_] = &[',', ' '];
          format!("{}({})*",t.p(), stack.trim_right_matches(x))
        }
      }
    }
  }
/*

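This is the implementation of p for T. A function type is printed as a pointer type such as void(i32)*; the trailing ", " left over by the loop is removed with trim_right_matches.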
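An alternative that avoids trimming altogether is to collect the argument strings into a Vec and join them, which never produces a trailing separator. This sketch uses a hypothetical `Ty` enum and a free function `p` rather than the trait; `[String]::join` is available on stable.

```rust
// Hypothetical mirror of ast::T.
enum Ty {
    I(i32),
    V,
    Fun(Box<Ty>, Vec<Ty>),
}

fn p(t: &Ty) -> String {
    match *t {
        Ty::I(n) => format!("i{}", n),
        Ty::V => "void".to_string(),
        Ty::Fun(ref ret, ref args) => {
            // Render each argument type, then join with ", ".
            let parts: Vec<String> = args.iter().map(p).collect();
            format!("{}({})*", p(ret), parts.join(", "))
        }
    }
}
```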

*/
  impl P for R {
    fn p(&self) -> String {
      match *self {
        R::G(_,ref id) => format!("@{}", *id),
        R::L(_,ref id) => format!("%{}", *id),
        R::R(_,ref id) => format!("%.{}", *id),
        R::N(_,ref id) => format!("{}", *id),
      }
    }
  }
}
/*

This is the implementation of p for R.

Interpreter

The interp module defines the interpreter.

*/
mod interp {
  use ast::*;

  pub fn eval(e:&E)->i32 {
    match e {
      &E::Ldc(_, i) => i,
      &E::Bin(_, ref op, ref a, ref b) if op.eq(&"add") => eval(a) + eval(b),
      &E::Bin(_, ref op, _, _) => panic!("operator {}",*op),
      &E::Print(_, ref e) => {
        let e = eval(e);
        println!("{}",e);
        e
      }
      &E::Block(_, ref ls) => {
        fn f(ls:&[E], r:i32)-> i32 {
          match ls.len() {
            0 => r,
            _ => {
              f(&ls[1..],eval(&ls[0]))
            }
          }
        }
        f(ls.as_slice(), 0)
      }
    }
  }
}
/*

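eval takes a syntax tree E, executes it, and returns its value as an i32; an E::Print node also prints its value. A match handles each of E::Ldc, E::Bin, E::Print, and E::Block. String comparison now uses eq, and since slice pattern matching is not available on stable, I match on the length instead and slice manually to make it work. fail! has become panic!.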
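Instead of matching on `len()` and slicing by hand, `split_first()` (already stable in 1.10) returns the head and the rest of a slice as an `Option`. A sketch on a hypothetical mini-AST (`Expr`, `eval`, and `eval_block` are illustrative names), which also compares the operator with `==`. On much newer compilers, slice patterns such as `[head, rest @ ..]` are another option.

```rust
// Illustrative mini-AST, not the article's ast::E.
enum Expr {
    Num(i32),
    Bin(String, Box<Expr>, Box<Expr>),
    Block(Vec<Expr>),
}

fn eval(e: &Expr) -> i32 {
    match *e {
        Expr::Num(n) => n,
        // A String can be compared with a &str literal directly.
        Expr::Bin(ref op, ref a, ref b) if op == "add" => eval(a) + eval(b),
        Expr::Bin(ref op, _, _) => panic!("operator {}", op),
        Expr::Block(ref es) => eval_block(es, 0),
    }
}

// split_first() yields Some((first, rest)) or None for an empty slice.
fn eval_block(es: &[Expr], last: i32) -> i32 {
    match es.split_first() {
        None => last,
        Some((head, rest)) => {
            let v = eval(head);
            eval_block(rest, v)
        }
    }
}
```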

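Conversion to virtual instructions

This takes the syntax tree and converts it into virtual LLVM instructions.

The function is structured to carry, as state, the list of V instructions it outputs.
In Haskell this is where you would use a state monad.

A module-level static variable cannot hold a boxed value that needs a destructor.
A function defined inside another function cannot capture the outer variables; it does not create a closure.
A closure cannot call itself recursively.

Because of these restrictions, I created a struct to hold the state and implemented the functions in an impl block. The state lives in self, and the conversion proceeds by calling the methods recursively.

This looks very object-oriented, but the struct only exists inside apply: it is created there, updated through &mut self while the conversion runs, and its instruction list is returned at the end. So the struct plays the role of the state in Haskell's state monad.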
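A stripped-down sketch of the pattern: a struct holding the output list (and, as a possible design choice, its own counter for fresh register names), with a recursive method that pushes instructions as it goes. `Gen`, `Expr`, and `compile` are hypothetical, and instructions are plain strings here rather than ast::V.

```rust
// Hypothetical expression type for the illustration.
enum Expr {
    Num(i32),
    Add(Box<Expr>, Box<Expr>),
}

// The "state": accumulated instructions plus a counter for fresh names.
struct Gen {
    out: Vec<String>,
    next: u32,
}

impl Gen {
    fn new() -> Gen {
        Gen { out: Vec::new(), next: 0 }
    }

    // Fresh temporary register name in the %.n style.
    fn fresh(&mut self) -> String {
        self.next += 1;
        format!("%.{}", self.next)
    }

    // Recursive method: returns the operand holding e's value, pushing instructions.
    fn f(&mut self, e: &Expr) -> String {
        match *e {
            Expr::Num(n) => n.to_string(),
            Expr::Add(ref a, ref b) => {
                let a = self.f(a);
                let b = self.f(b);
                let id = self.fresh();
                self.out.push(format!("{} = add i32 {}, {}", id, a, b));
                id
            }
        }
    }
}

// Like Virtual::apply: create the state, run, return the instruction list.
fn compile(e: &Expr) -> Vec<String> {
    let mut g = Gen::new();
    g.f(e);
    g.out
}
```

Keeping the counter in the struct is an alternative to a global counter; every `compile` call starts numbering from 1 and needs no unsafe code.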

*/
fn virtuall(a: &ast::E) -> Vec<ast::V> {
  use ast::*;

  struct Virtual {
    ls:Vec<V>
  }
  impl Virtual {

    fn new() -> Virtual {
      Virtual{ls:Vec::new()}
    }

    fn gid(&mut self,t:&T)-> R {
      R::R(t.clone(), genid(""))
    }

    fn add(&mut self, v:V) {
      self.ls.push(v);
    }

    fn f(&mut self, e: &E) -> R {
      match e {
        &E::Bin(ref t,ref op, ref a, ref b) => {
          let a = self.f(a);
          let b = self.f(b);
          let id = self.gid(t);
          if *t != a.t() || *t != b.t() {
            panic!("type mismatch {:?}", t);
          }
          self.add(V::Bin(Some(id.clone()), op.clone(), a, b));
          id
        }
        &E::Ldc(ref t, ref i) => R::N(t.clone(), format!("{}",i)),
        &E::Print(ref t, ref a) => {
          let a = self.f(a);
          if *t != a.t() {
            panic!("type mismatch t={:?} ta={:?}", t, a.t())
          }
          self.add(V::Call(None, R::G(T::Fun(Box::new(T::V), vec![t.clone()]), format!("print_{}", t.p())), vec![a.clone()]));
          a
        }
        &E::Block(_,ref ls) =>
          self.loop_block(ls.as_slice(), &R::N(T::V, String::new()))
      }
    }

    fn loop_block(&mut self, ls:&[E],r:&R)->R {
      match ls.len() {
        0 => r.clone(),
        _ => {
          let r = self.f(&ls[0]);
          self.loop_block(&ls[1..],&r)
        }
      }
    }

    fn apply(a: &ast::E) -> Vec<ast::V> {
      let mut env = Virtual::new();
      env.f(a);
      env.ls
    }
  }

  Virtual::apply(a)
}
/*

Output

This writes the virtual instruction list to a file as text.
It does so through Asm, which carries the output context, a bit like a monad.

*/
fn emit(file: &str, vs: &Vec<ast::V>) {
  use ast::*;
  use asm::*;

  fn emit_v(asm:&mut Asm, v: &V) {
    match v {
      &V::Call(ref id, ref op, ref prms) => {
        let mut ps:String = String::new();

        for a in prms.iter() {
          ps = if ps.is_empty() {
            format!("{} {}", a.t().p(), a.p())
          } else {
            format!("{}, {} {}", ps, a.t().p(), a.p())
          }
        }
        asm.o(id, &format!("call {} {}({}) nounwind", op.t().p(), op.p(), ps))
      }
      &V::Bin(ref id, ref op, ref a, ref b) => {
        asm.o(id, &format!("{} {} {}, {}", *op, a.t().p(), a.p(), b.p()))
      }
    }
  }

  let asm = &mut Asm::open(file);

  asm.p(&format!("@.str = private constant [4 x i8] c\"%d\\0A\\00\""));
  asm.p(&format!("define void @print_i32(i32 %a) nounwind ssp {{"));
  asm.p(&format!("entry:"));
  asm.__(&format!("call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %a) nounwind"));
  asm.__(&format!("ret void"));
  asm.p(&format!("}}"));
  asm.p(&format!("define void @print_i8(i8 %a) nounwind ssp {{"));
  asm.p(&format!("entry:"));
  asm.__(&format!("call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i8 %a) nounwind"));
  asm.__(&format!("ret void"));
  asm.p(&format!("}}"));

  asm.p(&format!("declare i32 @printf(i8*, ...) nounwind"));

  asm.p(&format!("define i32 @main() nounwind ssp {{"));
  asm.p(&format!("entry:"));

  for v in vs.iter() {
    emit_v(asm,v);
  }

  asm.__(&format!("ret i32 0"));
  asm.p(&format!("}}"));

}
/*

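The compiler complained about snake case, so emitV was renamed emit_v.
The int type has become i32, among other changes.

Assembly output

This defines the assembly writer, something between a class and a monad.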
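One possible variation, sketched here as an assumption rather than the article's design: make the writer generic over `std::io::Write`, so the same code can write to a File when compiling and to an in-memory `Vec<u8>` when testing. The method names mirror Asm (`ind` stands in for `__`), and `writeln!` with `write_all` semantics ensures whole lines are written.

```rust
use std::io::Write;

// Hypothetical generic version of Asm: W can be a File, a Vec<u8>, etc.
struct Asm<W: Write> {
    out: W,
}

impl<W: Write> Asm<W> {
    // Top-level line (define, declare, labels, closing brace).
    fn p(&mut self, s: &str) {
        writeln!(self.out, "{}", s).expect("write failed");
    }
    // Indented instruction line.
    fn ind(&mut self, s: &str) {
        writeln!(self.out, "  {}", s).expect("write failed");
    }
    // Instruction with an optional result register, as in Asm::o.
    fn o(&mut self, id: Option<&str>, s: &str) {
        match id {
            Some(id) => self.ind(&format!("{} = {}", id, s)),
            None => self.ind(s),
        }
    }
}
```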

*/
pub mod asm {
  use std::io::prelude::*;
  use std::fs::File;
  use std::path::Path;
  use ast::*;

  pub struct Asm {
    file: File
  }
  impl Asm {
    pub fn open(file:&str) -> Asm {
      let path = Path::new(&file);
      match File::create(&path) {
        // io::Error implements Display, so it can be printed directly.
        Err(why) => panic!("couldn't create {}: {}", file, why),
        Ok(file) => Asm{file:file},
      }
    }
    fn println(&mut self,s:&String) {
      // write_all keeps writing until the whole buffer is out; write may stop short.
      match self.file.write_all(s.as_bytes()) {
        Err(why) => panic!("couldn't write {}: {}", s, why),
        Ok(_) => (),
      }
      match self.file.write_all(b"\n") {
        Err(why) => panic!("couldn't write newline after {}: {}", s, why),
        Ok(_) => (),
      }
    }
    pub fn __(&mut self,s:&String) {
      self.println(&format!("  {}",s));
    }
    pub fn p(&mut self,s:&String) {
      self.println(s);
    }
    pub fn o(&mut self, id: &Option<R>, out: &String) {
      match id {
        &Some(ref id) =>self.__(&format!("{} = {}", id.p(), out)),
        &None => self.__(out),
      }
    }
  }
}
/*

Rust more or less forces you to write error handling, so I did.
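If you would rather not panic deep inside the writer, a common alternative is to return `io::Result` and let the caller decide. This sketch assumes Rust 1.13 or later for the `?` operator (it is not available in 1.10), and `write_lines` and `write_or_die` are hypothetical helpers, not part of comp.rs.

```rust
use std::fs::File;
use std::io::{self, Write};

// Create `path` and write each line followed by '\n'.
// Errors from File::create or writeln! are returned to the caller, not panicked on.
fn write_lines(path: &str, lines: &[&str]) -> io::Result<()> {
    let mut f = File::create(path)?;
    for line in lines {
        writeln!(f, "{}", line)?;
    }
    Ok(())
}

// At the top level the caller can still choose to abort with a message.
fn write_or_die(path: &str, lines: &[&str]) {
    if let Err(why) = write_lines(path, lines) {
        panic!("couldn't write {}: {}", path, why);
    }
}
```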

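ID generation

This function only generates IDs. A static integer can be mutable, so I use one here; reading or writing a static mut requires an unsafe block.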
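On newer Rust versions, where `AtomicUsize::new` can be used to initialize a static (this is not possible in 1.10), the same counter can be written without unsafe. A sketch; `NEXT_ID` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter; fetch_add is atomic, so no unsafe block is needed.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

// Same contract as genid: the prefix followed by a fresh number starting at 1.
fn genid(s: &str) -> String {
    let n = NEXT_ID.fetch_add(1, Ordering::Relaxed) + 1;
    format!("{}{}", s, n)
}
```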

*/
fn genid(s:&str) -> String {
  static mut ID:i32 = 0;
  unsafe {
    ID += 1;
    // Copy the value out so no reference to the static mut is taken.
    let n = ID;
    format!("{}{}", s, n)
  }
}
/*

Process execution

This function takes a command string, runs the process, and returns the exit code, stdout, and stderr as a tuple.
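A hedged variation on the same helper: `split_whitespace` tolerates repeated spaces (where `split(' ')` would produce empty arguments), and the exit status is read from the same `Output` that holds stdout and stderr, so the command runs exactly once. `split_cmd` and `run` are hypothetical names; returning `None` instead of panicking is a design choice for the sketch.

```rust
use std::process::Command;

// Split "llc e.ll -o e.s" into the program and its arguments.
fn split_cmd(cmd: &str) -> Option<(&str, Vec<&str>)> {
    let mut parts = cmd.split_whitespace();
    let prog = match parts.next() {
        Some(p) => p,
        None => return None,
    };
    Some((prog, parts.collect()))
}

// Run the command once and return (exit code, stdout, stderr).
fn run(cmd: &str) -> Option<(i32, String, String)> {
    let (prog, args) = match split_cmd(cmd) {
        Some(x) => x,
        None => return None,
    };
    let out = match Command::new(prog).args(&args).output() {
        Ok(o) => o,
        Err(_) => return None,
    };
    Some((
        // code() is None when the process was killed by a signal.
        out.status.code().unwrap_or(-1),
        String::from_utf8_lossy(&out.stdout).into_owned(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
    ))
}
```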

*/
fn exec(cmd:&str) -> (i32, String, String) {
  use std::process::*;
  let mut cmds:Vec<&str> = cmd.split(' ').collect();
  let mut cmd = Command::new(cmds.remove(0));
  for arg in cmds.iter() {
    cmd.arg(*arg);
  }
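  let err = "failed to execute process";
  // Run the process once; the exit status is part of the same Output.
  // (Calling cmd.status() as well would run the command a second time.)
  let output = cmd.output().expect(err);
  (output.status.code().expect(err),
  String::from_utf8(output.stdout).expect(err),
  String::from_utf8(output.stderr).expect(err)
  )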
}
/*

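Many small things had changed, such as split now taking a char.
Rust also made me handle the error output, so I simply return it as is.

Compiling & running

$ multirust run stable rustc comp.rs
$ ./comp

I installed Rust with multirust, which apparently lets you switch between Rust versions. However, a newer tool called rustup seems to have appeared, so next time I will probably use rustup.

This writes e.ll, e.s, and e, and runs the program.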

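Summary

Lately I have been using LLVM 3.8 or so from C++, writing programs that reason about ownership with move semantics and unique_ptr, and while fighting runtime errors I kept thinking, "this is Rust, this really is Rust." Rust is excellent in exactly this area.

When porting from OCaml to C++, shared_ptr may be good enough, but if you want the speed of unique_ptr, writing the program in Rust first and then porting it to C++ might give you code that is sure to work. You don't have to write everything in Rust; sketching and designing the overall flow in Rust and then writing the C++ along the same lines is also a valid approach.

Rust is strongly typed, so at the start of the port the code was full of errors and would not run at all, but once the types lined up and the errors were gone, it worked without problems. That is the strength of strongly typed languages. The semantics barely changed, the stricter error handling made the program better, and although the syntax changed a lot, moving to a new version works once the types line up. That is Rust's appeal. New translated articles are also available. Please give it a try.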

Links

[1] Building an LLVM compiler with Rust 0.11 (Rust0.11で作るLLVMコンパイラ)
http://qiita.com/h_sakurai/items/70b93a14ce05f95ff4b0

[2] A simple LLVM compiler in Rust (Rust 0.7) (Rustで簡単なLLVMコンパイラ)
http://qiita.com/h_sakurai/items/8d22c484179c48552abe

*/
