
(Source Code Notes) Getting Started with MLIR on Ubuntu

Last updated at 2022-04-19, posted at 2020-05-02

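1. Introduction

MLIR stands for Multi-Level Intermediate Representation. Roughly speaking, it is an intermediate-representation framework that can be used at any stage of a compiler. It emerged because the rise of deep learning produced many new compilers, each built from scratch, which cost a lot of effort and led to uneven quality. MLIR was introduced to solve this problem.

Major uses include the following: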

Fortran IR (flang)
TensorFlow graph

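Note that for the TensorFlow graph, the dialect definition code lives under tensorflow rather than under llvm. It can be found by searching for registerDialect.
A conversion from MLIR to TVM's Relay has also been proposed.

1.1. Code size

The Toy language provided as a tutorial is very small: even its final form, Ch7, is only about 2.4K lines of code.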

github.com/AlDanial/cloc v 1.85  T=0.03 s (680.4 files/s, 150292.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                              8            350            651           1497
C/C++ Header                     7            243            314            865
CMake                            3              6              0             50
-------------------------------------------------------------------------------
SUM:                            18            599            965           2412
-------------------------------------------------------------------------------

MLIR as a whole, however, is a framework of about 120K lines of code.

github.com/AlDanial/cloc v 1.85  T=0.73 s (1023.3 files/s, 250817.7 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
C++                                    305          15481          23995          80922
C/C++ Header                           250           6675          13209          20129
Markdown                                41           3604              0          14099
CMake                                  136            395             74           2763
SVG                                      2              0              0            960
Python                                   5            263            346            734
Windows Module Definition                2             12              0            148
vim script                               4             30             43            131
JSON                                     1              1              0            112
Bourne Shell                             3             21             43             39
Lisp                                     1             11             31             37
YAML                                     2              0              1             20
---------------------------------------------------------------------------------------
SUM:                                   752          26493          37742         120094
---------------------------------------------------------------------------------------

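1.2. Environment setup

On Ubuntu 19.10, MLIR can be built with the steps below. Note, however, that the master branch sometimes fails to build. The steps were confirmed to build at the following revisions.

llvmorg-11-init (2020/01/15)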
f50bc823fe6f4279eb2f426dd54f3151878c0216 (2020/04/23)
35cf2f42dda4d708741e06570b2dbe91cec4dc41 (2020/04/22)
fa284e136e1b67e233f445fcf643eeaa10d6835c (2020/04/20)

However, the build failed at the following revision.

1811061c387baeff59446a090890368da3d86d42 (2020/04/21)
- The build appears to have been broken by commit 2eda87dfbe63bae43b81b22c8c76a3139147797b.

$ sudo apt update
$ sudo apt -y upgrade
$ sudo apt install -y cmake
$ sudo apt install -y ninja-build
$ sudo apt install -y g++
$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build; cd build
$ cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS=mlir \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON
$ cmake --build . --target check-mlir

The test results (at fa284e136e1b67e233f445fcf643eeaa10d6835c) are as follows.

-- Testing: 453 tests, 8 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

Testing Time: 2.53s
  Unsupported Tests:  15
  Expected Passes  : 438

The execution time on an AWS EC2 t3.2xlarge was as follows.

real    16m14.676s
user    122m37.133s
sys     6m5.102s

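1.3. Running tests

As a test run, the test files under mlir/test/mlir-cpu-runner can be executed. Some of them must first be transformed with mlir-opt, so check the header of each source file (the RUN: line) for the required options. The commands below assume the following working directory.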

$ cd llvm-project/build

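simple.mlir is sample code written in MLIR's LLVM dialect, and LLVM-dialect code can be executed directly with mlir-cpu-runner. The default entry function is main, but a different function can be selected with -e; for example, -e foo runs the function foo.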

$ bin/mlir-cpu-runner ../mlir/test/mlir-cpu-runner/simple.mlir

sgemm_naive_codegen.mlir is first converted by mlir-opt from the Affine and Linalg dialects to the LLVM dialect, and then executed with mlir-cpu-runner.

$ bin/mlir-opt -convert-linalg-to-loops -lower-affine -convert-loop-to-std -convert-std-to-llvm ../mlir/test/mlir-cpu-runner/sgemm_naive_codegen.mlir | bin/mlir-cpu-runner -O3 -e main -entry-point-result=void -shared-libs=lib/libmlir_runner_utils.so

unranked_memref.mlir is written in the Linalg dialect, so it is converted to the LLVM dialect before being executed.

$ bin/mlir-opt -convert-linalg-to-loops -convert-loop-to-std -convert-std-to-llvm ../mlir/test/mlir-cpu-runner/unranked_memref.mlir | bin/mlir-cpu-runner -O3 -e main -entry-point-result=void -shared-libs=lib/libmlir_runner_utils.so

utils.mlir is also written in the Linalg dialect, so it is converted to the LLVM dialect before being executed. The file defines several functions, so -e print_0d or -e print_2d can be used as well.

$ bin/mlir-opt -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm ../mlir/test/mlir-cpu-runner/utils.mlir | bin/mlir-cpu-runner -O3 -e print_1d -entry-point-result=void -shared-libs=lib/libmlir_runner_utils.so

A function is defined as follows. This is @print_0d, the function that -e print_0d invokes.

func @print_0d() {
  %f = constant 2.00000e+00 : f32
  %A = alloc() : memref<f32>
  store %f, %A[]: memref<f32>
  %U = memref_cast %A :  memref<f32> to memref<*xf32>
  call @print_memref_f32(%U): (memref<*xf32>) -> ()
  dealloc %A : memref<f32>
  return
}

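2. Tutorial

The Toy compiler lowers a program through the Toy AST, MLIR, and LLVM IR, and finally to machine code. Its source code is in mlir/examples/toy. The source layout is described at the end of this section.
Within MLIR, the program is converted through several dialects, for example Toy, Affine, and LLVM IR, before it is finally turned into machine code. Optimization is carried out by the passes that run on each dialect (Canonicalizer, Inliner).

2.1. Chapter 1: From the Toy language to the AST

Toy is a language for tensor-based computation. To keep things simple, it has the following restrictions.

Tensors of rank 2 or less
The only data type is 64-bit floating point

Two built-in functions are defined.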

transpose
print

$ bin/toyc-ch1 ../mlir/test/Examples/Toy/Ch1/ast.toy -emit=ast
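
To make the restrictions above concrete, here is a minimal C++ sketch of what the two built-ins compute on a rank-2 tensor of 64-bit floats. It is not taken from the Toy sources: Tensor2D, transpose, and toString are hypothetical names introduced only for illustration, and the output format is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a Toy value: a rank-2 tensor of 64-bit floats
// stored in row-major order.
struct Tensor2D {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data; // expected size: rows * cols

  double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// What the `transpose` built-in computes: result(c, r) = input(r, c).
Tensor2D transpose(const Tensor2D &in) {
  assert(in.data.size() == in.rows * in.cols);
  Tensor2D out{in.cols, in.rows, std::vector<double>(in.data.size())};
  for (std::size_t r = 0; r < in.rows; ++r)
    for (std::size_t c = 0; c < in.cols; ++c)
      out.data[c * out.cols + r] = in.at(r, c);
  return out;
}

// Rough equivalent of the `print` built-in: one text line per tensor row.
std::string toString(const Tensor2D &t) {
  std::ostringstream os;
  for (std::size_t r = 0; r < t.rows; ++r) {
    for (std::size_t c = 0; c < t.cols; ++c)
      os << (c ? " " : "") << t.at(r, c);
    os << "\n";
  }
  return os.str();
}
```

Transposing a 2x3 tensor with this sketch yields a 3x2 tensor, which is the same shape change that shows up later in the tutorial's MLIR output.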

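2.2. Chapter 2: From the AST to MLIR

Following the AST (abstract syntax tree) of the previous chapter, this chapter converts the program to MLIR. To do so, it defines the Toy dialect and the Toy operations, and then generates the code.
First, dumpMLIR in toyc sets up the Toy dialect and invokes MLIR generation: the dialect is registered with registerDialect, and mlirGen generates the MLIR.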

int dumpMLIR() {
  // Register our Dialect with MLIR.
  mlir::registerDialect<mlir::toy::ToyDialect>();

  mlir::MLIRContext context;

  // Handle '.toy' input to the compiler.
  if (inputType != InputType::MLIR &&
      !llvm::StringRef(inputFilename).endswith(".mlir")) {
    auto moduleAST = parseInputFile(inputFilename);
    if (!moduleAST)
      return 6;
    mlir::OwningModuleRef module = mlirGen(context, *moduleAST);
    if (!module)
      return 1;

    module->dump();
    return 0;
  }

  // Otherwise, the input is '.mlir'.
  llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> fileOrErr =
      llvm::MemoryBuffer::getFileOrSTDIN(inputFilename);
  if (std::error_code EC = fileOrErr.getError()) {
    llvm::errs() << "Could not open input file: " << EC.message() << "\n";
    return -1;
  }

  // Parse the input mlir.
  llvm::SourceMgr sourceMgr;
  sourceMgr.AddNewSourceBuffer(std::move(*fileOrErr), llvm::SMLoc());
  mlir::OwningModuleRef module = mlir::parseSourceFile(sourceMgr, &context);
  if (!module) {
    llvm::errs() << "Error can't load file " << inputFilename << "\n";
    return 3;
  }

  module->dump();
  return 0;
}

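Next, building on the dialect, the operations are defined with the Operation Definition Specification (ODS). In the code they are defined in Ops.td. The excerpt below shows the file up to the definition of ConstantOp.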

//===----------------------------------------------------------------------===//
//
// Defines the operations of the Toy dialect.
//
//===----------------------------------------------------------------------===//

#ifndef TOY_OPS
#define TOY_OPS

include "mlir/IR/OpBase.td"
include "mlir/Interfaces/SideEffects.td"

// Provide a definition of the 'toy' dialect in the ODS framework so that we
// can define our operations.
def Toy_Dialect : Dialect {
  let name = "toy";
  let cppNamespace = "toy";
}

// Base class for toy dialect operations. This operation inherits from the base
// `Op` class in OpBase.td, and provides:
//   * The parent dialect of the operation.
//   * The mnemonic for the operation, or the name without the dialect prefix.
//   * A list of traits for the operation.
class Toy_Op<string mnemonic, list<OpTrait> traits = []> :
    Op<Toy_Dialect, mnemonic, traits>;

//===----------------------------------------------------------------------===//
// Toy Operations
//===----------------------------------------------------------------------===//

// We define a toy operation by inheriting from our base 'Toy_Op' class above.
// Here we provide the mnemonic and a list of traits for the operation. The
// constant operation is marked as 'NoSideEffect' as it is a pure operation
// and may be removed if dead.
def ConstantOp : Toy_Op<"constant", [NoSideEffect]> {
  // Provide a summary and description for this operation. This can be used to
  // auto-generate documentation of the operations within our dialect.
  let summary = "constant";
  let description = [{
    Constant operation turns a literal into an SSA value. The data is attached
    to the operation as an attribute. For example:

    ```mlir
      %0 = toy.constant dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]>
                        : tensor<2x3xf64>
    ```
  }];

  // The constant operation takes an attribute as the only input.
  let arguments = (ins F64ElementsAttr:$value);

  // The constant operation returns a single value of TensorType.
  let results = (outs F64Tensor);

  // Specify a parser and printer method.
  let parser = [{ return ::parseConstantOp(parser, result); }];
  let printer = [{ return ::print(p, *this); }];

  // Add custom build methods for the constant operation. These method populates
  // the `state` that MLIR uses to create operations, i.e. these are used when
  // using `builder.create<ConstantOp>(...)`.
  let builders = [
    // Build a constant with a given constant tensor value.
    OpBuilder<"Builder *builder, OperationState &state, "
              "DenseElementsAttr value", [{
      build(builder, state, value.getType(), value);
    }]>,

    // Build a constant with a given constant floating-point value.
    OpBuilder<"Builder *builder, OperationState &state, double value">
  ];

  // Invoke a static verify method to verify this constant operation.
  let verifier = [{ return ::verify(*this); }];
}

The source files newly used here are:

mlir/
- Dialect.cpp
- MLIRGen.cpp
include/toy/
- Ops.td

The MLIR generated up to this point can be printed with:

$ bin/toyc-ch2 ../mlir/test/Examples/Toy/Ch2/codegen.toy -emit=mlir

Comparing the source code of Chapter 1 and Chapter 2 gives the following diffstat.

 CMakeLists.txt             |   19 +
 include/CMakeLists.txt     |    1
 include/toy/CMakeLists.txt |    4
 include/toy/Dialect.h      |   45 ++++
 include/toy/Lexer.h        |    4
 include/toy/MLIRGen.h      |   32 +++
 include/toy/Ops.td         |  251 ++++++++++++++++++++++++
 mlir/Dialect.cpp           |  254 +++++++++++++++++++++++++
 mlir/MLIRGen.cpp           |  452 +++++++++++++++++++++++++++++++++++++++++++++
 toyc.cpp                   |   87 +++++++-
 10 files changed, 1136 insertions(+), 13 deletions(-)

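2.3. Chapter 3: High-level MLIR optimization (1)

One kind of graph-level optimization is the fusion of layer operations. This uses MLIR's own Generic DAG Rewriter infrastructure. There are two ways to write rewrites: C++-based patterns and DRR (Declarative Rewrite Rules).
The optimization pass is invoked from toyc.cpp via createCanonicalizerPass. The relevant code is in ToyCombine.cpp and ToyCombine.td.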

    // Add a run of the canonicalizer to optimize the mlir module.
    pm.addNestedPass<mlir::FuncOp>(mlir::createCanonicalizerPass());

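Note that each pass does its work in its runOnOperation function, which is called when the pass runs.

C++-based RewritePattern

We first describe rewriting with RewritePattern. Because the optimization is done through canonicalization, the hasCanonicalizer property is turned on for the TransposeOp defined in Ops.td.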

def TransposeOp : Toy_Op<"transpose", [NoSideEffect]> {
  let summary = "transpose operation";

  let arguments = (ins F64Tensor:$input);
  let results = (outs F64Tensor);

  let assemblyFormat = [{
    `(` $input `:` type($input) `)` attr-dict `to` type(results)
  }];

  // Enable registering canonicalization patterns with this operation.
  let hasCanonicalizer = 1;

  // Allow building a TransposeOp with from the input operand.
  let builders = [
    OpBuilder<"Builder *b, OperationState &state, Value input">
  ];

  // Invoke a static verify method to verify this transpose operation.
  let verifier = [{ return ::verify(*this); }];
}

Next, getCanonicalizationPatterns is defined in ToyCombine.cpp.

/// Register our patterns as "canonicalization" patterns on the TransposeOp so
/// that they can be picked up by the Canonicalization framework.
void TransposeOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
                                              MLIRContext *context) {
  results.insert<SimplifyRedundantTranspose>(context);
}

SimplifyRedundantTranspose is defined in ToyCombine.cpp as well.

/// This is an example of a c++ rewrite pattern for the TransposeOp. It
/// optimizes the following scenario: transpose(transpose(x)) -> x
struct SimplifyRedundantTranspose : public mlir::OpRewritePattern<TransposeOp> {
  /// We register this pattern to match every toy.transpose in the IR.
  /// The "benefit" is used by the framework to order the patterns and process
  /// them in order of profitability.
  SimplifyRedundantTranspose(mlir::MLIRContext *context)
      : OpRewritePattern<TransposeOp>(context, /*benefit=*/1) {}

  /// This method attempts to match a pattern and rewrite it. The rewriter
  /// argument is the orchestrator of the sequence of rewrites. The pattern is
  /// expected to interact with it to perform any changes to the IR from here.
  mlir::LogicalResult
  matchAndRewrite(TransposeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Look through the input of the current transpose.
    mlir::Value transposeInput = op.getOperand();
    TransposeOp transposeInputOp =
        llvm::dyn_cast_or_null<TransposeOp>(transposeInput.getDefiningOp());

    // Input defined by another transpose? If not, no match.
    if (!transposeInputOp)
      return failure();

    // Otherwise, we have a redundant transpose. Use the rewriter.
    rewriter.replaceOp(op, {transposeInputOp.getOperand()});
    return success();
  }
};
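
The match-and-rewrite idea can also be tried without MLIR. The following is a framework-free sketch on a tiny expression tree, assuming an IR in which every node is either a named input or a transpose of one operand. Node, input, transposeOf, simplifyRedundantTranspose, and canonicalize are hypothetical names for illustration and are not MLIR APIs; the real canonicalizer also visits every operation in a function rather than a single root.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature IR: a node is either a named input ("x") or a
// "transpose" applied to exactly one operand.
struct Node {
  std::string name;              // "transpose" or an input name
  std::shared_ptr<Node> operand; // null for inputs
};
using NodePtr = std::shared_ptr<Node>;

NodePtr input(std::string name) {
  return std::make_shared<Node>(Node{std::move(name), nullptr});
}
NodePtr transposeOf(NodePtr operand) {
  return std::make_shared<Node>(Node{"transpose", std::move(operand)});
}

bool isTranspose(const NodePtr &n) {
  return n && n->name == "transpose" && n->operand;
}

// Mirrors the shape of matchAndRewrite: returns true (success) and rewrites
// `root` when the pattern transpose(transpose(x)) matches, false otherwise.
bool simplifyRedundantTranspose(NodePtr &root) {
  if (!isTranspose(root) || !isTranspose(root->operand))
    return false;                     // no match
  NodePtr x = root->operand->operand; // copy first, then replace the root
  root = x;
  return true;
}

// Greedy driver: keep applying the pattern until it no longer matches.
void canonicalize(NodePtr &root) {
  while (simplifyRedundantTranspose(root)) {
  }
}
```

With this sketch, transpose(transpose(x)) collapses to x, while a chain of three transposes is reduced to a single transpose of x.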

RewritePattern using DRR

Here the rewrite is table-driven. It is defined in ToyCombine.td. Note that Pat is defined in include/mlir/IR/OpBase.td.

//===----------------------------------------------------------------------===//
// Basic Pattern-Match and Rewrite
//===----------------------------------------------------------------------===//

// Reshape(Reshape(x)) = Reshape(x)
def ReshapeReshapeOptPattern : Pat<(ReshapeOp(ReshapeOp $arg)),
                                   (ReshapeOp $arg)>;

The source files newly used here are:

mlir/
- ToyCombine.cpp
- ToyCombine.td

The optimized code can be generated with the following command.

$ bin/toyc-ch3 ../mlir/test/Examples/Toy/Ch3/trivial_reshape.toy -emit=mlir -opt

Comparing the source code of Chapter 2 and Chapter 3 gives the following diffstat.

 CMakeLists.txt             |   19 +++++++++---
 include/toy/CMakeLists.txt |    2 -
 include/toy/Ops.td         |   15 ++++++---
 mlir/ToyCombine.cpp        |   69 +++++++++++++++++++++++++++++++++++++++++++++
 mlir/ToyCombine.td         |   62 ++++++++++++++++++++++++++++++++++++++++
 toyc.cpp                   |   50 ++++++++++++++++++++++----------
 6 files changed, 191 insertions(+), 26 deletions(-)

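2.4. Chapter 4: High-level MLIR optimization (2)

Optimization that takes tensor shapes into account is carried out via (per-Region) inlining. It is invoked from the optimization pipeline in toyc.cpp.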

    // Inline all functions into main and then delete them.
    pm.addPass(mlir::createInlinerPass());

To use the inlining framework, the dialect implements the virtual hooks of DialectInlinerInterface.

//===----------------------------------------------------------------------===//
// ToyInlinerInterface
//===----------------------------------------------------------------------===//

/// This class defines the interface for handling inlining with Toy
/// operations.
struct ToyInlinerInterface : public DialectInlinerInterface {
  using DialectInlinerInterface::DialectInlinerInterface;

  //===--------------------------------------------------------------------===//
  // Analysis Hooks
  //===--------------------------------------------------------------------===//

  /// All operations within toy can be inlined.
  bool isLegalToInline(Operation *, Region *,
                       BlockAndValueMapping &) const final {
    return true;
  }

  //===--------------------------------------------------------------------===//
  // Transformation Hooks
  //===--------------------------------------------------------------------===//

  /// Handle the given inlined terminator(toy.return) by replacing it with a new
  /// operation as necessary.
  void handleTerminator(Operation *op,
                        ArrayRef<Value> valuesToRepl) const final {
    // Only "toy.return" needs to be handled here.
    auto returnOp = cast<ReturnOp>(op);

    // Replace the values directly with the return operands.
    assert(returnOp.getNumOperands() == valuesToRepl.size());
    for (const auto &it : llvm::enumerate(returnOp.getOperands()))
      valuesToRepl[it.index()].replaceAllUsesWith(it.value());
  }

  /// Attempts to materialize a conversion for a type mismatch between a call
  /// from this dialect, and a callable region. This method should generate an
  /// operation that takes 'input' as the only operand, and produces a single
  /// result of 'resultType'. If a conversion can not be generated, nullptr
  /// should be returned.
  Operation *materializeCallConversion(OpBuilder &builder, Value input,
                                       Type resultType,
                                       Location conversionLoc) const final {
    return builder.create<CastOp>(conversionLoc, resultType, input);
  }
};
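
To see what the inliner and the handleTerminator hook accomplish together, here is a framework-free sketch on a straight-line mini IR. It assumes each op has at most one named result and that function bodies contain no nested regions. Op, Function, and inlineCall are hypothetical names and the renaming scheme is an arbitrary choice; none of this is MLIR's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Op {
  std::string result;                // value defined by this op ("" if none)
  std::string name;                  // e.g. "transpose", "mul", "call"
  std::vector<std::string> operands; // names of the values it uses
};

struct Function {
  std::vector<std::string> params;
  std::vector<Op> body; // straight-line ops, no nested regions
  std::string returned; // operand of the terminator (the "return")
};

// Inline the call at body[callIndex], which must be a "call" to `callee`.
void inlineCall(std::vector<Op> &body, std::size_t callIndex,
                const Function &callee) {
  const Op call = body[callIndex]; // copy: the vector is modified below
  assert(call.name == "call" &&
         call.operands.size() == callee.params.size());

  // Map callee values to caller values, starting with parameters -> arguments.
  std::map<std::string, std::string> mapping;
  for (std::size_t i = 0; i < callee.params.size(); ++i)
    mapping[callee.params[i]] = call.operands[i];

  // Clone the callee body, remapping operands and giving results fresh names.
  std::vector<Op> cloned;
  for (const Op &op : callee.body) {
    Op copy = op;
    for (std::string &operand : copy.operands)
      operand = mapping.at(operand);
    copy.result = call.result + "." + op.result;
    mapping[op.result] = copy.result;
    cloned.push_back(copy);
  }

  // The handleTerminator step: every use of the call result is replaced by
  // the value the callee returned.
  const std::string replacement = mapping.at(callee.returned);
  const auto pos = static_cast<std::ptrdiff_t>(callIndex);
  body.erase(body.begin() + pos);
  body.insert(body.begin() + pos, cloned.begin(), cloned.end());
  for (Op &op : body)
    for (std::string &operand : op.operands)
      if (operand == call.result)
        operand = replacement;
}
```

After inlining, the call op is gone and its users read the callee's returned value directly, which is the effect the assertion and loop in handleTerminator achieve with replaceAllUsesWith.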

The interface above is registered with ToyDialect through addInterfaces.

//===----------------------------------------------------------------------===//
// ToyDialect
//===----------------------------------------------------------------------===//

/// Dialect creation, the instance will be owned by the context. This is the
/// point of registration of custom types and operations for the dialect.
ToyDialect::ToyDialect(mlir::MLIRContext *ctx) : mlir::Dialect("toy", ctx) {
  addOperations<
#define GET_OP_LIST
#include "toy/Ops.cpp.inc"
      >();
  addInterfaces<ToyInlinerInterface>();
}

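Next, toy.generic_call is registered as a call operation by means of CallOpInterface. In Ops.td, CallOpInterface is declared on the operation as follows; the C++ definitions of the interface methods come after the operation definition.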

include "mlir/Interfaces/CallInterfaces.td"

def GenericCallOp : Toy_Op<"generic_call",
    [DeclareOpInterfaceMethods<CallOpInterface>]> {
  let summary = "generic call operation";
  let description = [{
    Generic calls represent calls to a user defined function that needs to
    be specialized for the shape of its arguments. The callee name is attached
    as a symbol reference via an attribute. The arguments list must match the
    arguments expected by the callee. For example:

    ```mlir
     %4 = toy.generic_call @my_func(%1, %3)
           : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64>
    ```

    This is only valid if a function named "my_func" exists and takes two
    arguments.
  }];

  // The generic call operation takes a symbol reference attribute as the
  // callee, and inputs for the call.
  let arguments = (ins FlatSymbolRefAttr:$callee, Variadic<F64Tensor>:$inputs);

  // The generic call operation returns a single value of TensorType.
  let results = (outs F64Tensor);

  // The return operation only emits the input in the format if it is present.
  let assemblyFormat = [{
    $callee `(` $inputs `)` attr-dict `:` functional-type($inputs, results)
  }];

  // Add custom build methods for the generic call operation.
  let builders = [
    OpBuilder<"OpBuilder &builder, OperationState &state, "
              "StringRef callee, ArrayRef<Value> arguments">
  ];
}

/// Return the callee of the generic call operation, this is required by the
/// call interface.
CallInterfaceCallable GenericCallOp::getCallableForCallee() {
  return getAttrOfType<SymbolRefAttr>("callee");
}

/// Get the argument operands to the called function, this is required by the
/// call interface.
Operation::operand_range GenericCallOp::getArgOperands() { return inputs(); }

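The newly introduced source file is:

mlir/
- ShapeInferencePass.cpp

The related code is:

mlir/include/mlir/Interfaces/CallInterfaces.td

This chapter performs language-independent optimization by means of interfaces, for example optimization based on tensor shapes.

$ bin/toyc-ch4 ../mlir/test/Examples/Toy/Ch4/codegen.toy -emit=mlir -opt

Comparing the output with and without optimization gives the following. The generic_call invocations have been eliminated.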

--- norm.0      2020-04-30 15:20:47.077823958 +0000
+++ opt.0       2020-04-30 15:20:59.721346678 +0000
@@ -1,20 +1,11 @@


 module {
-  func @multiply_transpose(%arg0: tensor<*xf64>, %arg1: tensor<*xf64>) -> tensor<*xf64> attributes {sym_visibility = "private"} {
-    %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
-    %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64>
-    %2 = toy.mul %0, %1 : tensor<*xf64>
-    toy.return %2 : tensor<*xf64>
-  }
   func @main() {
     %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
-    %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64>
-    %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64>
-    %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64>
-    %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64>
-    %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64>
-    toy.print %5 : tensor<*xf64>
+    %1 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
+    %2 = toy.mul %1, %1 : tensor<3x2xf64>
+    toy.print %2 : tensor<3x2xf64>
     toy.return
   }
 }
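
In the diff above, the unranked tensor<*xf64> types have become concrete shapes such as 3x2. The sketch below illustrates the kind of per-operation rule that can produce them, assuming transpose reverses the dimensions and an element-wise mul keeps the shape of its operands. Shape, rankedShape, unrankedShape, inferTranspose, and inferMul are hypothetical names, not the tutorial's ShapeInferenceInterface.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A tensor shape; ranked == false models an unranked tensor<*xf64>.
struct Shape {
  bool ranked;
  std::vector<std::int64_t> dims; // meaningful only when ranked
};

Shape unrankedShape() { return Shape{false, {}}; }
Shape rankedShape(std::vector<std::int64_t> dims) {
  return Shape{true, std::move(dims)};
}

// transpose: reverse the dimensions, e.g. 2x3 -> 3x2.
Shape inferTranspose(const Shape &in) {
  if (!in.ranked)
    return unrankedShape(); // nothing can be inferred yet
  return rankedShape(
      std::vector<std::int64_t>(in.dims.rbegin(), in.dims.rend()));
}

// Element-wise mul: the result has the operands' shape, provided both
// operands are ranked and agree.
Shape inferMul(const Shape &lhs, const Shape &rhs) {
  if (!lhs.ranked || !rhs.ranked || lhs.dims != rhs.dims)
    return unrankedShape();
  return lhs;
}
```

Starting from the 2x3 constant, inferTranspose gives 3x2 and inferMul of two such values stays 3x2, matching the types printed for toy.transpose and toy.mul in the optimized output.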

Comparing the source code of Chapter 3 and Chapter 4 gives the following diffstat.

 CMakeLists.txt                         |   13 ++-
 include/toy/CMakeLists.txt             |    7 +-
 include/toy/Dialect.h                  |    2
 include/toy/Ops.td                     |   35 +++++++++-
 include/toy/Passes.h                   |   26 +++++++
 include/toy/ShapeInferenceInterface.h  |   28 ++++++++
 include/toy/ShapeInferenceInterface.td |   30 ++++++++
 mlir/Dialect.cpp                       |   81 +++++++++++++++++++++++
 mlir/MLIRGen.cpp                       |    4 +
 mlir/ShapeInferencePass.cpp            |  113 +++++++++++++++++++++++++++++++++
 mlir/ToyCombine.cpp                    |    5 +
 toyc.cpp                               |   13 +++
 12 files changed, 345 insertions(+), 12 deletions(-)

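2.5. Chapter 5: Partial lowering to lower-level dialects

As an example of partial lowering, this chapter converts Toy to Affine. The Affine dialect, which models affine loop nests and memory accesses, is defined as one of MLIR's dialects. The conversion to Affine operations is done in toyc.cpp and in mlir/LowerToAffineLoops.cpp, which it calls.
In toyc.cpp, the following part corresponds to this. After the conversion to Affine, the LoopFusion and MemRefDataFlowOpt optimizations are applied.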

  if (isLoweringToAffine) {
    // Partially lower the toy dialect with a few cleanups afterwards.
    pm.addPass(mlir::toy::createLowerToAffinePass());

    mlir::OpPassManager &optPM = pm.nest<mlir::FuncOp>();
    optPM.addPass(mlir::createCanonicalizerPass());
    optPM.addPass(mlir::createCSEPass());

    // Add optimizations if enabled.
    if (enableOpt) {
      optPM.addPass(mlir::createLoopFusionPass());
      optPM.addPass(mlir::createMemRefDataFlowOptPass());
    }
  }

The effect of the optimizations on the Affine dialect can be checked with the command below, run once with -opt and once without.

$ bin/toyc-ch5 ../mlir/test/Examples/Toy/Ch5/affine-lowering.mlir -emit=mlir-affine

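Taking the diff gives the following. The code has been optimized, for example in terms of memory use. Lines starting with - are from before optimization, while lines starting with + are the optimized code.

$ diff -urpN none.0 opt.0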
--- none.0      2020-04-30 01:26:01.689305461 +0000
+++ opt.0       2020-04-30 01:25:52.445313589 +0000
@@ -19,31 +19,22 @@ module {
     %cst_3 = constant 5.000000e+00 : f64
     %cst_4 = constant 6.000000e+00 : f64
     %0 = alloc() : memref<3x2xf64>
-    %1 = alloc() : memref<3x2xf64>
-    %2 = alloc() : memref<2x3xf64>
-    affine.store %cst, %2[0, 0] : memref<2x3xf64>
-    affine.store %cst_0, %2[0, 1] : memref<2x3xf64>
-    affine.store %cst_1, %2[0, 2] : memref<2x3xf64>
-    affine.store %cst_2, %2[1, 0] : memref<2x3xf64>
-    affine.store %cst_3, %2[1, 1] : memref<2x3xf64>
-    affine.store %cst_4, %2[1, 2] : memref<2x3xf64>
+    %1 = alloc() : memref<2x3xf64>
+    affine.store %cst, %1[0, 0] : memref<2x3xf64>
+    affine.store %cst_0, %1[0, 1] : memref<2x3xf64>
+    affine.store %cst_1, %1[0, 2] : memref<2x3xf64>
+    affine.store %cst_2, %1[1, 0] : memref<2x3xf64>
+    affine.store %cst_3, %1[1, 1] : memref<2x3xf64>
+    affine.store %cst_4, %1[1, 2] : memref<2x3xf64>
     affine.for %arg0 = 0 to 3 {
       affine.for %arg1 = 0 to 2 {
-        %3 = affine.load %2[%arg1, %arg0] : memref<2x3xf64>
-        affine.store %3, %1[%arg0, %arg1] : memref<3x2xf64>
-      }
-    }
-    affine.for %arg0 = 0 to 3 {
-      affine.for %arg1 = 0 to 2 {
-        %3 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
-        %4 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
-        %5 = mulf %3, %4 : f64
-        affine.store %5, %0[%arg0, %arg1] : memref<3x2xf64>
+        %2 = affine.load %1[%arg1, %arg0] : memref<2x3xf64>
+        %3 = mulf %2, %2 : f64
+        affine.store %3, %0[%arg0, %arg1] : memref<3x2xf64>
       }
     }
     toy.print %0 : memref<3x2xf64>
-    dealloc %2 : memref<2x3xf64>
-    dealloc %1 : memref<3x2xf64>
+    dealloc %1 : memref<2x3xf64>
     dealloc %0 : memref<3x2xf64>
     return
   }
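
Read as ordinary loops, the diff above turns two loop nests and a temporary buffer into a single loop nest. The C++ sketch below reproduces both versions for the same 2x3 input so that they can be compared. The function names unfused and fused and the type aliases are made up for this illustration; the loop bounds and index order follow the affine.for and affine.load lines in the diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using In = std::array<std::array<double, 3>, 2>;  // memref<2x3xf64>
using Out = std::array<std::array<double, 2>, 3>; // memref<3x2xf64>

// Before optimization: transpose into a temporary buffer, then multiply
// element-wise in a second loop nest.
Out unfused(const In &in) {
  Out tmp{};
  Out out{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      tmp[i][j] = in[j][i];
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      out[i][j] = tmp[i][j] * tmp[i][j];
  return out;
}

// After optimization: one loop nest, no temporary, and a single load per
// element.
Out fused(const In &in) {
  Out out{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 2; ++j) {
      const double v = in[j][i];
      out[i][j] = v * v;
    }
  return out;
}
```

Both functions compute the same result; the fused version simply avoids the intermediate 3x2 buffer and the second pass over it, which corresponds to the removed alloc and dealloc lines.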

Comparing the source code of Chapter 4 and Chapter 5 gives the following diffstat.

 CMakeLists.txt              |   17 +-
 include/toy/CMakeLists.txt  |    4
 include/toy/Ops.td          |    3
 include/toy/Passes.h        |    5
 mlir/LowerToAffineLoops.cpp |  316 ++++++++++++++++++++++++++++++++++++++++++++
 toyc.cpp                    |   39 ++++-
 6 files changed, 366 insertions(+), 18 deletions(-)

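2.6. Chapter 6: Lowering to LLVM IR

The previous chapter showed an example of converting between dialects. This chapter shows an example of converting to LLVM IR.
The conversion is invoked from toyc.cpp.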

  if (isLoweringToLLVM) {
    // Finish lowering the toy IR to the LLVM dialect.
    pm.addPass(mlir::toy::createLowerToLLVMPass());
  }

The conversion itself is implemented in mlir/LowerToLLVM.cpp.

//===----------------------------------------------------------------------===//
// ToyToLLVMLoweringPass
//===----------------------------------------------------------------------===//

namespace {
struct ToyToLLVMLoweringPass
    : public PassWrapper<ToyToLLVMLoweringPass, OperationPass<ModuleOp>> {
  void runOnOperation() final;
};
} // end anonymous namespace

void ToyToLLVMLoweringPass::runOnOperation() {
  // The first thing to define is the conversion target. This will define the
  // final target for this lowering. For this lowering, we are only targeting
  // the LLVM dialect.
  LLVMConversionTarget target(getContext());
  target.addLegalOp<ModuleOp, ModuleTerminatorOp>();

  // During this lowering, we will also be lowering the MemRef types, that are
  // currently being operated on, to a representation in LLVM. To perform this
  // conversion we use a TypeConverter as part of the lowering. This converter
  // details how one type maps to another. This is necessary now that we will be
  // doing more complicated lowerings, involving loop region arguments.
  LLVMTypeConverter typeConverter(&getContext());

  // Now that the conversion target has been defined, we need to provide the
  // patterns used for lowering. At this point of the compilation process, we
  // have a combination of `toy`, `affine`, and `std` operations. Luckily, there
  // are already exists a set of patterns to transform `affine` and `std`
  // dialects. These patterns lowering in multiple stages, relying on transitive
  // lowerings. Transitive lowering, or A->B->C lowering, is when multiple
  // patterns must be applied to fully transform an illegal operation into a
  // set of legal ones.
  OwningRewritePatternList patterns;
  populateAffineToStdConversionPatterns(patterns, &getContext());
  populateLoopToStdConversionPatterns(patterns, &getContext());
  populateStdToLLVMConversionPatterns(typeConverter, patterns);

  // The only remaining operation to lower from the `toy` dialect, is the
  // PrintOp.
  patterns.insert<PrintOpLowering>(&getContext());

  // We want to completely lower to LLVM, so we use a `FullConversion`. This
  // ensures that only legal operations will remain after the conversion.
  auto module = getOperation();
  if (failed(applyFullConversion(module, target, patterns, &typeConverter)))
    signalPassFailure();
}

/// Create a pass for lowering operations the remaining `Toy` operations, as
/// well as `Affine` and `Std`, to the LLVM dialect for codegen.
std::unique_ptr<mlir::Pass> mlir::toy::createLowerToLLVMPass() {
  return std::make_unique<ToyToLLVMLoweringPass>();
}
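
The comments in the pass mention transitive (A->B->C) lowering and full conversion. The sketch below models just that idea. It assumes each op is identified only by its dialect name and each pattern rewrites one dialect into another; the chain affine -> loop -> std -> llvm used in the usage note is a simplification for illustration. Patterns and applyFullConversionSketch are hypothetical names and do not reflect MLIR's conversion driver.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a "pattern" rewrites an op of one dialect into an op
// of another dialect (key -> value).
using Patterns = std::map<std::string, std::string>;

// Rewrites every op until it is legal. Returns false, like a failed full
// conversion, if some op is illegal and no pattern can make progress.
bool applyFullConversionSketch(std::vector<std::string> &ops,
                               const std::set<std::string> &legal,
                               const Patterns &patterns) {
  for (std::string &op : ops) {
    // Bound the number of rewrites so a cyclic pattern set cannot loop
    // forever.
    std::size_t steps = 0;
    while (legal.count(op) == 0) {
      const auto it = patterns.find(op);
      if (it == patterns.end() || steps++ > patterns.size())
        return false;
      op = it->second; // one lowering step, e.g. affine -> loop
    }
  }
  return true;
}
```

With patterns for affine -> loop, loop -> std, and std -> llvm and only llvm marked legal, every op ends up as llvm, whereas an op with no applicable pattern (say, a leftover toy op) makes the conversion fail. That is why the pass adds PrintOpLowering for the remaining toy operation.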

Note that the toyc command can emit LLVM IR or run the program through the JIT. The example below uses the JIT, but -emit=mlir, -emit=mlir-affine, -emit=mlir-llvm, and -emit=llvm can also be specified.

$ echo 'def main() { print([[1, 2], [3, 4]]); }' | ./bin/toyc-ch6 -emit=jit
1.000000 2.000000
3.000000 4.000000

Comparing the source code of Chapter 5 and Chapter 6 gives the following diffstat.

 CMakeLists.txt              |   22 +++-
 include/toy/CMakeLists.txt  |    4
 include/toy/Ops.td          |    6 -
 include/toy/Passes.h        |    4
 mlir/LowerToAffineLoops.cpp |    1
 mlir/LowerToLLVM.cpp        |  204 ++++++++++++++++++++++++++++++++++++++++++++
 toyc.cpp                    |  132 +++++++++++++++++++++++-----
 7 files changed, 337 insertions(+), 36 deletions(-)

2.7. Chapter 7: Extending the language

This chapter explains how to extend the Toy language.

Defining Dialect Attributes and Types

The related source changes are as follows (diff between Toy Ch6 and Ch7).

 CMakeLists.txt              |   14 --
 include/toy/AST.h           |   90 +++++++++++--
 include/toy/CMakeLists.txt  |    4
 include/toy/Dialect.h       |   54 +++++++
 include/toy/Lexer.h         |   11 +
 include/toy/Ops.td          |   73 +++++++++-
 include/toy/Parser.h        |  258 +++++++++++++++++++++++++++++++++-----
 mlir/Dialect.cpp            |  264 +++++++++++++++++++++++++++++++++++---
 mlir/LowerToAffineLoops.cpp |    1
 mlir/MLIRGen.cpp            |  298 ++++++++++++++++++++++++++++++++++++++------
 mlir/ToyCombine.cpp         |   18 ++
 parser/AST.cpp              |   49 ++++++-
 toyc.cpp                    |    1
 13 files changed, 1002 insertions(+), 133 deletions(-)

The related code in MLIR itself is:

mlir/include/mlir/IR/DialectSymbolRegistry.def

2.8. Source code layout

The source code of the Toy compiler is in mlir/examples/toy, and it is organized as follows.

CMakeLists.txt
toyc.cpp (Ch1-7) compiler main program
include/
- toy/
  - CMakeLists.txt
  - AST.h (Ch1-7) header for the Toy language AST
  - Dialect.h (Ch2-7) header for the dialect
  - Lexer.h (Ch1-7) header for the lexer
  - MLIRGen.h (Ch2-7) header for MLIR generation
  - Parser.h (Ch1-7) header for the parser
  - Passes.h (Ch4-7) header for the passes
  - Ops.td (Ch2-7) table definition file for the operations
  - ShapeInferenceInterface.h (Ch4-7) header for ShapeInferenceInterface
  - ShapeInferenceInterface.td (Ch4-7) table definition file for ShapeInferenceInterface
parser/
- AST.cpp (Ch1-7) Toy language AST code
mlir/
- Dialect.cpp (Ch2-7) dialect code
- LowerToLLVM.cpp (Ch6-7) code for lowering to LLVM
- ShapeInferencePass.cpp (Ch4-7) code for ShapeInferencePass
- ToyCombine.td (Ch3-7) table definition file for ToyCombine
- LowerToAffineLoops.cpp (Ch5-7) code for lowering Toy to affine loops
- MLIRGen.cpp (Ch2-7) MLIR generation code
- ToyCombine.cpp (Ch3-7) ToyCombine optimization code

A. References

A.1. Official resources

Introductory material

Getting Started
Tutorials
Talks and Related Publications
- 2019 EuroLLVM Developers’ Meeting: Mehdi & Vasilache & Zinenko “Building a Compiler with MLIR”, a tutorial based on the Toy language
  - Introduction: a Toy Language. Pages 5 to 9 give an overview of what each pass, such as code optimization, is meant to achieve
  - Dialect Lowering. Conversion from MLIR to LLVM IR (pages 34 to 68)
  - A Dialect for Linear Algebra Optimizations
- 2019 EuroLLVM Developers’ Meeting: T. Shpeisman & C. Lattner “MLIR: Multi-Level Intermediate Repr..”
- MLIR: A Compiler Infrastructure for the End of Moore's Law

Language specification and related documents

MLIR Language Reference
- Dialect
Mechanisms available in MLIR
- Table-driven Operation Definition Specification (ODS)
- Table-driven Declarative Rewrite Rule (DRR)
Pass
- Pass Infrastructure
- Passes

A.2. Other

Overview and Recent Trends of Deep Learning Compilers (深層学習コンパイラの概要と最近の動向), xSIG 2019, Takeo Imai (今井健男)
