
A note on memory constraints and clobbers in LLVM inline assembly

Posted at 2023-12-31

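An unimportant preamble: I originally kept this in a gist, but then realized it would be hard to find by search later, so I am also posting it on Qiita, possibly only temporarily.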

On memory constraints and the memory clobber

memory clobber

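This section is about the "memory" part of the following statement:

__asm__ __volatile__("": : :"memory");

- This idiom is commonly called a compiler fence.
- It keeps the compiler from reordering code across the statement.
- Note that it does not prevent instruction reordering by the CPU.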
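To make the compiler-versus-CPU distinction concrete, here is a minimal sketch in C. The names `compiler_fence`, `publish`, `consume`, `payload`, and `ready` are made up for illustration. It pairs the asm fence with C11's `atomic_thread_fence`, which is one standard way to also get ordering at the CPU level; it is not taken from the article's sources.

```c
#include <assert.h>
#include <stdatomic.h>

/* Compiler-only fence: an empty asm template with a "memory" clobber.
   It emits no instruction; it only constrains the compiler. */
static inline void compiler_fence(void) {
    __asm__ __volatile__("" : : : "memory");
}

static int payload;       /* hypothetical shared data */
static atomic_int ready;  /* hypothetical "data is published" flag */

/* Write the payload first, then raise the flag. */
void publish(int value) {
    assert(value >= 0);  /* -1 is reserved by consume() as "not ready" */
    payload = value;
    compiler_fence();                          /* (1) compiler-level ordering only */
    atomic_thread_fence(memory_order_release); /* (2) CPU-level ordering; also implies (1) */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Return the payload, or -1 if nothing has been published yet. */
int consume(void) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return -1;
    atomic_thread_fence(memory_order_acquire);
    return payload;
}
```

With only step (1), a weakly ordered CPU could still make the flag visible before the payload, which is exactly the caveat in the last bullet above.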

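LLVM does not interpret the memory clobber

The LLVM inline assembler uses a different syntax from C.
According to the documentation, ~{memory} appears to be the equivalent.
- However, the statement that a "clobber string of “~{memory}” indicates that the assembly writes to arbitrary undeclared memory locations" is not true.
- The clobber handling only checks clobbered registers.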

case InlineAsm::isClobber: {

  const unsigned NumRegs = OpInfo.Regs.size();
  if (NumRegs > 0) {
    unsigned Flag = InlineAsm::Flag(InlineAsm::Kind::Clobber, NumRegs);
    Inst.addImm(Flag);

    for (Register Reg : OpInfo.Regs) {
      Inst.addReg(Reg, RegState::Define | RegState::EarlyClobber |
                           getImplRegState(Reg.isPhysical()));
    }
  }
  break;
}

Instead, LLVM uses memory constraints to track memory reads and writes

https://llvm.org/docs/LangRef.html#inline-asm-constraint-string
- Memory constraint. This kind of constraint is for use with an instruction taking a memory operand. Different constraints allow for different addressing modes used by the target.
These are written like =*m (memory output) or m (memory input).

// store val into dst
void store(ref int dst, int val) {
  __asm("movl $1, $0", "=*m,r", &dst, val);
}
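For comparison, the same idea can be sketched in C with GCC/Clang extended asm, where the memory operand is spelled "=m" or "m". The function names `store` and `load` are illustrative. The x86 instruction is used only when compiling for x86; on other targets the sketch falls back to plain C plus an empty asm that still declares the memory operand.

```c
#include <assert.h>

/* Store val into *dst. The "=m" output names exactly the memory the asm
   writes, so no blanket "memory" clobber is needed. */
void store(int *dst, int val) {
    assert(dst != 0);
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl %1, %0" : "=m"(*dst) : "r"(val));
#else
    /* Fallback for other targets: do the store in C, then declare *dst
       as a read-write memory operand of an empty asm. */
    *dst = val;
    __asm__("" : "+m"(*dst));
#endif
}

/* Load from *src. The "m" input names exactly the memory the asm reads. */
int load(const int *src) {
    int out;
    assert(src != 0);
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl %1, %0" : "=r"(out) : "m"(*src));
#else
    out = *src;
#endif
    return out;
}
```

Usage would look like `int x = 0; store(&x, 7);`, after which `load(&x)` yields 7.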

What about arch-specific registers?

Each architecture handles them on its own.
Example: the AVR code that marks an operand as memory


case 'Q': // A memory address based on Y or Z pointer with displacement.
  return C_Memory;

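But how are we supposed to know about operands like that?

clang and Rust do the work themselves

Why?

Because LLVM does not optimize based on whether memory operands or a memory clobber are present

Meaning what?

First, LLVM treats inline assembly the same way as a function call (call/invoke).
Whether such a call can be optimized depends on a property called Memory Effects.
A function with no memory inputs or outputs can be optimized in many ways, but LLVM currently does not work this out for inline assembly.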
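As a rough illustration of what "no memory input/output" means at the source level, the following C sketch contrasts a register-only asm with one that carries a "memory" clobber. The names `pass_through`, `pass_through_clobber`, `counter`, and `demo` are hypothetical, and the comments describe what a compiler is permitted to assume, not what any particular version is guaranteed to do.

```c
#include <assert.h>

/* Register-only, non-volatile asm: no memory operand and no "memory"
   clobber. The empty template leaves the register value unchanged. */
static inline unsigned pass_through(unsigned x) {
    __asm__("" : "+r"(x));
    return x;
}

/* Same operand, but with a "memory" clobber: the compiler has to assume
   that arbitrary memory may have been read or written. */
static inline unsigned pass_through_clobber(unsigned x) {
    __asm__("" : "+r"(x) : : "memory");
    return x;
}

static unsigned counter;  /* hypothetical global used to show the effect */

unsigned demo(void) {
    counter = 1;
    unsigned a = pass_through(10);         /* counter may stay cached in a register */
    counter += a;
    unsigned b = pass_through_clobber(20); /* counter must be stored before, reloaded after */
    counter += b;
    assert(a == 10 && b == 20);            /* empty templates do not change the values */
    return counter;
}
```

Both variants compute the same result; the difference is only in how freely the surrounding memory accesses may be moved or cached.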

Bonus: what does a memory operand actually do?

In GlobalISel's InlineAsm lowering, the flag kind is set to Mem.
- This is used as codegen information.
However, it is not used for optimization.

switch (OpInfo.Type) {
case InlineAsm::isOutput:
  if (OpInfo.ConstraintType == TargetLowering::C_Memory) {
    const InlineAsm::ConstraintCode ConstraintID =
        TLI->getInlineAsmMemConstraint(OpInfo.ConstraintCode);
    assert(ConstraintID != InlineAsm::ConstraintCode::Unknown &&
           "Failed to convert memory constraint code to constraint id.");

    // Add information to the INLINEASM instruction to know about this
    // output.
    InlineAsm::Flag Flag(InlineAsm::Kind::Mem, 1);
    Flag.setMemConstraint(ConstraintID);
    Inst.addImm(Flag);

Looking at each language

clang (C)
Rust
LDC (D)

clang

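clang parses the constraints and checks for the memory clobber by itself, without relying on LLVM.
It first assumes ReadNone=ReadOnly=true, then settles the final values by inspecting the constraints.
If there is a memory clobber, ReadNone=ReadOnly=false.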

if (Clobber == "memory")
  ReadOnly = ReadNone = false;

The constraint string is checked per architecture; if there is a memory constraint, the asm is not ReadNone.
- It may still be ReadOnly.

if (Info.allowsMemory())
  ReadNone = false;

clang implements this allowsMemory check itself rather than relying on LLVM.
- Example: RISC-V

case 'A':
  // An address that is held in a general-purpose register.
  Info.setAllowsMemory();

In addition, ReadOnly and ReadNone are both set to false when a value is returned by reference.
- The constraint string is built to match the format of LLVM's inline assembler.

    // If this is a register output, then make the inline asm return it
    // by-value.  If this is a memory result, return the value by-reference.
    QualType QTy = OutExpr->getType();
    const bool IsScalarOrAggregate = hasScalarEvaluationKind(QTy) ||
                                     hasAggregateEvaluationKind(QTy);
    if (!Info.allowsMemory() && IsScalarOrAggregate) {
(...)
    } else {
(...)
      Constraints += "=*";
      Constraints += OutputConstraint;
      ReadOnly = ReadNone = false;
    }

A CallInst is built from this information.
- MemoryEffects are set inside the UpdateAsmCallInst function.

  } else if (HasUnwindClobber) {
    llvm::CallBase *Result = EmitCallOrInvoke(IA, Args, "");
    UpdateAsmCallInst(*Result, HasSideEffect, true, ReadOnly, ReadNone,
                      InNoMergeAttributedStmt, S, ResultRegTypes, ArgElemTypes,
                      *this, RegResults);
  } else {
    llvm::CallInst *Result =
        Builder.CreateCall(IA, Args, getBundlesForFunclet(IA));
    UpdateAsmCallInst(*Result, HasSideEffect, false, ReadOnly, ReadNone,
                      InNoMergeAttributedStmt, S, ResultRegTypes, ArgElemTypes,
                      *this, RegResults);
  }

Depending on ReadNone/ReadOnly, MemoryEffects are set on the call so that it can be optimized.
- If volatile (SideEffect) is present, memory reads and writes are assumed.
- This is somewhat conservative.

// Attach readnone and readonly attributes.
if (!HasSideEffect) {
  if (ReadNone)
    Result.setDoesNotAccessMemory();
  else if (ReadOnly)
    Result.setOnlyReadsMemory();
}
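The decision flow described in this section can be condensed into a small model. The following C sketch is a simplification written for this article, not clang code: `struct asm_summary`, `enum mem_effect`, and `classify` are hypothetical names, and the per-operand checks are collapsed into one flag each.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one asm statement; not a clang data structure. */
struct asm_summary {
    bool is_volatile;        /* corresponds to HasSideEffect */
    bool has_memory_clobber; /* "memory" appears in the clobber list */
    bool has_memory_input;   /* some input constraint allows memory */
    bool has_memory_output;  /* some output is returned by reference ("=*...") */
};

enum mem_effect { EFFECT_UNKNOWN, EFFECT_READONLY, EFFECT_READNONE };

/* Mirror the flow above: start optimistic, then knock flags down. */
enum mem_effect classify(const struct asm_summary *s) {
    assert(s != 0);
    bool read_only = true, read_none = true;
    if (s->has_memory_clobber)
        read_only = read_none = false;
    if (s->has_memory_input)
        read_none = false;             /* may still be read-only */
    if (s->has_memory_output)
        read_only = read_none = false;
    if (s->is_volatile)
        return EFFECT_UNKNOWN;         /* no attribute is attached */
    if (read_none)
        return EFFECT_READNONE;
    if (read_only)
        return EFFECT_READONLY;
    return EFFECT_UNKNOWN;
}
```

For example, a non-volatile asm with only a memory input classifies as `EFFECT_READONLY`, while adding `volatile` to the same statement yields `EFFECT_UNKNOWN`.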

Rust

Rust has its own asm! syntax.
Behavior is specified through options.
- For example pure/nomem/readonly
Options
- nomem: no memory reads or writes occur (the opposite of a memory clobber)
- readonly: no memory writes occur

When NOMEM is not set (the default behavior)

~{memory} is passed to LLVM, but the code comment notes that LLVM ignores it.

if !options.contains(InlineAsmOptions::NOMEM) {
    // This is actually ignored by LLVM, but it's probably best to keep
    // it just in case. LLVM instead uses the ReadOnly/ReadNone
    // attributes on the call instruction to optimize.
    constraints.push("~{memory}".to_string());
}

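When NOMEM is given, MemoryEffects are passed as a function attribute on the call site, depending on the other options.
- How much can be optimized depends on the options.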
- https://github.com/rust-lang/rust/blob/558ac1cfb7c214d06ca471885a57caa6c8301bae/compiler/rustc_codegen_llvm/src/asm.rs#L304-L317

let mut attrs = SmallVec::<[_; 2]>::new();
if options.contains(InlineAsmOptions::PURE) {
    if options.contains(InlineAsmOptions::NOMEM) {
        attrs.push(llvm::MemoryEffects::None.create_attr(self.cx.llcx));
    } else if options.contains(InlineAsmOptions::READONLY) {
        attrs.push(llvm::MemoryEffects::ReadOnly.create_attr(self.cx.llcx));
    }
    attrs.push(llvm::AttributeKind::WillReturn.create_attr(self.cx.llcx));
} else if options.contains(InlineAsmOptions::NOMEM) {
    attrs.push(llvm::MemoryEffects::InaccessibleMemOnly.create_attr(self.cx.llcx));
} else {
    // LLVM doesn't have an attribute to represent ReadOnly + SideEffect
}
attributes::apply_to_callsite(result, llvm::AttributePlace::Function, &{ attrs });

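Optimization potential by option combination

PURE+NOMEM
- No side effects + no memory reads or writes
- => None (ReadNone in clang's terms)
PURE+READONLY
- No side effects + no memory writes
- => ReadOnly
NOMEM
- No memory reads or writes
- => InaccessibleMemOnly
  - No reads or writes to memory within the accessible range
  - May still have side effects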
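Bonus: willreturn

This function attribute lets the optimizer assume that the call eventually returns, that is, that it does not loop or recurse forever.
This is why clang at -O3 can optimize away an infinite loop (UB).
Rust used to have this UB as well, but readonly no longer infers willreturn, so it is no longer UB.
- This is still frequently misunderstood.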

LDC

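Almost a straight pass-through to the LLVM inline assembler.
- The closest to raw LLVM.
- Memory operands and the rest of the syntax can be used as-is.
sideeffect=true (the equivalent of volatile) is always set.
There is no way to adjust the options.
Function attributes are left untouched, so no optimization takes place.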

// build asm call
bool sideeffect = true;
llvm::InlineAsm *ia = llvm::InlineAsm::get(FT, code, constraints, sideeffect);

Summary

Do not trust LLVM
