
Rust Advent Calendar 2023

Day 4

Converting kanji to hiragana and katakana in Rust


Motivation

I wrote this because I wanted to try porting the C# code from the article below to Rust by hand.

Implementation

Cargo.toml

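We use the windows crate. Enable its feature flags in Cargo.toml so that the COM and IME APIs are available. (The code also uses the anyhow crate for error handling; it is listed in the full Cargo.toml at the end.)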

Cargo.toml
[dependencies.windows]
version = "0.52"
features = [
    "Win32_System_Com",
    "Win32_UI_Input_Ime"
]

main.rs

COM

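Before using COM you must call the CoInitialize function, and you must call CoUninitialize before the process exits (strictly, every successful CoInitialize on a thread has to be balanced by one CoUninitialize on that same thread). So we define a Com struct that calls CoInitialize when it is instantiated, and arrange for CoUninitialize to be called when the Com value is dropped.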

use anyhow::Result;
use windows::Win32::System::Com::{CoInitialize, CoUninitialize};

struct Com;

impl Drop for Com {
    fn drop(&mut self) {
        unsafe { CoUninitialize() };
    }
}

impl Com {
    fn new() -> Result<Self> {
        unsafe { CoInitialize(None)? };
        Ok(Com)
    }
}

fn main() -> Result<()> {
    let _com = Com::new()?;
    // CoUninitialize is called automatically when _com goes out of scope
    Ok(())
}
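One detail in main that is easy to get wrong: the guard is bound to _com rather than to _. The std-only sketch below uses a hypothetical Guard type that records when it is dropped, in place of calling CoUninitialize. With let _ = ..., nothing is bound and the value is dropped at the end of that statement, so a real Com guard would uninitialize COM immediately; a named binding such as _com lives until the end of its scope.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the Com guard: it records its drop in a log
/// instead of calling CoUninitialize, so the timing can be observed.
struct Guard<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("drop guard");
    }
}

/// `let _ = ...` binds nothing, so the guard is dropped at the end of the statement.
fn with_underscore() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _ = Guard { log: &log };
        log.borrow_mut().push("use COM");
    }
    log.into_inner()
}

/// `let _com = ...` keeps the guard alive until the end of the enclosing block.
fn with_named_binding() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _com = Guard { log: &log };
        log.borrow_mut().push("use COM");
    }
    log.into_inner()
}
```

With the underscore pattern the drop is logged before the work; with the named binding it is logged after.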

CreateInstance

Next is the part of the C# code that instantiates IFELanguage.

C#
Type type = Type.GetTypeFromProgID("MSIME.Japan");
ife = Activator.CreateInstance(type) as IFELanguage;

In Rust this becomes:

Rust
let clsid = unsafe { CLSIDFromProgID(w!("MSIME.Japan"))? };
let ife: IFELanguage = unsafe { CoCreateInstance(&clsid, None, CLSCTX_ALL)? };

IFELanguage

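The IFELanguage instance also seems to need initialization and cleanup: the C# wrapper below calls Open right after creating it and Close in Dispose.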

C#
public class FELanguage : IDisposable {
    private IFELanguage _ifelang;
    
    public FELanguage() {
        Type type = Type.GetTypeFromProgID("MSIME.Japan");
        this._ifelang = Activator.CreateInstance(type) as IFELanguage;
        int hr = this._ifelang.Open();
        if (hr != 0) {
            throw Marshal.GetExceptionForHR(hr) ?? throw new Exception($"{hr} is not error");
        }
    }

    public void Dispose() {
        this._ifelang?.Close();
    }
}
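The C# constructor treats any nonzero HRESULT from Open as an error. Marshal.GetExceptionForHR returns null for success codes, which is why the ?? throw fallback is there: for a success code such as S_FALSE (1) there is no exception to convert. COM's own SUCCEEDED convention is different, since any non-negative HRESULT counts as success, and windows-rs, as far as I know, follows that convention when ? turns an HRESULT into a Result. The std-only sketch below models both checks on plain i32 values; S_OK, S_FALSE and E_FAIL are the standard codes, while check_nonzero and check_succeeded are hypothetical names.

```rust
/// Standard COM status codes, written as the signed 32-bit values an HRESULT holds.
const S_OK: i32 = 0;
const S_FALSE: i32 = 1;
const E_FAIL: i32 = 0x8000_4005_u32 as i32;

/// Mirrors the C# check `if (hr != 0)`: anything other than S_OK is an error.
fn check_nonzero(hr: i32) -> Result<(), i32> {
    if hr != 0 {
        Err(hr)
    } else {
        Ok(())
    }
}

/// Mirrors the SUCCEEDED macro: only a set sign (severity) bit means failure.
fn check_succeeded(hr: i32) -> Result<(), i32> {
    if hr >= 0 {
        Ok(())
    } else {
        Err(hr)
    }
}
```

So for S_FALSE the C# constructor would throw, while a SUCCEEDED-style check accepts it; for an error such as E_FAIL both reject the call.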

Porting this to Rust gives the following:

Rust
struct FElanguage(IFELanguage);

impl Drop for FElanguage {
    fn drop(&mut self) {
        // Drop cannot return an error, so a failure from Close is discarded
        unsafe { self.0.Close().ok() };
    }
}

impl FElanguage {
    fn new() -> Result<Self> {
        let clsid = unsafe { CLSIDFromProgID(w!("MSIME.Japan"))? };
        let ife: IFELanguage = unsafe { CoCreateInstance(&clsid, None, CLSCTX_ALL)? };
        unsafe { ife.Open()? };
        Ok(FElanguage(ife))
    }
}
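The Rust version relies on drop order to reproduce the C# cleanup: Close has to run while COM is still initialized. Local variables are dropped in reverse order of declaration, so declaring _com before the FElanguage value in main is enough. The std-only sketch below checks this with hypothetical stand-ins, FakeCom and FakeIme, that log the Win32 calls they represent instead of making them.

```rust
use std::cell::RefCell;

/// Shared call log; each fake records the Win32 call it stands in for.
type Log = RefCell<Vec<&'static str>>;

/// Hypothetical stand-in for Com: logs instead of calling CoInitialize/CoUninitialize.
struct FakeCom<'a>(&'a Log);

impl<'a> FakeCom<'a> {
    fn new(log: &'a Log) -> Self {
        log.borrow_mut().push("CoInitialize");
        FakeCom(log)
    }
}

impl Drop for FakeCom<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("CoUninitialize");
    }
}

/// Hypothetical stand-in for FElanguage: logs instead of calling Open/Close.
struct FakeIme<'a>(&'a Log);

impl<'a> FakeIme<'a> {
    fn new(log: &'a Log) -> Self {
        log.borrow_mut().push("Open");
        FakeIme(log)
    }
}

impl Drop for FakeIme<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("Close");
    }
}

/// Same declaration order as main: the COM guard first, then the IME wrapper.
fn run() -> Vec<&'static str> {
    let log: Log = RefCell::new(Vec::new());
    {
        let _com = FakeCom::new(&log);
        let _ime = FakeIme::new(&log);
        log.borrow_mut().push("GetPhonetic");
    } // locals drop in reverse order: _ime (Close), then _com (CoUninitialize)
    log.into_inner()
}
```

If the two values were instead bundled into one struct, keep in mind that struct fields are dropped in declaration order (first field first), so the IME field would have to be declared before the COM field.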

After that, it is just a matter of implementing functions that call methods such as GetPhonetic, GetConversion, and GetJMorphResult, in the same way as the C# code.

Rust
impl FElanguage {
    fn phonetic(&self, kanji: &str) -> Result<String> {
        let kanji = to_bstr(kanji)?;
        let mut hiragana = BSTR::new();
        unsafe { self.0.GetPhonetic(&kanji, 1, -1, &mut hiragana)? };
        Ok(hiragana.to_string())
    }

    fn conversion(&self, hiragana: &str) -> Result<String> {
        let hiragana = to_bstr(hiragana)?;
        let mut kanji = BSTR::new();
        unsafe { self.0.GetConversion(&hiragana, 1, -1, &mut kanji)? };
        Ok(kanji.to_string())
    }

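    fn j_morph_result(&self, input: &str, request: u32, mode: u32) -> Result<String> {
        // input must be a null-terminated UTF-16 string
        let input = input.encode_utf16().chain(Some(0)).collect::<Vec<_>>();
        let len = input.len();
        let input = PCWSTR::from_raw(input.as_ptr());
        let mut result = ptr::null_mut();
        unsafe {
            self.0
                .GetJMorphResult(request, mode, len as _, input, ptr::null_mut(), &mut result)?;
        };
        // pwchOutput does not appear to be null-terminated, so read exactly
        // cchOutput UTF-16 code units instead of scanning for a terminator
        let morrslt = unsafe { *result };
        let count = morrslt.cchOutput as usize;
        let data = morrslt.pwchOutput.0;
        let wide: &[u16] = if count == 0 {
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(data, count) }
        };
        let output = String::from_utf16(wide);
        // The result block is allocated with the COM task allocator,
        // so free it with CoTaskMemFree once the text has been copied out
        unsafe {
            windows::Win32::System::Com::CoTaskMemFree(Some(result as *const std::ffi::c_void))
        };
        Ok(output?)
    }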
}

fn to_bstr(value: &str) -> Result<BSTR> {
    let value = value.encode_utf16().collect::<Vec<_>>();
    Ok(BSTR::from_wide(&value)?)
}    
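The output length matters because cchOutput, like other Win32 WCHAR counts, counts UTF-16 code units rather than Unicode characters. The two differ for characters outside the Basic Multilingual Plane: 𠮷 (U+20BB7) is one char but a surrogate pair of two code units. The std-only sketch below uses a plain slice in place of the IME's output buffer (to_wide_nul, decode_counted and decode_by_chars are hypothetical helper names) and compares cutting the buffer by code units with taking that many chars from the decoded text.

```rust
/// Encode `s` as a null-terminated UTF-16 buffer, as j_morph_result prepares its input.
fn to_wide_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(Some(0)).collect()
}

/// Decode exactly `count` UTF-16 code units from `buf`, modelling an output
/// buffer that carries a length but no terminator. Returns None if `count`
/// exceeds the buffer or the cut lands inside a surrogate pair.
fn decode_counted(buf: &[u16], count: usize) -> Option<String> {
    String::from_utf16(buf.get(..count)?).ok()
}

/// Decode the whole buffer, then keep `count` chars (treats the count as chars).
fn decode_by_chars(buf: &[u16], count: usize) -> String {
    String::from_utf16_lossy(buf).chars().take(count).collect()
}
```

With a buffer holding 𠮷田かんじ and a count of 3, cutting by code units yields 𠮷田, while taking three chars yields 𠮷田か, one character too many.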

Here is the result of running this implementation. It works as expected.

> cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.28s
Running `target\debug\kanji.exe`
漢字 => かんじ
かんじ => 漢字
漢字 => カンジ
カンジ => 漢字
カンジ => かんじ

Summary

Source code for copy and paste
Cargo.toml
[package]
name = "kanji"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0"

[dependencies.windows]
version = "0.52"
features = [
    "Win32_System_Com",
    "Win32_UI_Input_Ime"
]

main.rs
use anyhow::Result;
use std::ptr;
use windows::{
    core::{w, BSTR, PCWSTR},
    Win32::{
        System::Com::{
            CLSIDFromProgID, CoCreateInstance, CoInitialize, CoUninitialize, CLSCTX_ALL,
        },
        UI::Input::Ime::{
            IFELanguage, FELANG_CMODE_KATAKANAOUT, FELANG_CMODE_PRECONV, FELANG_REQ_CONV,
            FELANG_REQ_REV,
        },
    },
};

struct Com;

impl Drop for Com {
    fn drop(&mut self) {
        unsafe { CoUninitialize() };
    }
}

impl Com {
    fn new() -> Result<Self> {
        unsafe { CoInitialize(None)? };
        Ok(Com)
    }
}

struct FElanguage(IFELanguage);

impl Drop for FElanguage {
    fn drop(&mut self) {
        unsafe { self.0.Close().ok() };
    }
}

impl FElanguage {
    fn new() -> Result<Self> {
        let clsid = unsafe { CLSIDFromProgID(w!("MSIME.Japan"))? };
        let ife: IFELanguage = unsafe { CoCreateInstance(&clsid, None, CLSCTX_ALL)? };
        unsafe { ife.Open()? };
        Ok(FElanguage(ife))
    }

    fn phonetic(&self, kanji: &str) -> Result<String> {
        let kanji = to_bstr(kanji)?;
        let mut hiragana = BSTR::new();
        unsafe { self.0.GetPhonetic(&kanji, 1, -1, &mut hiragana)? };
        Ok(hiragana.to_string())
    }

    fn conversion(&self, hiragana: &str) -> Result<String> {
        let hiragana = to_bstr(hiragana)?;
        let mut kanji = BSTR::new();
        unsafe { self.0.GetConversion(&hiragana, 1, -1, &mut kanji)? };
        Ok(kanji.to_string())
    }

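    fn j_morph_result(&self, input: &str, request: u32, mode: u32) -> Result<String> {
        // input must be a null-terminated UTF-16 string
        let input = input.encode_utf16().chain(Some(0)).collect::<Vec<_>>();
        let len = input.len();
        let input = PCWSTR::from_raw(input.as_ptr());
        let mut result = ptr::null_mut();
        unsafe {
            self.0
                .GetJMorphResult(request, mode, len as _, input, ptr::null_mut(), &mut result)?;
        };
        // pwchOutput is not null-terminated: read exactly cchOutput UTF-16 code units
        let morrslt = unsafe { *result };
        let count = morrslt.cchOutput as usize;
        let data = morrslt.pwchOutput.0;
        let wide: &[u16] = if count == 0 {
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(data, count) }
        };
        let output = String::from_utf16(wide);
        // The result block is allocated with the COM task allocator; free it after copying
        unsafe {
            windows::Win32::System::Com::CoTaskMemFree(Some(result as *const std::ffi::c_void))
        };
        Ok(output?)
    }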

    fn katakana(&self, input: &str) -> Result<String> {
        self.j_morph_result(input, FELANG_REQ_REV, FELANG_CMODE_KATAKANAOUT)
    }

    fn kanji(&self, input: &str) -> Result<String> {
        self.j_morph_result(input, FELANG_REQ_CONV, FELANG_CMODE_PRECONV)
    }

    fn hiragana(&self, input: &str) -> Result<String> {
        self.j_morph_result(input, FELANG_REQ_REV, FELANG_CMODE_PRECONV)
    }
}

fn to_bstr(value: &str) -> Result<BSTR> {
    let value = value.encode_utf16().collect::<Vec<_>>();
    Ok(BSTR::from_wide(&value)?)
}

fn main() -> Result<()> {
    let _com = Com::new()?;
    let ife = FElanguage::new()?;

    let kanji = "漢字";

    let hiragana = ife.phonetic(kanji)?;
    println!("{kanji} => {hiragana}");

    let kanji = ife.conversion(&hiragana)?;
    println!("{hiragana} => {kanji}");

    let katakana = ife.katakana(&kanji)?;
    println!("{kanji} => {katakana}");

    let kanji = ife.kanji(&katakana)?;
    println!("{katakana} => {kanji}");

    let hiragana = ife.hiragana(&katakana)?;
    println!("{katakana} => {hiragana}");

    Ok(())
}