Siv3D Advent Calendar 2024

Drawing a Word Cloud with OpenSiv3D

Posted at 2024-12-16

Introduction

This article is the entry for day 16 of the Siv3D Advent Calendar 2024.

I recreated a word cloud in OpenSiv3D, and this article walks through the steps.

Environment

Windows 11 Home 23H2
Visual Studio 2022
OpenSiv3D 0.6.15.6507

Installing MeCab

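To build the word cloud from only the nouns in the source text, I use MeCab, a Japanese morphological analysis engine. MeCab has to work alongside OpenSiv3D in the latest Visual Studio. For that reason, I used an actively maintained unofficial implementation instead of the official one.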

First, download the lib and dll files. Prebuilt binaries are published as a Release, so I used those. The latest version at the time of writing was v0.996.10.

Next, download the dictionary and run make. I ran make on WSL.

$ git clone https://github.com/shogo82148/mecab.git
$ cd mecab/mecab-ipadic
$ ./autogen.sh
$ ./configure --with-charset=sjis
$ make all

In my environment, building the dictionary with UTF-8 encoding produced garbled text, so I used Shift_JIS (sjis) instead.
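MeCab's default output prints one token per line. Each line holds the surface form, a tab, and comma-separated features whose first field is the part of speech. The output ends with an `EOS` line.

Below is a minimal sketch of picking nouns out of that text in plain C++, without calling MeCab. The function name `ExtractNounsFromMeCabOutput` is a hypothetical illustration, and the sample lines used to check it assume the IPADIC feature layout.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extracts the surface forms tagged as nouns (名詞) from MeCab's default output.
// Each token line is "surface\tPOS,subPOS1,...", and the output ends with "EOS".
std::vector<std::string> ExtractNounsFromMeCabOutput(const std::string& output)
{
    std::vector<std::string> nouns;
    std::istringstream iss{ output };
    std::string line;
    while (std::getline(iss, line))
    {
        if (line == "EOS")
        {
            break;
        }
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos)
        {
            continue; // not a token line
        }
        const std::string feature = line.substr(tab + 1);
        // The part of speech is the first comma-separated field
        const std::string partOfSpeech = feature.substr(0, feature.find(','));
        if (partOfSpeech == "名詞")
        {
            nouns.push_back(line.substr(0, tab));
        }
    }
    return nouns;
}
```

The comparison is byte-wise. It only matches when the `"名詞"` literal in the source and MeCab's output share the same encoding, for example both UTF-8 or both Shift_JIS. This matters when choosing the dictionary charset as above.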

The Word Cloud Algorithm

I asked ChatGPT, and it explained the approach as follows.

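Required steps

Word frequency analysis
Run morphological analysis on the text with MeCab and count how often each word appears. This way, frequent words are displayed large and infrequent words small.

Initial placement and sizing
Decide each word's font size from its frequency. Typically, the most frequent word is placed in the center, and the others are placed progressively farther out.

Collision avoidance
Implement an algorithm that adjusts each word's position so that words do not overlap. Spiral placement and collision-detection techniques can be used here.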

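Algorithm details

Spiral placement
Place words while tracing a circular spiral outward from the center. To keep each word from overlapping the others, proceed as follows:

Place the largest word in the center.

For each subsequent word, start from the candidate position and move along the spiral until the word no longer collides with anything.

Collision detection and avoidance
Implement a function that, after a word is positioned, checks whether it overlaps any other word. If there is a collision, advance a little along the spiral and try placing the word again.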

It also provided source code.

#include <cmath>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Holds the information for one word
struct Word {
    std::string text;
    int frequency;
    int fontSize;
    int x, y; // placement position
};

// Computes a position on the spiral
std::pair<int, int> getNextPosition(int radius, double angle) {
    int x = static_cast<int>(radius * std::cos(angle));
    int y = static_cast<int>(radius * std::sin(angle));
    return {x, y};
}

// Collision detection (checks whether the word overlaps any placed word)
bool checkCollision(const Word &word, const std::vector<Word> &placedWords) {
    for (const auto &placedWord : placedWords) {
        // Check whether the two words are too close to each other
        int dx = word.x - placedWord.x;
        int dy = word.y - placedWord.y;
        int distanceSquared = dx * dx + dy * dy;
        if (distanceSquared < (word.fontSize + placedWord.fontSize) * (word.fontSize + placedWord.fontSize)) {
            return true; // collision
        }
    }
    return false;
}

void placeWords(std::vector<Word> &words) {
    std::vector<Word> placedWords;
    int radius = 0;
    double angle = 0.0;

    for (auto &word : words) {
        bool placed = false;
        while (!placed) {
            auto [x, y] = getNextPosition(radius, angle);
            word.x = x;
            word.y = y;

            if (!checkCollision(word, placedWords)) {
                placedWords.push_back(word);
                placed = true;
            } else {
                // Move to the next spiral position
                radius += 5;
                angle += 0.5;
            }
        }
    }
}

int main() {
    std::vector<Word> words = {
        {"example", 10, 20, 0, 0},
        {"test", 8, 18, 0, 0},
        // other words
    };

    placeWords(words);

    for (const auto &word : words) {
        std::cout << "Word: " << word.text << " Position: (" << word.x << ", " << word.y << ")" << std::endl;
    }

    return 0;
}

This shows that words should be placed in descending order of frequency, tracing a circular spiral outward from the center.
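In ChatGPT's code, each failed attempt adds 5 to `radius` and 0.5 to `angle`, and neither value is reset between words. After n steps, both have therefore grown linearly. Writing this out shows that the candidate points lie on an Archimedean spiral, ignoring the truncation to `int` in `getNextPosition`:

```latex
\begin{aligned}
r_n &= 5n, \quad \theta_n = 0.5n \;\Longrightarrow\; r_n = 10\,\theta_n \\
(x_n, y_n) &= (r_n \cos\theta_n,\; r_n \sin\theta_n) \\
r(\theta + 2\pi) - r(\theta) &= 10 \cdot 2\pi \approx 62.8
\end{aligned}
```

The last line is the radial gap between successive turns of the spiral. Main.cpp later advances by `radius += 10` and `angle += 1`, which keeps the same ratio of 10. However, it resets `radius` to 0 for each word while `angle` keeps accumulating. There, the relation becomes r = 10(θ - θ₀), where θ₀ is the angle at which that word's search started.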

ChatGPT's answer alone is not entirely trustworthy. So I also checked d3-cloud, a JavaScript implementation, and an article explaining it.

The layout algorithm itself is incredibly simple. For each word, starting with the most “important”:

Attempt to place the word at some starting point: usually near the middle, or somewhere on a central horizontal line.

If the word intersects with any previously-placed words, move it one step along an increasing spiral. Repeat until no intersections are found.

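This shows that d3-cloud places words with the same algorithm, which confirms that the algorithm and source code ChatGPT came up with are broadly correct.
Next, I rewrite the code so that OpenSiv3D can draw the result.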
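ChatGPT's `checkCollision` treats each word as a circle whose radius is its font size. That is a rough approximation for wide words.

The sketch below keeps the same spiral search but swaps in a different collision test: axis-aligned bounding boxes. It also restarts the spiral at the center for every word. The `Box` type, the function names, and the `step`/`spacing` values are illustrative assumptions, not taken from ChatGPT's answer or from d3-cloud.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Axis-aligned bounding box of a word: top-left corner (x, y) and size (w, h)
struct Box
{
    double x, y, w, h;
};

// True if the two boxes overlap (touching edges do not count)
bool Intersects(const Box& a, const Box& b)
{
    return (a.x < b.x + b.w) && (b.x < a.x + a.w)
        && (a.y < b.y + b.h) && (b.y < a.y + a.h);
}

// Places boxes in the given order (largest word first). Each box starts centered on the
// origin and walks outward along an Archimedean spiral r = spacing * t until it overlaps
// no previously placed box. Boxes that find no free spot within maxSteps are dropped.
std::vector<Box> PlaceOnSpiral(const std::vector<Box>& boxes, double step = 0.1,
                               double spacing = 2.0, int maxSteps = 100000)
{
    assert(step > 0.0 && spacing > 0.0);
    std::vector<Box> placed;
    for (Box box : boxes)
    {
        for (int i = 0; i < maxSteps; ++i)
        {
            const double t = i * step;    // angle in radians
            const double r = spacing * t; // radius grows linearly with the angle
            box.x = r * std::cos(t) - box.w / 2;
            box.y = r * std::sin(t) - box.h / 2;

            bool hit = false;
            for (const Box& p : placed)
            {
                if (Intersects(box, p))
                {
                    hit = true;
                    break;
                }
            }
            if (!hit)
            {
                placed.push_back(box);
                break;
            }
        }
    }
    return placed;
}
```

Rectangles are much cheaper to test than the per-glyph polygons used in Main.cpp. A common compromise is to test boxes first and run the exact polygon test only when the boxes overlap.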

Implementing the Word Cloud

The implementation is in the Main.cpp below.

Main.cpp

# define NO_S3D_USING
# include <Siv3D.hpp> // Siv3D v0.6.15
# include <mecab.h>

// Holds the information for one word
struct Word
{
	s3d::String text;
	s3d::int32 fontSize;
	s3d::int32 x, y; // placement position
	s3d::Array<s3d::Polygon> polygonText;

	Word(s3d::String text = s3d::String{}, s3d::int32 fontSize = 0, s3d::int32 x = 0, s3d::int32 y = 0, s3d::Array<s3d::Polygon> polygonText = {}) :
	text(text), fontSize(fontSize), x(x), y(y), polygonText(polygonText) {};
};

// Runs morphological analysis and extracts only the nouns
s3d::Array<s3d::String> extractNouns(s3d::String text)
{
	const char* dic_dir = "-r C:\\path\\to\\mecab-msvc-x64-0.996.10\\dic\\ipadic-utf8\\dicrc -d C:\\path\\to\\mecab-msvc-x64-0.996.10\\dic\\ipadic-utf8";
	s3d::Array<s3d::String> nouns = {};

	// Initialize the MeCab tagger (only the C++ API is used, so there is a single instance to clean up)
	MeCab::Tagger* tagger = MeCab::createTagger(dic_dir);
	if (!tagger)
	{
		s3d::Console << U"Exception:" << s3d::Unicode::Widen(MeCab::getTaggerError());
		return nouns;
	}

	// Analyze. parse() returns nullptr on failure, so check it before building a std::string
	const std::string str = text.narrow();
	const char* parsed = tagger->parse(str.c_str());
	if (!parsed)
	{
		s3d::Console << U"Exception:" << s3d::Unicode::Widen(tagger->what());
		delete tagger;
		return nouns;
	}
	const std::string result{ parsed };

	// Process the output line by line, keeping only the nouns
	std::istringstream iss(result);
	std::string line;
	while (std::getline(iss, line))
	{
		if (line == "EOS")
		{
			break;
		}
		std::string surface, feature;
		size_t tab_pos = line.find('\t');
		if (tab_pos != std::string::npos)
		{
			surface = line.substr(0, tab_pos);
			feature = line.substr(tab_pos + 1);
			if (feature.starts_with("名詞"))
			{
				nouns.push_back(s3d::Unicode::Widen(surface));
			}
		}
	}

	delete tagger;
	return nouns;
}

// Counts how often each word occurs and decides its font size (most frequent first)
s3d::Array<Word> makeWords(s3d::Array<s3d::String> nouns) {
	const s3d::int32 minimumFontSize = 16; // minimum font size
	s3d::Array<Word> words;
	std::map<s3d::String, s3d::int32> mp;
	for (const auto& n : nouns)
	{
		mp[n]++;
	}
	s3d::Array<std::pair<s3d::int32, s3d::String>> array;
	for (const auto& p : mp)
	{
		array.push_back({ p.second, p.first });
	}
	array.rsort(); // descending by occurrence count
	for (const auto& a : array)
	{
		// Logarithmic scaling: a word seen once gets minimumFontSize, and the size grows with ln(count)
		Word tmpWord = { a.second, static_cast<s3d::int32>(s3d::Log(a.first / 1.0) * minimumFontSize + minimumFontSize), 0, 0, {} };
		words.push_back(tmpWord);
	}
	return words;
}

// Returns the point on the spiral for the given radius and angle
s3d::Vec2 getNextPosition(s3d::int32 radius, double angle)
{
	double x = radius * s3d::Cos(angle);
	double y = radius * s3d::Sin(angle);
	return { x, y };
	// return RandomVec2(20);
}

// Collision detection (checks whether the word overlaps any placed word)
bool checkCollision(const Word& word, const s3d::Array<Word>& placedWords)
{
	for (const auto& placedWord : placedWords)
	{
		for (auto placedPolygon : placedWord.polygonText)
		{
			for (auto polygon : word.polygonText)
			{
				if (placedPolygon.intersects(polygon))
				{
					return true;
				}
			}
		}
	}
	return false;
}

// Returns the Polygon of each glyph when the string is rendered
s3d::Array<s3d::Polygon> ToPolygons(const s3d::Vec2& basePos, const s3d::Array<s3d::PolygonGlyph>& glyphs)
{
	s3d::Array<s3d::Polygon> polygons;

	s3d::Vec2 penPos{ basePos };

	for (const auto& glyph : glyphs)
	{
		for (const auto& polygon : glyph.polygons)
		{
			polygons << polygon.movedBy(penPos + glyph.getOffset());
		}

		penPos.x += glyph.xAdvance;
	}

	return polygons;
}

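// Decides where to place a word
Word PlaceWord(double& angle, Word currentWord, s3d::Array<Word> placedWords)
{
	// Keep the polygons at their original position. Moving them cumulatively on every
	// attempt would drift the word away from the spiral point being tested.
	const s3d::Array<s3d::Polygon> basePolygons = currentWord.polygonText;
	s3d::int32 radius = 0;
	while (true)
	{
		const s3d::Vec2 pos = getNextPosition(radius, angle);
		currentWord.x = static_cast<s3d::int32>(pos.x);
		currentWord.y = static_cast<s3d::int32>(pos.y);
		for (size_t i = 0; i < basePolygons.size(); ++i)
		{
			currentWord.polygonText[i] = basePolygons[i].movedBy(pos);
		}

		if (!checkCollision(currentWord, placedWords))
		{
			return currentWord;
		}

		// Advance to the next point on the spiral
		radius += 10;
		angle += 1;

		// Give up once the spiral gets too large (the word may then overlap others)
		if (radius >= 1000)
		{
			return currentWord;
		}
	}
}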

void Main()
{
	s3d::TextAreaEditState textAreaState
	{
		U"Siv3D（シブスリーディー）は、音や画像、AI を使ったゲームやアプリを、"
		U"モダンな C++ コードで楽しく簡単にプログラミングできるオープンソースのフレームワークです。"
		U"Siv3D のここが最高\n"
		U"非常に短いコード\n"
		U"1.2 Siv3D のここが最高\n"
		U"非常に短いコード\n"
		U"Siv3D のコードは最短で 2 行です。描画やインタラクションを実現するための便利な関数とクラスが揃っているため、アプリケーションのほとんどは 1 つの.cpp ファイルだけで完成します。あなたのアイデアを、GitHub Gist などのコード共有サイトを使って手軽に保存・シェアして、世界中の Siv3D ユーザと技術を交換し学び合いましょう。\n"
		U"最新の C++ を学べる\n"
		U"Siv3D のサンプルとライブラリ API は、最新の C++20 スタイルで書かれているため、Siv3D を使っているだけで、モダンな C++ の書き方やテクニックが身に付きます。Siv3D の作者は、日本最大のゲーム開発カンファレンス CEDEC で 最新 C++ の活用に関する講演 をしたり、C++ の情報ポータル を作成したりするなど、最先端の C++ の普及に努めています。\n"
		U"一貫した API\n"
		U"Siv3D は 2, 200 ファイルのソースコードと 90 のサードパーティ・ソフトウェアによって実現される、可視化やインタラクションのための様々な機能を、一貫した API で提供します。Siv3D のルールを覚えるだけで、ありとあらゆる機能が思いのままです。\n"
		U"オープンソース\n"
		U"Siv3D は MIT ライセンスのもと GitHub 上で開発 されているため、いつでも内部のコードを調べたり改造したりできます。サードパーティ・ライブラリを含め、商用利用を妨げる条件は無く、開発したゲームやアプリケーションの収益は 100 % 開発者が獲得できます。\n"
		U"軽量・迅速\n"
		U"Windows 版の OpenSiv3D SDK のインストーラのサイズは 120 MB 未満です。インストールはわずかなクリックで自動的に終わり、Visual Studio を起動すると、メニューには Siv3D プロジェクトを作成するアイテムが追加されていて、すぐにプログラミングを始められます。\n"
		U"親切なユーザコミュニティ\n"
		U"Siv3D を使って困ったことがあったら、Siv3D オンラインユーザコミュニティ で質問しましょう。匿名で質問したい場合は BBS も利用できます。毎月オンラインで開催される Siv3D 実装会では、Siv3D の熱心なユーザ達や Siv3D の作者と、Discord 上で雑談や技術的な相談をすることができます。Twitter では定期的に #Siv3D, #OpenSiv3D ハッシュタグを巡回しています。ユーザコミュニティが、作品の宣伝やシェアに協力してくれるでしょう。オープンソースソフトウェア開発に貢献したい学生には、Siv3D を練習場にしたサポートプログラムを毎年実施しています。\n"
		U"Web ブラウザで動く\n"
		U"現在試験的に提供される Web 版（OpenSiv3D for Web）を使うと、Siv3D で作った C++ アプリケーションを、Web ブラウザ上で実行可能なプログラムに変換できます。世界中の人がスマホやタブレットからあたなの作品を体験できます。"
	};

	// Build the initial word list
	s3d::Array<s3d::String> nouns = extractNouns(textAreaState.text);
	s3d::Array<Word> undeterminedWords = makeWords(nouns);
	for (auto& word : undeterminedWords)
	{
		const s3d::Font font{ word.fontSize, s3d::Typeface::Bold };
		word.polygonText = ToPolygons(s3d::Vec2{ word.x, word.y }, font.renderPolygons(word.text));
	}
	double angle = 0.0;           // spiral angle
	s3d::Array<Word> placedWords; // words already placed

	// 2D camera
	// Initial settings: center (0, 0), zoom factor 1.0
	s3d::Camera2D camera{ s3d::Vec2{ 0, 0 }, 1.0 };

	// Word placement runs in an asynchronous task
	s3d::AsyncTask<Word> aTask;

	while (s3d::System::Update())
	{
		// Collect the result once the running task has finished
		if (aTask.isValid() && aTask.isReady())
		{
			placedWords.push_back(aTask.get());
		}

		// Start placing the next word only when no task is running,
		// so that no word is taken from the queue and then dropped
		if ((not aTask.isValid()) && (not undeterminedWords.isEmpty()))
		{
			const Word currentWord = undeterminedWords.front();
			undeterminedWords.pop_front();
			aTask = s3d::Async(PlaceWord, std::ref(angle), currentWord, placedWords);
		}

		// Draw
		camera.update();
		{
			// Create a Transformer2D from the 2D camera settings
			const auto t = camera.createTransformer();
			for (const auto& word : placedWords)
			{
				for (const auto& polygon : word.polygonText)
				{
					polygon.draw();
				}
			}
		}

		if (s3d::SimpleGUI::Button(U"Move to center", s3d::Vec2{ 30, 165 }))
		{
			// Set target values for the center and zoom factor; the camera moves there over time
			camera.setTargetCenter(s3d::Vec2{ 0, 0 });
			camera.setTargetScale(1.0);
		}

		// Draw the 2D camera control UI
		camera.draw(s3d::Palette::Orange);

		// Text input
		s3d::SimpleGUI::TextArea(textAreaState, s3d::Vec2{ 30, 20 }, s3d::SizeF{ 740, 100 });

		// Regenerate button
		if (s3d::SimpleGUI::Button(U"Generate", s3d::Vec2{ 30, 125 }))
		{
			// Wait for and discard any in-flight task so its stale result is not added to the new cloud
			if (aTask.isValid())
			{
				aTask.get();
			}
			placedWords = {};
			nouns = extractNouns(textAreaState.text);

			undeterminedWords = makeWords(nouns);
			for (auto& word : undeterminedWords)
			{
				const s3d::Font font{ word.fontSize, s3d::Typeface::Bold };
				word.polygonText = ToPolygons(s3d::Vec2{ word.x, word.y }, font.renderPolygons(word.text));
			}
		}

	}
}

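//
// - Debug build: less optimization, but you get detailed information on errors and crashes.
//
// - Release build: builds with maximum optimization.
//
// - If you run the program via [Debug] menu → [Start Debugging], detailed logs appear in the [Output] window to help you find the cause of errors.
//
// - Right after updating Visual Studio, you may need to rebuild the program ([Build] menu → [Rebuild Solution]).
//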
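`makeWords` above counts each noun with `std::map` and sizes it as ln(count) × 16 + 16, so a noun seen once gets the minimum size of 16. Below is a Siv3D-free sketch of the same logarithmic mapping that can be checked in isolation.

The `SizedWord` struct and the `MakeSizedWords` name are hypothetical. The alphabetical tie-breaking is an assumption: `Array::rsort` in `makeWords` orders ties by the word in descending order instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

struct SizedWord
{
    std::string text;
    int count;
    int fontSize;
};

// Counts occurrences, then maps count -> font size as ln(count) * minimumFontSize + minimumFontSize,
// so a word that appears once gets exactly minimumFontSize.
std::vector<SizedWord> MakeSizedWords(const std::vector<std::string>& words, int minimumFontSize = 16)
{
    assert(minimumFontSize > 0);
    std::map<std::string, int> counts;
    for (const auto& w : words)
    {
        ++counts[w];
    }

    std::vector<SizedWord> result;
    for (const auto& [text, count] : counts)
    {
        const double size = std::log(static_cast<double>(count)) * minimumFontSize + minimumFontSize;
        result.push_back({ text, count, static_cast<int>(size) }); // truncate like the int32 cast
    }

    // Most frequent first; ties broken alphabetically so the order is deterministic
    std::sort(result.begin(), result.end(), [](const SizedWord& a, const SizedWord& b) {
        return (a.count != b.count) ? (a.count > b.count) : (a.text < b.text);
    });
    return result;
}
```

The logarithm keeps one very frequent word (such as "Siv3D" in the sample text) from dwarfing everything else. A linear mapping would make its size grow in direct proportion to its count.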

Build notes

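- Add the folder containing the downloaded MeCab header (mecab.h) under Project Properties > C/C++ > General > Additional Include Directories.
- Add the paths to mecab.lib and libmecab.lib under Project Properties > Linker > Input > Additional Dependencies.
- Copy libmecab.dll into the App directory (the directory where the exe file is generated).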

Key points

Using an asynchronous task

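Placing words is time-consuming. Doing it inside the main loop would stall the loop until placement finished and lower the frame rate. To advance word placement asynchronously with respect to the main loop, I used an asynchronous task (s3d::AsyncTask).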
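The main loop polls the task once per frame and never blocks. The pattern can be sketched with the standard library alone:

- `std::future` stands in for `s3d::AsyncTask`.
- A hypothetical `ExpensivePlacement` function stands in for `PlaceWord`.

At most one task is in flight, and the next word is dequeued only after the previous result has been collected, so no word is skipped.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the slow PlaceWord call
int ExpensivePlacement(int word)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // simulate work
    return word * word;
}

// Keeps at most one placement task in flight and never blocks the caller
struct PlacementQueue
{
    std::deque<int> pending; // words waiting to be placed
    std::vector<int> placed; // results, in the original order
    std::future<int> task;   // the task currently running, if any

    // Call once per frame
    void update()
    {
        // Collect the result only if the task has finished (wait_for(0) does not block)
        if (task.valid() && task.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
        {
            placed.push_back(task.get()); // get() leaves the future invalid
        }

        // Dequeue the next word only when nothing is running, so no word is lost
        if (!task.valid() && !pending.empty())
        {
            const int word = pending.front();
            pending.pop_front();
            task = std::async(std::launch::async, ExpensivePlacement, word);
        }
    }

    bool done() const { return pending.empty() && !task.valid(); }
};
```

Because each task receives a copy of the already-placed words, the main thread can keep drawing them while the task runs without any locking.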

Using a 2D camera

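The word cloud implemented here has no algorithm for keeping words within a specific area. I adopted a 2D camera (s3d::Camera2D) so that words placed far from the center can still be viewed.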
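Because the cloud can grow beyond the window, one option is to compute a camera target that frames every placed word. This is a minimal sketch with a few assumptions:

- Each word's bounding box is known.
- The camera maps world to screen as (world - center) × scale + screen center.
- The `Rect`, `CameraTarget`, and `FitAll` names are hypothetical.

The resulting center and scale could be passed to `camera.setTargetCenter` and `camera.setTargetScale`, as the "Move to center" button does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Bounding box of a placed word in world coordinates: top-left (x, y), size (w, h)
struct Rect
{
    double x, y, w, h;
};

// World point to show at the screen center, and the zoom factor
struct CameraTarget
{
    double centerX, centerY, scale;
};

// Returns a center and scale that fit all boxes into a viewW x viewH view,
// leaving `margin` pixels free on every side.
CameraTarget FitAll(const std::vector<Rect>& boxes, double viewW, double viewH, double margin = 20.0)
{
    assert(!boxes.empty());
    double left = boxes.front().x;
    double top = boxes.front().y;
    double right = left + boxes.front().w;
    double bottom = top + boxes.front().h;
    for (const Rect& r : boxes)
    {
        left = std::min(left, r.x);
        top = std::min(top, r.y);
        right = std::max(right, r.x + r.w);
        bottom = std::max(bottom, r.y + r.h);
    }
    const double width = right - left;
    const double height = bottom - top;
    assert(width > 0.0 && height > 0.0);
    // The tighter axis decides the zoom so that nothing is cut off
    const double scale = std::min((viewW - 2 * margin) / width, (viewH - 2 * margin) / height);
    return { (left + right) / 2, (top + bottom) / 2, scale };
}
```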

Results

The generated word cloud is shown below. Because it uses a 2D camera, you can pan up, down, left, and right, and zoom in and out.

Remaining work

Place words within a specific area
Optimize word sizes
Add color
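For the first item, one possible direction (an assumption, not something implemented in this article) combines two rules:

- Reject spiral candidates whose bounding box leaves the target area.
- Stop once the spiral radius exceeds the distance from the area's center to a corner, since every later candidate center then lies outside the area.

The names below are hypothetical. Overlap checks against other words are delegated to a caller-supplied `isFree` predicate.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <optional>

// Top-left (x, y) and size (w, h)
struct Box
{
    double x, y, w, h;
};

// True if `inner` lies completely inside `area`
bool Contains(const Box& area, const Box& inner)
{
    return inner.x >= area.x && inner.y >= area.y
        && inner.x + inner.w <= area.x + area.w
        && inner.y + inner.h <= area.y + area.h;
}

// Searches an Archimedean spiral centered on `area` for a spot where a w x h box lies inside
// the area and `isFree` (e.g. "overlaps no placed word") accepts it. Returns std::nullopt once
// the radius exceeds the half-diagonal, because every later candidate center is outside the area.
std::optional<Box> FindSpotInArea(const Box& area, double w, double h,
                                  const std::function<bool(const Box&)>& isFree,
                                  double step = 0.1, double spacing = 2.0)
{
    assert(step > 0.0 && spacing > 0.0);
    const double cx = area.x + area.w / 2;
    const double cy = area.y + area.h / 2;
    const double maxRadius = std::hypot(area.w / 2, area.h / 2);
    for (int i = 0;; ++i)
    {
        const double t = i * step;
        const double r = spacing * t;
        if (r > maxRadius)
        {
            return std::nullopt;
        }
        const Box candidate{ cx + r * std::cos(t) - w / 2, cy + r * std::sin(t) - h / 2, w, h };
        if (Contains(area, candidate) && isFree(candidate))
        {
            return candidate;
        }
    }
}
```

Words that return `std::nullopt` could then be shrunk and retried. This ties in with the second item, optimizing word sizes.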

References
