NTT COMWARE Advent Calendar 2024

NTT COMWARE Corporation

Asking Gemini about photos taken in a camera app with Vertex AI in Firebase

Last updated at 2024-12-20 · Posted at 2024-12-20

This article is the day 21 entry of the NTT COMWARE Advent Calendar 2024.

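Introduction

I'm Motojima from NTT COMWARE. My specialty is mobile development (Android/iOS).

In this article, I introduce the features of Vertex AI in Firebase, using a sample camera app as the running example.

Below is a screenshot of the sample camera app used in this article (taken on the Android Studio emulator).

Source: Google, "Android Studio", Pixel 8 emulator screen at camera launch (2024/12/13)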

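Environment and the sample camera app

Here is my environment and some information about the sample camera app.

Environment

OS: macOS
Chip: Apple M2 Pro
IDE: Android Studio Ladybug 2024.2.1 Patch 2

Sample camera app

Language: Kotlin
- Version: 2.0.21
UI framework: Jetpack Compose
Gradle version: 8.9
Libraries used: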
- CameraX
  - androidx.camera:camera-core:1.4.1
  - androidx.camera:camera2:1.4.1
  - androidx.camera:camera-lifecycle:1.4.1
  - androidx.camera:camera-view:1.4.1
- Accompanist
  - com.google.accompanist:accompanist-permissions:0.34.0
- Koin
  - io.insert-koin:koin-core:3.4.0
  - io.insert-koin:koin-android:3.4.0
  - io.insert-koin:koin-androidx-compose:3.4.0

Structure of this article

1. What is Vertex AI in Firebase?
2. A look inside the Firebase Vertex AI SDK
3. The sample camera app
4. Asking Gemini about images
5. Source code listing
6. Summary

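What is Vertex AI in Firebase?

Here is the official description of Vertex AI in Firebase:

The Vertex AI Gemini API gives you access to Google's latest generative AI models, the Gemini models. If you need to call the Vertex AI Gemini API directly from your mobile or web app rather than from the server side, you can use the Vertex AI in Firebase SDKs. These client SDKs are built specifically for use with mobile and web apps. They offer security options against unauthorized clients and integrations with other Firebase services.

With these client SDKs, you can add AI personalization to your app, build AI chat experiences, and create AI-powered optimizations and automation.

Source: Firebase, "Gemini API using Vertex AI in Firebase", https://firebase.google.com/docs/vertex-ai (2024/12/13)

Until now, we accessed Vertex AI from the server side and used the server's results in the mobile app.

With Vertex AI in Firebase, Firebase provides a mobile SDK, the Vertex AI in Firebase SDK. This lets a mobile app call the Gemini API directly.

Because the app can call the Gemini API without any server-side implementation, AI-powered mobile apps can now be built much faster.

Vertex AI in Firebase has progressed as follows, so it is a relatively new technology:

Preview announced in May 2024
General availability in October 2024

Adding the Vertex AI in Firebase SDK to a mobile app

Adding the Vertex AI in Firebase SDK to a mobile app is very simple:

1. Register the app with Firebase.
2. Place the Firebase configuration file (google-services.json) in the designated directory.
3. Add the following to the plugins block of the project-level build.gradle.kts.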

build.gradle.kts

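// Reads google-services.json and integrates the Firebase project into the app
// (the app module's build.gradle.kts then applies it with id("com.google.gms.google-services"))
id("com.google.gms.google-services") version "4.4.2" apply false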

4. Add the following to the dependencies block of build.gradle.kts.

build.gradle.kts

// Added to manage Firebase library versions centrally (Firebase BoM)
implementation(platform("com.google.firebase:firebase-bom:33.7.0"))

// Add the Firebase Analytics library to the project
implementation("com.google.firebase:firebase-analytics")

// Add the Firebase Vertex AI library to the project
implementation("com.google.firebase:firebase-vertexai")

Once the Vertex AI in Firebase SDK is declared in the build files and the app builds successfully, the app is ready to use the Gemini API.
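Step 3 declares the plugin with `apply false`, so it must still be applied in the app module. The sketch below shows the usual two-file arrangement. It assumes the standard Android Studio project layout, with a project-level and an app-level build.gradle.kts, and it omits the app module's other plugins.

```kotlin
// --- Project-level build.gradle.kts ---
plugins {
    // Makes the Google services plugin available without applying it here
    id("com.google.gms.google-services") version "4.4.2" apply false
}

// --- App-level build.gradle.kts (e.g. app/build.gradle.kts) ---
plugins {
    // ...your existing plugins (Android application, Kotlin, Compose) stay here
    id("com.google.gms.google-services") // processes google-services.json
}

dependencies {
    implementation(platform("com.google.firebase:firebase-bom:33.7.0"))
    implementation("com.google.firebase:firebase-vertexai") // version comes from the BoM
}
```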

For the official setup steps and for other platforms (iOS, Web, Flutter), see the official documentation below.

To use Vertex AI in Firebase, you must upgrade the project to the pay-as-you-go Blaze plan.

A look inside the Firebase Vertex AI SDK

Adding the Firebase Vertex AI SDK to the mobile app was easy. But what actually happens inside the SDK when the app calls the Gemini API?

Let's peek inside the Firebase Vertex AI SDK.

Here, I trace what happens when GenerativeModel's generateContentStream method is called and lay out the SDK's internal processing as a sequence.

GenerativeModel.kt

public fun generateContentStream(vararg prompt: Content): Flow<GenerateContentResponse>

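The sequence diagram below shows the processing flow. I drew it based on the source code of the Firebase Vertex AI SDK.

When the app calls GenerativeModel's generateContentStream, the following objects turn out to be involved:

GenerativeModel
Content
APIController
HttpClient

Let's look at each of these objects in detail.

The sequence diagram above summarizes only the core processing needed to understand how generateContentStream behaves.
For details, refer to the official documentation or the SDK source.
https://developer.android.google.cn/ai/vertex-ai-firebase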

GenerativeModel

This object provides the Gemini API.
Its main features are:

Generating AI content from text and image data
Counting tokens (the basic units used when processing text data)

These are the values you set when creating a GenerativeModel:

FirebaseVertexAI.kt

@JvmOverloads
  public fun generativeModel(
    modelName: String,
    generationConfig: GenerationConfig? = null,
    safetySettings: List<SafetySetting>? = null,
    tools: List<Tool>? = null,
    toolConfig: ToolConfig? = null,
    systemInstruction: Content? = null,
    requestOptions: RequestOptions = RequestOptions(),
  ): GenerativeModel

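Parameter	Description
modelName	Name of the generative AI model to use, e.g. gemini-1.5-flash, gemini-1.5-pro, or gemini-1.0-pro
generationConfig	Settings that customize how the model generates content. GenerationConfig includes values such as temperature (a key parameter that controls the randomness of token selection) and maxOutputTokens (the maximum number of output tokens)
safetySettings	List of safety settings the model must respect to block harmful content
tools	List of function declarations (tools) the model can access
toolConfig	Configuration for the function declarations
systemInstruction	System instructions, given as Content, that steer the model's behavior (for example its role or response style)
requestOptions	Additional options applied to requests to the backend

Only modelName is required. All other parameters are optional.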

Implementation example:

Example.kt

val generativeModel = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash", // use gemini-1.5-flash as the generative model
    generationConfig = generationConfig {
        // Set temperature to 0.7f
        temperature = 0.7f
    }
)

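Content

Content is what the app and the generative AI model send to and receive from each other.

The app sends text and image data to the model as content. The model generates new content from that input and returns it to the app.

Below is Content's constructor. Its role parameter (who produced the content) defaults to "user".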

Content.kt

public class Content
@JvmOverloads
constructor(public val role: String? = "user", public val parts: List<Part>)

Implementation example:

Example.kt

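import android.graphics.Bitmap
import com.google.firebase.vertexai.type.Content
import com.google.firebase.vertexai.type.content

fun createContent(inputText: String, photoBitmap: Bitmap): Content {
    // Pass the text and the image data to Content (parts keep this order)
    return content {
        text(inputText)
        image(photoBitmap)
    }
}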
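The content { } helper used above follows the Kotlin type-safe builder pattern. The lambda runs with a builder as its receiver, and each text(...) or image(...) call appends a part in order. The self-contained sketch below imitates that pattern with hypothetical MiniContent/MiniPart types; these are not SDK classes. It uses a ByteArray in place of a Bitmap so it runs without Android.

```kotlin
// Hypothetical stand-ins for the SDK's Part types (not the real SDK classes).
sealed interface MiniPart
data class MiniTextPart(val text: String) : MiniPart
class MiniImagePart(val bytes: ByteArray) : MiniPart

// Mirrors the shape of Content: a role (defaulting to "user") plus ordered parts.
data class MiniContent(val role: String? = "user", val parts: List<MiniPart>)

class MiniContentBuilder {
    private val parts = mutableListOf<MiniPart>()

    // Each call appends one part, so call order becomes part order.
    fun text(text: String) { parts += MiniTextPart(text) }
    fun image(bytes: ByteArray) { parts += MiniImagePart(bytes) }

    fun build(role: String?): MiniContent = MiniContent(role, parts.toList())
}

// Builder entry point: the lambda runs with MiniContentBuilder as its receiver.
fun miniContent(role: String? = "user", init: MiniContentBuilder.() -> Unit): MiniContent =
    MiniContentBuilder().apply(init).build(role)
```

Because parts keep their call order, writing text(...) before image(...) produces a different part list than the reverse order.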

APIController

This backend class manages communication with Firebase.

APIController handles HTTP requests to Firebase and the streaming of responses.

APIController.kt

internal class APIController
internal constructor(
  private val key: String,
  model: String,
  private val requestOptions: RequestOptions,
  httpEngine: HttpClientEngine,
  private val apiClient: String,
  private val headerProvider: HeaderProvider?,
)

Parameter	Description
key	API key used to authenticate access to Firebase; set to the api_key value in the Firebase configuration file (google-services.json)
model	Name of the generative AI model to use; receives the modelName set in generativeModel
requestOptions	Additional options applied to requests to the backend; receives the requestOptions set in generativeModel
httpEngine	HTTP client engine used for requests; the default engine is OkHttp (io.ktor.client.engine.okhttp.OkHttp)
apiClient	Value passed in the x-goog-api-client header
headerProvider	Provider that generates headers added to every HTTP request

APIController is used only inside the Vertex AI in Firebase SDK.
The mobile app therefore never interacts with it directly.

APIController holds an HttpClient object from the io.ktor.client package and makes HTTP requests through HttpClient's post and postStream methods.

HttpClient

As mentioned above, this is the HttpClient object from the io.ktor.client package.

HttpClient sends requests to Firebase (the endpoint domain is firebasevertexai.googleapis.com).

Requests are represented as GenerateContentRequest objects, and responses as GenerateContentResponse objects.

Request.kt

@Serializable
internal data class GenerateContentRequest(
  val model: String? = null,
  val contents: List<Content>,
  @SerialName("safety_settings") val safetySettings: List<SafetySetting>? = null,
  @SerialName("generation_config") val generationConfig: GenerationConfig? = null,
  val tools: List<Tool>? = null,
  @SerialName("tool_config") var toolConfig: ToolConfig? = null,
  @SerialName("system_instruction") val systemInstruction: Content? = null,
) : Request

GenerateContentResponse.kt

public class GenerateContentResponse(
  public val candidates: List<Candidate>,
  public val promptFeedback: PromptFeedback?,
  public val usageMetadata: UsageMetadata?,
) {
  public val text: String? by lazy {
    candidates.first().content.parts.filterIsInstance<TextPart>().joinToString(" ") { it.text }
  }
  public val functionCalls: List<FunctionCallPart> by lazy {
    candidates.first().content.parts.filterIsInstance<FunctionCallPart>()
  }
}
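The lazy text property above rewards a close reading. It looks only at the first candidate, keeps only TextPart parts, and joins them with a space. The sketch below reproduces that logic with hypothetical, simplified types so it can run on its own. One deliberate difference is noted in the comments.

```kotlin
// Hypothetical, simplified stand-ins for the SDK's Part types.
sealed interface SketchPart
data class SketchTextPart(val text: String) : SketchPart
data class SketchFunctionCallPart(val name: String) : SketchPart

// Each candidate is modeled as just its list of parts.
// As in the quoted `text` property, only the first candidate is read, non-text
// parts are dropped, and text parts are joined with a single space.
// Unlike the quoted code (which calls first() and would throw
// NoSuchElementException), this returns null when there are no candidates.
fun responseText(candidates: List<List<SketchPart>>): String? =
    candidates.firstOrNull()
        ?.filterIsInstance<SketchTextPart>()
        ?.joinToString(" ") { it.text }
```

A candidate that contains only function calls yields an empty string rather than null. Function calls are exposed separately through functionCalls.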

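Back to the sequence diagram

We have now seen what GenerativeModel, Content, APIController, and HttpClient each do.

Look at the sequence diagram again. The Vertex AI in Firebase SDK exposes only the Gemini API to the mobile app. The calls to Vertex AI in Firebase and all related processing happen inside the SDK.

Using the Gemini API this way means the mobile app no longer needs its own "data layer" for the AI calls. This cuts a lot of code and lets you ship AI features quickly.

The data layer is the layer that holds network requests, database operations, and related logic. For details, see the official site below.
https://developer.android.com/topic/architecture/data-layer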

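The sample camera app

This section describes the directory structure and screen implementation of the sample camera app I built to try out the Vertex AI in Firebase SDK.

Directory structure of the sample camera app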

└vertexaisample
   ├─app
   │  └─VertexAISampleApplication.kt
   ├─di
   │  └─AppModule.kt
   └─ui
      └─MainActivity.kt
      └─MainScreen.kt
      └─MainViewModel.kt
      └─CameraPreviewContent.kt
      └─CapturedImageContent.kt
      └─util
         └─Bitmap.kt
         └─PermissionRequester.kt

As mentioned earlier, the Vertex AI in Firebase SDK removes the need to write code that calls a server API.

The code and libraries normally added for server API calls (OkHttp, Retrofit, Gson, Moshi, etc.) are therefore unnecessary.

Camera preview screen (CameraPreviewContent.kt)

This screen is implemented with CameraX from the Jetpack libraries.

If the user has granted camera permission, the screen shows the camera preview. Otherwise, it shows the camera permission request screen.

The camera preview is displayed with AndroidView, which lets you use native Android Views inside Compose.

Tapping the shutter button calls LifecycleCameraController's takePicture method.

takePicture's onCaptureSuccess callback does three things:

1. It gets the Image from the ImageProxy.
2. It converts the Image to a Bitmap, correcting rotation and mirroring with rotateAndMirror.
3. It returns the result through the onPhotoCaptured: (Bitmap) -> Unit callback.
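The rotateAndMirror helper from util/Bitmap.kt is not shown in this article. As a hypothetical illustration only, the sketch below shows what applying imageInfo.rotationDegrees means for pixel data. It is not that helper, and it uses a plain IntArray grid instead of android.graphics.Bitmap. On Android, the same result is commonly obtained with android.graphics.Matrix and Bitmap.createBitmap. Mirroring, which the helper also applies, would be a horizontal flip and is omitted here.

```kotlin
// Hypothetical pure-Kotlin stand-in for a Bitmap: row-major pixel values.
class PixelGrid(val width: Int, val height: Int, val pixels: IntArray) {
    init {
        require(pixels.size == width * height) { "pixel count must be width * height" }
    }

    operator fun get(x: Int, y: Int): Int = pixels[y * width + x]
}

// Rotates clockwise by a multiple of 90 degrees, the kind of value
// CameraX reports in ImageInfo.getRotationDegrees() (0, 90, 180, 270).
fun PixelGrid.rotateClockwise(degrees: Int): PixelGrid {
    require(degrees % 90 == 0) { "rotation must be a multiple of 90 degrees" }
    val quarterTurns = Math.floorMod(degrees / 90, 4)
    var result = this
    repeat(quarterTurns) { result = result.rotate90Clockwise() }
    return result
}

// One 90-degree clockwise turn: width and height swap, and the source
// pixel at (x, y) moves to column (height - 1 - y), row x.
private fun PixelGrid.rotate90Clockwise(): PixelGrid {
    val newWidth = height
    val newHeight = width
    val out = IntArray(pixels.size)
    for (y in 0 until height) {
        for (x in 0 until width) {
            out[x * newWidth + (height - 1 - y)] = this[x, y]
        }
    }
    return PixelGrid(newWidth, newHeight, out)
}
```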

The captured result is stored in a field of the ViewModel (MainViewModel) whose type is MutableStateFlow.

For details on CameraX, see the following.

Camera preview screen

Source: Google, "Android Studio", Pixel 8 emulator screen at camera launch (2024/12/13)

CameraPreviewContent.kt

@Composable
fun CameraPreviewContent(
    onImageCaptured: (Bitmap) -> Unit,
) {
    val localContext = LocalContext.current
    val lifecycleCameraController = remember {
        LifecycleCameraController(localContext)
    }
    val lifecycleOwner = LocalLifecycleOwner.current

    // Camera lifecycle management (the camera runs only while this screen is shown)
    DisposableEffect(lifecycleOwner) {
        lifecycleCameraController.bindToLifecycle(lifecycleOwner)
        onDispose {
            lifecycleCameraController.unbind()
        }
    }

    Box(modifier = Modifier.fillMaxSize()) {
        // Show the camera view only if permission has been granted
        appPermissionUtil {
            Box(
                contentAlignment = Alignment.BottomCenter,
                modifier = Modifier.fillMaxSize()
            ) {
                // Show the camera preview
                CameraPreview(
                    lifecycleCameraController = lifecycleCameraController,
                    modifier = Modifier.fillMaxSize()
                )
                // Show the shutter button
                CaptureButton(
                    lifecycleCameraController = lifecycleCameraController,
                    onPhotoCaptured = { bitmap ->
                        onImageCaptured(bitmap)
                    }
                )
            }
        }
    }
}

// Component that shows the camera preview
@Composable
private fun CameraPreview(
    lifecycleCameraController: LifecycleCameraController,
    modifier: Modifier = Modifier,
) {
    // Embed an Android View in Jetpack Compose
    AndroidView(
        modifier = modifier,
        factory = { context ->
            PreviewView(context).apply {
                implementationMode = PreviewView.ImplementationMode.COMPATIBLE
            }
        },
        update = { previewView ->
            previewView.controller = lifecycleCameraController
        }
    )
}

// Component that shows the shutter button
@Composable
private fun CaptureButton(
    lifecycleCameraController: LifecycleCameraController,
    onPhotoCaptured: (Bitmap) -> Unit,
) {
    val context = LocalContext.current
    Button(
        onClick = {
            capturePhoto(
                context = context,
                lifecycleCameraController = lifecycleCameraController,
                onPhotoCaptured = onPhotoCaptured,
            )
        },
        modifier = Modifier
            .padding(32.dp)
            .size(80.dp),
        shape = CircleShape,
    ) {
        Icon(
            painter = painterResource(id = R.drawable.photo_camera),
            contentDescription = "Take photo",
        )
    }
}

// Called when the shutter button is pressed
private fun capturePhoto(
    lifecycleCameraController: LifecycleCameraController,
    context: Context,
    onPhotoCaptured: (Bitmap) -> Unit,
) {
    lifecycleCameraController.takePicture(
        ContextCompat.getMainExecutor(context),
        object : ImageCapture.OnImageCapturedCallback() {
            @ExperimentalGetImage
            override fun onCaptureSuccess(imageProxy: ImageProxy) {
                runCatching {
                    // On successful capture
                    imageProxy.image?.apply {
                        // Convert the Image to a Bitmap and apply rotation and mirroring
                        val bitmap =
                            toBitmap().rotateAndMirror(imageProxy.imageInfo.rotationDegrees)
                        onPhotoCaptured(bitmap) // return the capture result through the callback
                    }
                }.onFailure { e ->
                    // Add error handling as needed
                }.also {
                    imageProxy.close() // release the resource
                }
            }

            override fun onError(exception: ImageCaptureException) {
                // Add error handling as needed
            }
        }
    )
}

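Image display screen (CapturedImageContent.kt)

This screen shows the photo taken on the camera preview screen. On top of the photo, it places a card component that calls the Gemini API.

When the user types a question about the image into the card's input field and sends it, the Gemini API is called.

Gemini answers the question and returns the result to the app.

To use a different image, tap the back button to return to the camera preview screen.

Image display screen

Source: Google, "Android Studio", Pixel 8 emulator screen at camera launch (2024/12/13)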

CapturedImageContent.kt

@Composable
fun CapturedImageContent(
    photoBitmap: Bitmap,
    generatedAnswer: String,
    onReasoningButtonTapped: (String) -> Unit,
    onBackButtonTapped: () -> Unit,
) {
    // Cover the whole screen with a Box layout
    Box(Modifier.fillMaxSize()) {
        // Show the captured image
        Image(
            bitmap = photoBitmap.asImageBitmap(),
            contentDescription = "Captured image",
            contentScale = ContentScale.FillHeight,
            modifier = Modifier.fillMaxSize(),
        )
        Column(
            Modifier
                .fillMaxSize()
                .padding(16.dp),
            verticalArrangement = Arrangement.Bottom,
            horizontalAlignment = Alignment.Start,
        ) {
            // Show the back button
            Button(
                onClick = onBackButtonTapped,
                modifier = Modifier
                    .clip(CircleShape)
                    .padding(bottom = 8.dp)
            ) {
                Icon(
                    Icons.Default.Close,
                    contentDescription = "Back"
                )
            }
            // Card with the question input area and the result
            TextInputCard(
                generatedAnswer = generatedAnswer,
                onImageReasoning = onReasoningButtonTapped
            )
        }
    }
}

@OptIn(ExperimentalMaterial3Api::class)
@Composable
private fun TextInputCard(
    generatedAnswer: String,
    onImageReasoning: (String) -> Unit,
) {
    var inputText by remember { mutableStateOf("") }
    val keyboardController = LocalSoftwareKeyboardController.current
    Card(
        shape = RoundedCornerShape(16.dp),
        elevation = CardDefaults.cardElevation(4.dp),
        modifier = Modifier
            .padding(bottom = 48.dp)
    ) {
        Column(Modifier.padding(16.dp)) {
            Row(
                Modifier.fillMaxWidth(),
                horizontalArrangement = Arrangement.spacedBy(8.dp),
                verticalAlignment = Alignment.CenterVertically
            ) {
                // Text field that receives the user's input
                TextField(
                    value = inputText,
                    onValueChange = { inputText = it },
                    placeholder = { Text("Enter your question") },
                    modifier = Modifier.weight(1f)
                )
                // Button that calls the Gemini API (hides the keyboard, then sends the image and the question)
                Button(
                    onClick = {
                        keyboardController?.hide()
                        onImageReasoning(inputText)
                    }
                ) {
                    Icon(
                        Icons.AutoMirrored.Default.ArrowForward,
                        contentDescription = "Send",
                        modifier = Modifier.size(18.dp)
                    )
                }
            }
            // Show the response from Vertex AI
            if (generatedAnswer.isNotEmpty()) {
                Text(
                    text = generatedAnswer,
                    style = MaterialTheme.typography.bodyLarge,
                    modifier = Modifier.padding(8.dp)
                )
            }
        }
    }
}

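ViewModel (MainViewModel.kt)

This ViewModel switches between the camera preview screen and the image display screen, caches the photo the user took, and requests the content stream.

It requests the content stream with the generateContentStream method of the Vertex AI in Firebase SDK.

generateContentStream returns a Flow, and collecting a Flow (collect) is a suspend function. The call therefore runs inside viewModelScope, the CoroutineScope tied to the ViewModel.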
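Each element emitted by generateContentStream carries only the newly generated chunk. The ViewModel therefore concatenates the chunks and publishes the running text after each one. GenerateContentResponse.text is typed String?, so appending it directly with += would add the literal text "null" if a chunk had no text. The sketch below shows the null-safe accumulation. It models the stream as a Sequence<String?> of dummy chunks, an assumption made so it runs without kotlinx.coroutines.

```kotlin
// Each element stands in for one streamed chunk's `response.text` (nullable).
// Returns the accumulated text after every chunk, which is what the UI
// would observe through the StateFlow.
fun accumulateChunks(chunks: Sequence<String?>): List<String> {
    val snapshots = mutableListOf<String>()
    val buffer = StringBuilder()
    for (chunk in chunks) {
        buffer.append(chunk.orEmpty()) // treat a null chunk as empty, not "null"
        snapshots += buffer.toString()
    }
    return snapshots
}
```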

MainViewModel.kt

class MainViewModel(
    private val generativeModel: GenerativeModel,
) : ViewModel() {

    private val _isImageReasoningReady = MutableStateFlow(false)
    val isImageReasoningReady: StateFlow<Boolean> = _isImageReasoningReady

    private val _photoBitmap: MutableStateFlow<Bitmap?> = MutableStateFlow(null)
    val photoBitmap: StateFlow<Bitmap?> = _photoBitmap

    private val _generatedAnswer = MutableStateFlow<String>("")
    val generatedAnswer: StateFlow<String> = _generatedAnswer

    // Store the captured photo as a Bitmap
    fun storePhoto(bitmap: Bitmap) {
        _photoBitmap.value = bitmap
        _isImageReasoningReady.value = true
    }

    // Delete the captured photo
    fun deletePhoto() {
        _photoBitmap.value = null
        _isImageReasoningReady.value = false
        _generatedAnswer.value = ""
    }

    // Call the Gemini API with the image and the text
    fun reasonPhoto(inputText: String) {
        viewModelScope.launch(Dispatchers.IO) {
            // Read the current photo once; do nothing if there is none
            val bitmap = photoBitmap.value ?: return@launch
            try {
                val inputContent = content {
                    image(bitmap)
                    text(inputText)
                }
                // Accumulates the streamed result
                var outputContent = ""
                // Call the generative model and collect the streamed chunks
                generativeModel.generateContentStream(inputContent)
                    .collect { response ->
                        // response.text is nullable: append "" instead of "null"
                        outputContent += response.text.orEmpty()
                        _generatedAnswer.value = outputContent
                    }
            } catch (e: Exception) {
                // Add error handling as needed
            }
        }
    }

}

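Asking Gemini about images

Let's ask the Gemini model a variety of questions using the sample camera app.

Question 1

I asked Gemini about the NTT COMWARE logo, photographed on a piece of mail from NTT COMWARE.
(So close!! It's impressive that it recognized NTT from the logo alone.)

Question 2

I had bought some delicious sushi, so I asked Gemini about it too.
It even identified the types of sushi.
(Gemini's answer ✗ → correct ✓: kappa maki → negitoro maki, crab → sea bream, shirasu → horse mackerel)
Impressively, it did identify the tobiko (flying fish roe).

*I took these photos myself.

Source code listing

Below is the source code used to implement the sample camera app.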

VertexAISampleApplication.kt

import android.app.Application
import com.yourcompany.vertexaisample.di.appModule
import org.koin.android.ext.koin.androidContext
import org.koin.core.context.GlobalContext.startKoin

class VertexAISampleApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        startKoin {
            androidContext(this@VertexAISampleApplication)
            modules(
                appModule
            )
        }
    }
}

AppModule.kt

import com.google.firebase.Firebase
import com.google.firebase.vertexai.type.generationConfig
import com.google.firebase.vertexai.vertexAI
import com.yourcompany.vertexaisample.ui.MainViewModel
import org.koin.androidx.viewmodel.dsl.viewModel
import org.koin.dsl.module

val appModule = module {
    factory {
        // Create the GenerativeModel instance (Vertex AI in Firebase)
        Firebase.vertexAI.generativeModel(
            modelName = "gemini-1.5-flash",
            generationConfig = generationConfig {
                // Parameter that controls the randomness of token selection.
                // At 0, the most probable token is always selected.
                temperature = 0.7f
            }
        )
    }
    viewModel {
        // Inject the GenerativeModel instance into MainViewModel
        MainViewModel(get())
    }
}

MainActivity.kt

import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.enableEdgeToEdge
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Surface
import androidx.compose.ui.Modifier
import com.yourcompany.vertexaisample.ui.theme.VertexAISampleTheme

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Make the status bar and navigation bar transparent (edge-to-edge)
        enableEdgeToEdge()
        setContent {
            VertexAISampleTheme {
                Surface(
                    modifier = Modifier.fillMaxSize(),
                    color = MaterialTheme.colorScheme.background
                ) {
                    MainScreen()
                }
            }
        }
    }
}

MainScreen.kt

import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.runtime.Composable
import androidx.compose.runtime.collectAsState
import androidx.compose.ui.Modifier
import org.koin.androidx.compose.koinViewModel

@Composable
fun MainScreen(
    viewModel: MainViewModel = koinViewModel(),
) {
    val isImageReasoningReady = viewModel.isImageReasoningReady.collectAsState().value
    Column(modifier = Modifier.fillMaxSize()) {
        if (isImageReasoningReady) {
            // Show the captured image on screen
            CapturedImageContent(
                photoBitmap = viewModel.photoBitmap.collectAsState().value ?: return@Column,
                generatedAnswer = viewModel.generatedAnswer.collectAsState().value,
                onReasoningButtonTapped = { viewModel.reasonPhoto(it) },
                onBackButtonTapped = { viewModel.deletePhoto() },
            )
        } else {
            // Show the camera preview screen
            CameraPreviewContent(
                onImageCaptured = { viewModel.storePhoto(it) },
            )
        }
    }
}

MainViewModel.kt

import android.graphics.Bitmap
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.firebase.vertexai.GenerativeModel
import com.google.firebase.vertexai.type.content
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class MainViewModel(
    private val generativeModel: GenerativeModel,
) : ViewModel() {

    private val _isImageReasoningReady = MutableStateFlow(false)
    val isImageReasoningReady: StateFlow<Boolean> = _isImageReasoningReady

    private val _photoBitmap: MutableStateFlow<Bitmap?> = MutableStateFlow(null)
    val photoBitmap: StateFlow<Bitmap?> = _photoBitmap

    private val _generatedAnswer = MutableStateFlow<String>("")
    val generatedAnswer: StateFlow<String> = _generatedAnswer

    // Store the captured photo as a Bitmap
    fun storePhoto(bitmap: Bitmap) {
        _photoBitmap.value = bitmap
        _isImageReasoningReady.value = true
    }

    // Delete the captured photo
    fun deletePhoto() {
        _photoBitmap.value = null
        _isImageReasoningReady.value = false
        _generatedAnswer.value = ""
    }

    // Call Vertex AI (the Gemini API) with the image and the text
    fun reasonPhoto(inputText: String) {
        viewModelScope.launch(Dispatchers.IO) {
            // Read the current photo once; do nothing if there is none
            val bitmap = photoBitmap.value ?: return@launch
            try {
                val inputContent = content {
                    image(bitmap)
                    text(inputText)
                }
                // Accumulates the streamed result
                var outputContent = ""
                // Call the generative model and collect the streamed chunks
                generativeModel.generateContentStream(inputContent)
                    .collect { response ->
                        // response.text is nullable: append "" instead of "null"
                        outputContent += response.text.orEmpty()
                        _generatedAnswer.value = outputContent
                    }
            } catch (e: Exception) {
                // Add error handling as needed
            }
        }
    }

}

CameraPreviewContent.kt

@Composable
fun CameraPreviewContent(
    onImageCaptured: (Bitmap) -> Unit,
) {
    val localContext = LocalContext.current
    val lifecycleCameraController = remember {
        LifecycleCameraController(localContext)
    }
    val lifecycleOwner = LocalLifecycleOwner.current

    // Camera lifecycle management (the camera runs only while this screen is shown)
    DisposableEffect(lifecycleOwner) {
        lifecycleCameraController.bindToLifecycle(lifecycleOwner)
        onDispose {
            lifecycleCameraController.unbind()
        }
    }

    Box(modifier = Modifier.fillMaxSize()) {
        // Show the camera view only if permission has been granted
        appPermissionUtil {
            Box(
                contentAlignment = Alignment.BottomCenter,
                modifier = Modifier.fillMaxSize()
            ) {
                // Show the camera preview
                CameraPreview(
                    lifecycleCameraController = lifecycleCameraController,
                    modifier = Modifier.fillMaxSize()
                )
                // Show the shutter button
                CaptureButton(
                    lifecycleCameraController = lifecycleCameraController,
                    onPhotoCaptured = { bitmap ->
                        onImageCaptured(bitmap)
                    }
                )
            }
        }
    }
}

// Component that shows the camera preview
@Composable
private fun CameraPreview(
    lifecycleCameraController: LifecycleCameraController,
    modifier: Modifier = Modifier,
) {
    // Embed an Android View in Jetpack Compose
    AndroidView(
        modifier = modifier,
        factory = { context ->
            PreviewView(context).apply {
                implementationMode = PreviewView.ImplementationMode.COMPATIBLE
            }
        },
        update = { previewView ->
            previewView.controller = lifecycleCameraController
        }
    )
}

// Component that shows the shutter button
@Composable
private fun CaptureButton(
    lifecycleCameraController: LifecycleCameraController,
    onPhotoCaptured: (Bitmap) -> Unit,
) {
    val context = LocalContext.current
    Button(
        onClick = {
            capturePhoto(
                context = context,
                lifecycleCameraController = lifecycleCameraController,
                onPhotoCaptured = onPhotoCaptured,
            )
        },
        modifier = Modifier
            .padding(32.dp)
            .size(80.dp),
        shape = CircleShape,
    ) {
        Icon(
            painter = painterResource(id = R.drawable.photo_camera),
            contentDescription = "Take photo",
        )
    }
}

// Called when the shutter button is pressed
private fun capturePhoto(
    lifecycleCameraController: LifecycleCameraController,
    context: Context,
    onPhotoCaptured: (Bitmap) -> Unit,
) {
    lifecycleCameraController.takePicture(
        ContextCompat.getMainExecutor(context),
        object : ImageCapture.OnImageCapturedCallback() {
            @ExperimentalGetImage
            override fun onCaptureSuccess(imageProxy: ImageProxy) {
                runCatching {
                    // On successful capture
                    imageProxy.image?.apply {
                        // Convert the Image to a Bitmap and apply rotation and mirroring
                        val bitmap =
                            toBitmap().rotateAndMirror(imageProxy.imageInfo.rotationDegrees)
                        onPhotoCaptured(bitmap) // return the capture result through the callback
                    }
                }.onFailure { e ->
                    // Add error handling as needed
                }.also {
                    imageProxy.close() // release the resource
                }
            }

            override fun onError(exception: ImageCaptureException) {
                // Add error handling as needed
            }
        }
    )
}

CapturedImageContent.kt

import android.graphics.Bitmap
import androidx.compose.foundation.Image
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Box
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.layout.size
import androidx.compose.foundation.shape.CircleShape
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.automirrored.filled.ArrowForward
import androidx.compose.material.icons.filled.Close
import androidx.compose.material3.Button
import androidx.compose.material3.Card
import androidx.compose.material3.CardDefaults
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.material3.Icon
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.material3.TextField
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.draw.clip
import androidx.compose.ui.graphics.asImageBitmap
import androidx.compose.ui.layout.ContentScale
import androidx.compose.ui.platform.LocalSoftwareKeyboardController
import androidx.compose.ui.unit.dp

// Shows the image in the background, with a card component for calling Vertex AI and displaying its response
@Composable
fun CapturedImageContent(
    photoBitmap: Bitmap,
    generatedAnswer: String,
    onReasoningButtonTapped: (String) -> Unit,
    onBackButtonTapped: () -> Unit,
) {
    // Cover the whole screen with a Box layout
    Box(Modifier.fillMaxSize()) {
        // Show the captured image
        Image(
            bitmap = photoBitmap.asImageBitmap(),
            contentDescription = "Captured image",
            contentScale = ContentScale.FillHeight,
            modifier = Modifier.fillMaxSize(),
        )
        Column(
            Modifier
                .fillMaxSize()
                .padding(16.dp),
            verticalArrangement = Arrangement.Bottom,
            horizontalAlignment = Alignment.Start,
        ) {
            // Show the back button
            Button(
                onClick = onBackButtonTapped,
                modifier = Modifier
                    .clip(CircleShape)
                    .padding(bottom = 8.dp)
            ) {
                Icon(
                    Icons.Default.Close,
                    contentDescription = "Back"
                )
            }
            // Card with the question input area and the result
            TextInputCard(
                generatedAnswer = generatedAnswer,
                onImageReasoning = onReasoningButtonTapped
            )
        }
    }
}

@OptIn(ExperimentalMaterial3Api::class)
@Composable
private fun TextInputCard(
    generatedAnswer: String,
    onImageReasoning: (String) -> Unit,
) {
    var inputText by remember { mutableStateOf("") }
    val keyboardController = LocalSoftwareKeyboardController.current
    Card(
        shape = RoundedCornerShape(16.dp),
        elevation = CardDefaults.cardElevation(4.dp),
        modifier = Modifier
            .padding(bottom = 48.dp)
    ) {
        Column(Modifier.padding(16.dp)) {
            Row(
                Modifier.fillMaxWidth(),
                horizontalArrangement = Arrangement.spacedBy(8.dp),
                verticalAlignment = Alignment.CenterVertically
            ) {
                // Text field that receives the user's input
                TextField(
                    value = inputText,
                    onValueChange = { inputText = it },
                    placeholder = { Text("Enter your question") },
                    modifier = Modifier.weight(1f)
                )
                // Button that calls Vertex AI (hides the keyboard, then sends the image and the question)
                Button(
                    onClick = {
                        keyboardController?.hide()
                        onImageReasoning(inputText)
                    }
                ) {
                    Icon(
                        Icons.AutoMirrored.Default.ArrowForward,
                        contentDescription = "Send",
                        modifier = Modifier.size(18.dp)
                    )
                }
            }
            // Show the response from Vertex AI
            if (generatedAnswer.isNotEmpty()) {
                Text(
                    text = generatedAnswer,
                    style = MaterialTheme.typography.bodyLarge,
                    modifier = Modifier.padding(8.dp)
                )
            }
        }
    }
}

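Summary

In this article, I traced what happens inside the Vertex AI in Firebase SDK and examined its internal structure.

Building on that understanding, I implemented a sample camera app and looked at how a mobile app sends content to a Gemini model and receives content back.

Here are my takeaways from adopting Vertex AI in Firebase.

What I learned

AI apps (mobile and web) can now be developed quickly
Adoption takes little effort and can be done in a few steps
Security is considered too: production apps can use Firebase App Check to protect the API from unauthorized clients

Concerns

Use cases that need tuning or similar work will probably still require a server-side implementation
After a library update, logic built so far may need changes, or may no longer be possible
For a production app, you need to account for cases such as users calling the Gemini API repeatedly in quick succession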
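One simple client-side mitigation for the last concern is to ignore sends that arrive too soon after the previous one. The sketch below is a hypothetical RequestThrottle, not part of any Firebase SDK, with an injectable clock so it can be tested. It only guards against accidental rapid taps in an unmodified client. Protecting the backend from modified clients still relies on server-side measures such as Firebase App Check.

```kotlin
// A minimal client-side guard: accepts a request only if at least
// `minIntervalMillis` has passed since the last accepted one.
// `now` is injectable so the logic can be tested without real time.
class RequestThrottle(
    private val minIntervalMillis: Long,
    private val now: () -> Long = System::currentTimeMillis,
) {
    private var lastAccepted: Long? = null

    @Synchronized
    fun tryAcquire(): Boolean {
        val t = now()
        val last = lastAccepted
        // Reject if the previous accepted request was too recent.
        if (last != null && t - last < minIntervalMillis) return false
        lastAccepted = t
        return true
    }
}
```

In the sample app, reasonPhoto could check throttle.tryAcquire() at the top and return early when it is false.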

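In short, AI and mobile development is a genuinely hot area, and I'll keep following how it evolves.

Company names, product names, and service names mentioned in this article are trademarks or registered trademarks of their respective owners.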
