Getting Started with Cognitive Services on HoloLens (Computer Vision API Sample)

Last updated at 2019-10-03 / Posted at 2017-06-16

This article is a follow-up to "Getting Started with Cognitive Services on HoloLens (Emotion API Sample)".
If you have not tried the Emotion API sample yet, I recommend working through it first.

Introduction

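With the arrival of HoloLens, the entire surrounding environment has become a place where a system can render its output.
At the same time, that entire environment has also become a source of input.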

If an application recognizes and analyzes its surroundings and uses the result as input, it can adapt much more closely to the user.

To do this kind of environment analysis, I want to try using Microsoft Azure Cognitive Services from HoloLens.
This time I will use the Computer Vision API.

The procedure is basically the same as in the Emotion API sample article,
so wherever the steps overlap I skip the details and simply write "see the Emotion API sample article".

Environment

OS: Windows 10 Creators Update
Unity: 5.6.0f3
HoloToolkit: 1.5.5.0

Finished result

The unity-vision-demo from https://t.co/eeZIjoWxT1 adapted for HoloLens and running. pic.twitter.com/zRqUolCJwQ
— morio@6/25～米国 (@morio36) June 16, 2017

0. Prerequisites

You need an Azure account to try this sample.
You also need a subscription to the Computer Vision API.
For how to set these up, see the Emotion API sample article.

1. Download, extract, and place the sample

Download the sample from the GitHub repository below.
If you already tried the Emotion API sample, use the copy you downloaded then.
https://github.com/misslivirose/unity-cognitive-services-demos

Then cut the unity-vision-demo folder out of \unity-cognitive-services-demos-master and move it close to the root of a local drive.
Otherwise the paths become too long and the Visual Studio build is very likely to fail.

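The scripts in the sample have roughly the following responsibilities.

FoundImageObject.cs: A structure for handling the category information returned by the Computer Vision API. It holds the category name and the confidence score.
ImageToComputerVisionAPI.cs: The main logic that calls the Azure API. This is where we add the HoloLens-specific logic.
ParseComputerVisionResponse.cs: A structure that holds the list of FoundImageObject items.
ShowImageOnPanel.cs: A script that loads an image from StreamingAssets and applies it to a panel in the Unity scene.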
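
To make the two data-holding scripts more concrete, here is a minimal sketch of the shape they deal with. It assumes the default analyze response, in which each entry of categories has a name and a score. The class and member names (CategorySketch, CategoryListSketch) are hypothetical stand-ins, not the sample's actual code, and the values in the comment are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for FoundImageObject: one category from the response.
// Assumed response shape: {"categories":[{"name":"outdoor_","score":0.9}], ...}
public class CategorySketch
{
    public string Name;   // category name, e.g. "outdoor_"
    public double Score;  // confidence score between 0 and 1

    public CategorySketch(string name, double score)
    {
        Name = name;
        Score = score;
    }
}

// Hypothetical stand-in for the structure that holds the list of categories.
public class CategoryListSketch
{
    public readonly List<CategorySketch> Items = new List<CategorySketch>();

    // Appends one category to the list.
    public void Add(string name, double score)
    {
        Items.Add(new CategorySketch(name, score));
    }
}
```

In the actual sample, the parsing script fills a list like this from the JSON string returned by the API.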

2. Implementation

Note: the sample code on GitHub is linked at the end of this article, so please refer to it as well.

When you open the project you will see compile errors, but they go away once you select Windows Store in Build Settings and click Switch Platform.

Setting the API key
First, set your Computer Vision API key in VISIONKEY.
Copy the Computer Vision API access key you created in "0. Prerequisites" into VISIONKEY in ImageToComputerVisionAPI.cs.

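Implementing input/output
As with the Emotion API sample, this sample has no HoloLens-specific input/output, so we implement it ourselves. At this point you can already check that it works in Unity on its own.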

Replacing HoloToolkit
As with the Emotion API sample, replace HoloToolkit with version 1.5.5.0.
For the procedure, see the Emotion API sample article.

Adding input
Add the InputManager prefab and the BasicCursor prefab to the Hierarchy.
For the procedure, see the Emotion API sample article.

Reassigning the output
If the "Image Frame Object" field of ComputerVisionManagers has lost its reference, attach the object from the Hierarchy again.
For the procedure, see the Emotion API sample article.

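Implementing the air tap
We implement the part that detects an air tap, loads the image, and calls the API.
In ImageToComputerVisionAPI.cs, add IInputClickHandler to the class's base list, add the required using directives, and implement OnInputClicked. Also add fields so the class can operate on the panel and the text.
It does no harm, but the sample still contains names such as emotionURL, so rename it to visionURL while you are at it.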

ImageToComputerVisionAPI.cs


using UnityEngine;
using System.Collections;
using System.IO;
using UnityEngine.Networking;
using System;
using System.Collections.Generic;
using HoloToolkit.Unity.InputModule;
using UnityEngine.UI;

public class ImageToComputerVisionAPI : MonoBehaviour, IInputClickHandler
{

    string VISIONKEY = "VISIONKEY"; // replace with your Computer Vision API Key

    string visionURL = "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/analyze";

    public string fileName { get; private set; }
    string responseData;

    private ShowImageOnPanel panel; // added to the sample
    public Text text;               // added to the sample

    // Use this for initialization
    void Start () {
        fileName = Path.Combine(Application.streamingAssetsPath, "cityphoto.jpg"); // Replace with your file

        InputManager.Instance.PushFallbackInputHandler(gameObject); // added to the sample
        panel = gameObject.GetComponent<ShowImageOnPanel>();        // added to the sample
    }

    // Update is called once per frame
    void Update () {

        // This will be called with your specific input mechanism
        if (Input.GetKeyDown(KeyCode.Space))
        {
            StartCoroutine(GetVisionDataFromImages());
        }

    }
    /// <summary>
    /// Get Vision data from the Cognitive Services Computer Vision API
    /// Stores the response into the responseData string
    /// </summary>
    /// <returns> IEnumerator - needs to be called in a Coroutine </returns>
    IEnumerator GetVisionDataFromImages()
    {
        byte[] bytes = UnityEngine.Windows.File.ReadAllBytes(fileName);

        var headers = new Dictionary<string, string>() {
            { "Ocp-Apim-Subscription-Key", VISIONKEY },
            { "Content-Type", "application/octet-stream" }
        };

        WWW www = new WWW(visionURL, bytes, headers);

        yield return www;
        responseData = www.text; // Save the response as JSON string
        GetComponent<ParseComputerVisionResponse>().ParseJSONData(responseData);

        text.text = responseData; // added to the sample: show the raw JSON response on the text object
    }

    public void OnInputClicked(InputEventData eventData) // added to the sample
    {
        panel.DisplayImage();

        StartCoroutine(GetVisionDataFromImages());

    }
}

Implementing the image display
Change the access level of DisplayImage() in ShowImageOnPanel.cs to public.
Note: admittedly, this still feels like a bit of a hack.

ShowImageOnPanel.cs

    public void DisplayImage() // changed from the sample
    {
        Texture2D imageTxtr = new Texture2D(2, 2);
        string fileName = gameObject.GetComponent<ImageToComputerVisionAPI>().fileName;
        byte[] fileData = File.ReadAllBytes(fileName);
        imageTxtr.LoadImage(fileData);
        ImageFrameObject.GetComponent<Renderer>().material.mainTexture = imageTxtr;
    }

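Implementing text output
As it stands the result is only written to the console, so modify the sample so that the text can be displayed on HoloLens.
For the procedure, see the Emotion API sample article.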

Build Settings
Configure the build for HoloLens: set the Scene, Project, and Capability settings.
The InternetClient capability is required.
Probably because of an oversight in the sample, Product Name, Package name, and Description in Player Settings
are still set to face-emotion-hololens-demo,
so change them to computer-vision-hololens-demo.
Otherwise the app you built in the Emotion API sample article will be overwritten.

Fixing the compile error
When you build, a compile error occurs in JSONTemplates.cs, so add an explicit cast to FieldInfo[].
For the procedure, see the Emotion API sample article.
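
As a rough illustration of what an explicit cast to FieldInfo[] looks like, here is a minimal sketch in plain C#. It assumes the failing line assigns the result of GetType().GetFields() to a FieldInfo[] variable. The actual line in JSONTemplates.cs may differ, and SampleData and FieldCastSketch are hypothetical names used only for this example.

```csharp
using System;
using System.Reflection;

// Hypothetical data class used only to have some public fields to inspect.
public class SampleData
{
    public int Width = 640;
    public string Format = "Jpeg";
}

public static class FieldCastSketch
{
    // Returns the public fields of obj as a FieldInfo[].
    public static FieldInfo[] GetPublicFields(object obj)
    {
        // The explicit cast is the point of this sketch. On the desktop .NET
        // runtime GetFields() already returns FieldInfo[], so the cast is a no-op there.
        FieldInfo[] fields = (FieldInfo[])obj.GetType().GetFields();
        return fields;
    }
}
```

Calling FieldCastSketch.GetPublicFields(new SampleData()) returns an array with the two public fields of SampleData.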

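That completes the setup on the Unity side. Build in Visual Studio, deploy to the device, and check that it works.
Many variations are possible, such as swapping in a different image or converting the returned JSON into objects placed in the scene.
Give it a try!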
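
As one possible starting point for turning the response into placed objects, the sketch below maps a category name and score to the name of a prefab. Everything in it is an assumption made for illustration: the prefix-to-prefab table, the prefab names, and the CategoryToPrefabSketch class are hypothetical, and the category name and score are assumed to have already been parsed out of the JSON.

```csharp
using System;
using System.Collections.Generic;

public static class CategoryToPrefabSketch
{
    // Hypothetical mapping from category-name prefixes to prefab names.
    private static readonly Dictionary<string, string> PrefabByPrefix =
        new Dictionary<string, string>
        {
            { "outdoor_", "TreePrefab" },
            { "building_", "HousePrefab" },
            { "people_", "AvatarPrefab" }
        };

    // Returns the prefab name for a category, or null when the score is
    // below minScore or no prefix in the table matches.
    public static string Choose(string categoryName, double score, double minScore)
    {
        if (categoryName == null || score < minScore)
        {
            return null;
        }
        foreach (KeyValuePair<string, string> pair in PrefabByPrefix)
        {
            if (categoryName.StartsWith(pair.Key, StringComparison.Ordinal))
            {
                return pair.Value;
            }
        }
        return null;
    }
}
```

In a Unity script, a non-null result could then be loaded and instantiated in front of the user, for example with Resources.Load and Instantiate, provided a prefab with that name exists in a Resources folder.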

A derived sample

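Air tap to take a photo → call the Azure Computer Vision API to get tags and a caption → run a Wikipedia search with a word picked from the tags. I learned a lot, but I'm not sure how practical it is. #HoloLens #HoloLensJP pic.twitter.com/tj5w6fZABY
— morio@6/25～米国 (@morio36) May 7, 2017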

Source code

I have uploaded it to the repository below, so refer to it as needed.
https://github.com/haveagit/HoloLens-Vision-Demo

Next time I will implement the Bing Search API sample!
