Clap Recognition for Clappy Using Azure ML Studio

Posted at 2018-12-08

This is the Day 8 article of the #クラッピーチャレンジ (Clappy Challenge) Advent Calendar 2018.
I was thinking of doing something with HoloLens today, but instead I'll work on clap recognition for Clappy.

Data Collection

Record Clappy's clap, about one second long.
Save it as a .wav file. The sample clap sound is here.
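
How the clips were captured isn't specified in the article; as a minimal sketch, assuming the sounddevice and soundfile packages, a one-second clip could be recorded and saved like this:

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 44100   # Hz
DURATION = 1.0        # seconds, roughly the length of one clap

# Record one second of mono audio from the default input device.
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
sd.wait()  # block until the recording finishes

# Save the clip as a .wav file, e.g. data/8.wav.
sf.write("data/8.wav", audio, SAMPLE_RATE)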

MFCC (Mel-Frequency Cepstral Coefficients) Extraction

Following 生活音を機械学習してみた (an article on applying machine learning to everyday sounds), extract MFCCs from the clap sound.
The MFCCs of the sample clap yield 12 feature values (example for data/8.wav):

[-1.9186237270314184, -2.004025759735257, 0.395560047954659, 0.2792735281985869, 0.12812810526374957, 0.1483784002786181, 0.03947688683847212, 0.08311861377346112, 0.06240829479469415, 0.11180095656707598, 0.06045599294932884, 0.13795435863643107]
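
The referenced article's exact preprocessing isn't reproduced here; as a rough sketch, assuming librosa, 12 MFCC coefficients can be computed per frame and averaged over the one-second clip to get a single 12-dimensional feature vector:

import librosa

def extract_mfcc(path, n_mfcc=12):
    # Load the ~1 second clip at its native sample rate.
    y, sr = librosa.load(path, sr=None)
    # 12 MFCC coefficients per frame, averaged over time.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

print(extract_mfcc("data/8.wav").tolist())

Any scaling or normalization applied in the referenced article is not included in this sketch, so the numbers will not match the sample above exactly.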

Machine Learning with SVM

We'll use the SVM in Azure Machine Learning Studio to recognize the clap.

Creating the Training Dataset

Label the clap-sound data as 1, and label the roughly one-second non-clap sounds gathered from free sound libraries and the like as 0.
Line up the 12 extracted MFCC features, append a label column after them, and save the result as a CSV.
The first row holds the column headers.

Sample training dataset (clap-recognition-data.csv)

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, label
-2.030952328509739, -1.7267258533574397, 0.323521935154163, -0.18399399989733234, -0.29852329518836207, -0.11008113177246494, 0.16693250965371575, 0.20997759177910322, 0.15722626036757342, 0.032582349099076716, -0.06053580971735837, -0.049217017068982405, 1
-2.0149171883630586, -2.1496651198985015, 0.18323683382114334, 0.019002802897492167, -0.10851490937755288, 0.025210824399056257, 0.09841304195097761, 0.12512365316963564, 0.030317325113665517, 0.1034488941187325, 0.022097402867728665, 0.03558504981106469, 1
-2.3979048738166573, -2.469565543640186, 0.10867925727791529, 0.19764366782121237, 0.05639542093578112, 0.05254087796911251, -0.029678601977552878, 0.021719389402260494, -0.041071836691565874, 0.08872659794957043, 0.10306514535945989, 0.16000781160703828, 1
-2.06614811869336, -2.170889715170483, 0.25754204004222786, 0.18033367680135767, 0.0325157605403691, 0.11174626860015509, 0.035075226042641794, 0.12993352104089304, 0.06058384426155165, 0.09234408866655579, 0.04472001655116191, 0.06767305551152383, 1
-2.083655535275477, -2.107960466042088, 0.27556398245945585, 0.20262480674689992, 0.06309170645570131, 0.06848649772844628, 0.004116281148212211, -0.05200396268580583, -0.14455986851534205, -0.11004309402392837, -0.09880047026253577, 0.10189426811296266, 1
-2.1281150813801526, -2.124516527255201, 0.306660395873793, 0.19897548495404233, -0.041875992809301356, 0.013291743124046705, 0.07919579529389115, 0.11304855737196774, 0.0002629483336810705, 0.09455601201823637, 0.03171696647317556, 0.07722662278567945, 1
-2.406434730780889, -2.5284005417258024, 0.08799904550499853, 0.1617209839271315, 0.019965420120432956, 0.09181090787296992, 0.01939902825796861, 0.05734257365770379, -0.008280138128152759, 0.11552624456046193, 0.05202802636712259, 0.07905963323786927, 1
-1.9186237270314184, -2.004025759735257, 0.395560047954659, 0.2792735281985869, 0.12812810526374957, 0.1483784002786181, 0.03947688683847212, 0.08311861377346112, 0.06240829479469415, 0.11180095656707598, 0.06045599294932884, 0.13795435863643107, 1
1.3107674391651687, -0.33425523518556643, -0.011539619135034122, -0.49155997940984336, 0.6785755367344419, -0.45492248755535186, 0.4288188313852052, -0.25867679911752, 0.011638388967118414, -0.23499444367757918, -0.20002869261061737, -0.06835265929978623, 0
1.2549603679299368, -0.4938186032036041, -0.35525892908633017, -0.7684226885817236, 0.39535817430611575, -1.1309949137169883, 0.09828609030978686, -0.8548464932606991, 0.04652898917215804, 0.04591126695782009, -0.2128582471038926, 0.11611730886878473, 0
1.0468089553287783, -0.7068555989190808, -0.20695050323633393, -0.6480336029768481, 0.3116239070104437, -0.9031773328373728, -0.01915464937008854, -0.7207511769246449, -0.18726961690671018, -0.05253397301820851, 0.01584268444069208, 0.11063633731385591, 0
0.7847029152814105, -0.5871881515065157, -0.17382771136257386, -0.5901297878889492, 0.5685831336059587, -0.6176993439410083, 0.09340014943206734, -0.7422343127254292, -0.2930028478845993, -0.2999271214295993, -0.14129008529956988, 0.08749218608185122, 0
-1.4531275770178873, -1.4651059497807617, 0.7207770720258173, -0.090476609106723, 0.3781967210158817, -0.18911396250236936, 0.0943408064088089, -0.34251278056714735, -0.09072359405224628, -0.4020710583479677, -0.16320865422285868, -0.36232589548076033, 0
-2.2038069811031473, -2.859915427017685, 0.42020458187647164, 0.2716119870395291, 0.11626480123425756, -2.531105436634157, -0.10148080634572769, 0.17355172402583036, 0.21225503799960277, -0.3223425684188577, 0.0458669327143578, 0.7798290356433241, 0
-4.2490500899904395, -0.6954062241416133, 1.1756969512154951, -0.22061125875414936, 0.45507469136143336, -0.1979339872924212, 0.5045295980839696, -0.07572936608792266, 0.4337868281028708, -0.09479590460286091, 0.2983703135730711, -0.18926925860230856, 0
0.2844324432237995, -2.8119576798191406, -0.6951073778390057, -0.8847675475570749, 1.0282811897407282, -0.40390264463758924, 0.7647131603070005, -0.2547867903063473, 0.13123249309417193, -0.05038804118448278, -0.23489909201442308, 0.09890527112115263, 0
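
As a rough sketch for assembling clap-recognition-data.csv (the claps/ and others/ folder layout here is hypothetical, and extract_mfcc is the helper sketched above):

import csv
import glob

with open("clap-recognition-data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Header row: feature columns 1..12 plus the label column.
    writer.writerow([str(i) for i in range(1, 13)] + ["label"])
    # Label 1 = clap, label 0 = non-clap sounds from free sound libraries.
    for pattern, label in [("claps/*.wav", 1), ("others/*.wav", 0)]:
        for path in glob.glob(pattern):
            features = extract_mfcc(path)  # 12 MFCC values per clip
            writer.writerow(list(features) + [label])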

Training

Open Azure Machine Learning Studio.
Build the training model as follows and run it (RUN).
clap04.PNG

The results turned out reasonably well.
clap05.PNG

Publishing as a Web Service

Turn it into a web service with SET UP WEB SERVICE.
clap03.PNG

Make a note of the API key.
clap01.PNG

Copy and paste the Python sample code from REQUEST/RESPONSE on the API HELP PAGE and use it.
clap02.PNG

Test

Now let's actually record Clappy's clap, extract its MFCCs, and feed them in to check the result! (data/9.wav)

import urllib2
# If you are using Python 3+, import urllib instead of urllib2
import json

data =  {
    "Inputs": {
        "input1": {
            "ColumnNames": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "label"],
            # Example feature vector with label 0 (non-clap), kept commented out for reference
            # "Values": [ ["2.367817766435614", "-1.7707634273097193", "-0.4382961790154758", "-2.001739866336805", "-0.49371792201065967", "-1.0420541157859537", "1.032755991318499", "0.47477291184196796", "0.9391550569634358", "0.2666925456582178", "0.008618102959970703", "-0.48544872498404157", "0"], ]
            # Feature vector of the clap to classify (label 1)
            "Values": [ ["-2.260111202570416", "-2.3438376498712006", "0.24659812867111622", "0.25804594443598305", "0.07715554746674536", "0.17240330514212862", "-0.0020692953469511116", "0.050617681598404796", "-0.0269192424244684", "0.022134494442011866", "0.00017100456394491428", "0.12138839606409804", "1"], ]
        },
    },
    "GlobalParameters": {
    }
}

body = str.encode(json.dumps(data))

url = '<insert your workspace url>'
api_key = '<insert your api key>' # Replace this with the API key for the web service
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib2.Request(url, body, headers)

try:
    response = urllib2.urlopen(req)

    # If you are using Python 3+, replace urllib2 with urllib.request in the above code:
    # req = urllib.request.Request(url, body, headers)
    # response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib2.HTTPError, error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())

    print(json.loads(error.read()))

The result: it was recognized as a clap with about 80% probability.

{"Results":{"output1":{"type":"table","value":{"ColumnNames":["1","2","3","4","5","6","7","8","9","10","11","12","label","Scored Labels","Scored Probabilities"],"ColumnTypes":["Double","Double","Double","Double","Double","Double","Double","Double","Double","Double","Double","Double","Int32","Int32","Double"],"Values":[["-2.26011120257042","-2.3438376498712","0.246598128671116","0.258045944435983","0.0771555474667454","0.172403305142129","-0.00206929534695111","0.0506176815984048","-0.0269192424244684","0.0221344944420119","0.000171004563944914","0.121388396064098","1","1","0.806501805782318"]]}}}}

Summary

  • Recorded Clappy's clap (about one second), extracted MFCCs, and created training data.
  • Trained an SVM with Azure Machine Learning Studio and recognized the clap.
  • Published it as a web service.
  • I'd like to do this in real time next.