This is the day 8 article of the #クラッピーチャレンジ (Clappy Challenge) Advent Calendar 2018.
I was planning to do something with HoloLens today, but instead I'm going to build clap recognition for Clappy.
Data collection
Record Clappy's claps, about one second per clip.
Save each clip as a .wav file. A sample clap recording is available here.
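Each clip should be roughly one second long, so a quick duration check with Python's standard `wave` module can catch recordings that are too long or too short before feature extraction. This is a minimal sketch. The function names and the 0.2 s tolerance are assumptions made for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a .wav file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_about_one_second(path, tolerance=0.2):
    """Hypothetical check: accept clips within +/- tolerance seconds of 1 s."""
    return abs(wav_duration_seconds(path) - 1.0) <= tolerance
```

For example, loop over `glob.glob("data/*.wav")` and re-record any file for which `is_about_one_second` returns `False`.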
MFCC (mel-frequency cepstral coefficient) extraction
Following the article "生活音を機械学習してみた" (Machine learning on everyday sounds), extract MFCCs from the clap recordings.
For the sample clap, this yields 12 MFCC features (example for data/8.wav):
[-1.9186237270314184, -2.004025759735257, 0.395560047954659, 0.2792735281985869, 0.12812810526374957, 0.1483784002786181, 0.03947688683847212, 0.08311861377346112, 0.06240829479469415, 0.11180095656707598, 0.06045599294932884, 0.13795435863643107]
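The referenced article's exact pipeline is not reproduced here. As a self-contained sketch, the code below computes MFCCs with NumPy only: pre-emphasis, 25 ms Hamming-windowed frames with a 10 ms step, power spectrum, 26 triangular mel filters, log, and DCT-II. It then averages coefficients 1 to 12 over all frames to get one 12-value vector per clip. The frame sizes, filter count, dropping c0, and averaging over frames are all assumptions, so the numbers will not match the values above.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II over the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc_features(signal, sample_rate, n_mfcc=12, n_filters=26,
                  frame_sec=0.025, step_sec=0.010):
    """Average MFCCs 1..n_mfcc over all frames of one clip (c0 is dropped)."""
    x = np.asarray(signal, dtype=float)
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])  # pre-emphasis
    flen = int(round(frame_sec * sample_rate))
    fstep = int(round(step_sec * sample_rate))
    n_fft = 1 << (flen - 1).bit_length()  # next power of two >= frame length
    if len(x) < flen:  # zero-pad clips shorter than one frame
        x = np.append(x, np.zeros(flen - len(x)))
    n_frames = 1 + (len(x) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # avoid log(0)
    cepstra = dct_ii(log_energies, n_mfcc + 1)[:, 1:]
    return cepstra.mean(axis=0)
```

For a 16-bit mono .wav, read the samples with `wave.open(...).readframes(...)`. Convert them with `np.frombuffer(..., dtype=np.int16)`, then call `mfcc_features(samples, sample_rate)`.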
Machine learning with an SVM
Use the SVM in Azure Machine Learning Studio to recognize claps.
Creating the training dataset
Clap recordings get the label 1. Non-clap sounds of about one second, collected from free sound libraries and similar sources, get the label 0.
Each row contains the 12 extracted MFCC features followed by a label column, and the data is saved as a CSV file.
The first row is a header with the column names.
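The layout described above can be produced with Python's standard `csv` module. This is a minimal sketch. `write_dataset` is a hypothetical helper, and the feature lists would come from whatever MFCC extraction you use. It writes the header without spaces after the commas, so the column names are exactly "1" to "12" and "label", matching the `ColumnNames` in the web service request.

```python
import csv

# Header: one column per MFCC feature ("1".."12"), then the label column.
HEADER = [str(i) for i in range(1, 13)] + ["label"]


def write_dataset(path, clap_features, other_features):
    """Write one row per clip: 12 MFCC values, then 1 (clap) or 0 (other sound)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for features, label in ([(x, 1) for x in clap_features]
                                + [(x, 0) for x in other_features]):
            if len(features) != 12:
                raise ValueError("expected 12 MFCC features, got %d" % len(features))
            writer.writerow(list(features) + [label])
```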
Sample training dataset (clap-recognition-data.csv):
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, label
-2.030952328509739, -1.7267258533574397, 0.323521935154163, -0.18399399989733234, -0.29852329518836207, -0.11008113177246494, 0.16693250965371575, 0.20997759177910322, 0.15722626036757342, 0.032582349099076716, -0.06053580971735837, -0.049217017068982405, 1
-2.0149171883630586, -2.1496651198985015, 0.18323683382114334, 0.019002802897492167, -0.10851490937755288, 0.025210824399056257, 0.09841304195097761, 0.12512365316963564, 0.030317325113665517, 0.1034488941187325, 0.022097402867728665, 0.03558504981106469, 1
-2.3979048738166573, -2.469565543640186, 0.10867925727791529, 0.19764366782121237, 0.05639542093578112, 0.05254087796911251, -0.029678601977552878, 0.021719389402260494, -0.041071836691565874, 0.08872659794957043, 0.10306514535945989, 0.16000781160703828, 1
-2.06614811869336, -2.170889715170483, 0.25754204004222786, 0.18033367680135767, 0.0325157605403691, 0.11174626860015509, 0.035075226042641794, 0.12993352104089304, 0.06058384426155165, 0.09234408866655579, 0.04472001655116191, 0.06767305551152383, 1
-2.083655535275477, -2.107960466042088, 0.27556398245945585, 0.20262480674689992, 0.06309170645570131, 0.06848649772844628, 0.004116281148212211, -0.05200396268580583, -0.14455986851534205, -0.11004309402392837, -0.09880047026253577, 0.10189426811296266, 1
-2.1281150813801526, -2.124516527255201, 0.306660395873793, 0.19897548495404233, -0.041875992809301356, 0.013291743124046705, 0.07919579529389115, 0.11304855737196774, 0.0002629483336810705, 0.09455601201823637, 0.03171696647317556, 0.07722662278567945, 1
-2.406434730780889, -2.5284005417258024, 0.08799904550499853, 0.1617209839271315, 0.019965420120432956, 0.09181090787296992, 0.01939902825796861, 0.05734257365770379, -0.008280138128152759, 0.11552624456046193, 0.05202802636712259, 0.07905963323786927, 1
-1.9186237270314184, -2.004025759735257, 0.395560047954659, 0.2792735281985869, 0.12812810526374957, 0.1483784002786181, 0.03947688683847212, 0.08311861377346112, 0.06240829479469415, 0.11180095656707598, 0.06045599294932884, 0.13795435863643107, 1
1.3107674391651687, -0.33425523518556643, -0.011539619135034122, -0.49155997940984336, 0.6785755367344419, -0.45492248755535186, 0.4288188313852052, -0.25867679911752, 0.011638388967118414, -0.23499444367757918, -0.20002869261061737, -0.06835265929978623, 0
1.2549603679299368, -0.4938186032036041, -0.35525892908633017, -0.7684226885817236, 0.39535817430611575, -1.1309949137169883, 0.09828609030978686, -0.8548464932606991, 0.04652898917215804, 0.04591126695782009, -0.2128582471038926, 0.11611730886878473, 0
1.0468089553287783, -0.7068555989190808, -0.20695050323633393, -0.6480336029768481, 0.3116239070104437, -0.9031773328373728, -0.01915464937008854, -0.7207511769246449, -0.18726961690671018, -0.05253397301820851, 0.01584268444069208, 0.11063633731385591, 0
0.7847029152814105, -0.5871881515065157, -0.17382771136257386, -0.5901297878889492, 0.5685831336059587, -0.6176993439410083, 0.09340014943206734, -0.7422343127254292, -0.2930028478845993, -0.2999271214295993, -0.14129008529956988, 0.08749218608185122, 0
-1.4531275770178873, -1.4651059497807617, 0.7207770720258173, -0.090476609106723, 0.3781967210158817, -0.18911396250236936, 0.0943408064088089, -0.34251278056714735, -0.09072359405224628, -0.4020710583479677, -0.16320865422285868, -0.36232589548076033, 0
-2.2038069811031473, -2.859915427017685, 0.42020458187647164, 0.2716119870395291, 0.11626480123425756, -2.531105436634157, -0.10148080634572769, 0.17355172402583036, 0.21225503799960277, -0.3223425684188577, 0.0458669327143578, 0.7798290356433241, 0
-4.2490500899904395, -0.6954062241416133, 1.1756969512154951, -0.22061125875414936, 0.45507469136143336, -0.1979339872924212, 0.5045295980839696, -0.07572936608792266, 0.4337868281028708, -0.09479590460286091, 0.2983703135730711, -0.18926925860230856, 0
0.2844324432237995, -2.8119576798191406, -0.6951073778390057, -0.8847675475570749, 1.0282811897407282, -0.40390264463758924, 0.7647131603070005, -0.2547867903063473, 0.13123249309417193, -0.05038804118448278, -0.23489909201442308, 0.09890527112115263, 0
Training
Open Azure Machine Learning Studio.
Build the training experiment and click RUN to create the trained model.
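Azure Machine Learning Studio handles training through its GUI, so this step involves no code. For offline experiments, one local stand-in is a linear SVM trained with the Pegasos stochastic sub-gradient method, sketched below in plain Python. This sketch makes several assumptions:

* It is not what Azure Machine Learning Studio runs internally.
* It regularizes the bias term along with the weights.
* It outputs only 0/1 labels, not the Scored Probabilities the web service returns.

`train_linear_svm` and `predict` are hypothetical names.

```python
import random


def train_linear_svm(X, y, lam=0.01, n_iter=5000, seed=0):
    """Pegasos-style training of a linear SVM.

    X: list of feature lists; y: labels in {0, 1}. A constant 1.0 feature is
    appended to every sample so the last weight acts as a bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], 1.0 if label == 1 else -1.0) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, n_iter + 1):
        x, target = rng.choice(data)  # pick one random training sample
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = target * sum(wi * xi for wi, xi in zip(w, x))
        # L2 shrinkage, then a hinge-loss step if the margin is violated.
        w = [(1.0 - eta * lam) * wi for wi in w]
        if margin < 1.0:
            w = [wi + eta * target * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return 1 (clap) if the linear score is positive, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else 0
```

To try it on clap-recognition-data.csv, read the rows with `csv.reader` and skip the header. Convert the first 12 columns to `float` for `X` and the last column to `int` for `y`.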
Deploying as a web service
Deploy the trained model as a web service with SET UP WEB SERVICE.
Copy the Python sample code from the REQUEST/RESPONSE section of the API HELP PAGE and use it.
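The generated sample sends each row as a list of strings under `Inputs.input1`. Its `ColumnNames` are "1" to "12" plus "label". A small helper can build that body from a list of 12 MFCC values so the numbers don't have to be pasted by hand. This helper is hypothetical and not part of the generated sample. Because the generated schema includes the label column, the helper takes a label explicitly.

```python
import json

# Column names as they appear in the sample request.
COLUMN_NAMES = [str(i) for i in range(1, 13)] + ["label"]


def build_request_body(features, label):
    """Build the UTF-8 JSON body for one row of 12 MFCC features (values sent as strings)."""
    if len(features) != 12:
        raise ValueError("expected 12 MFCC features, got %d" % len(features))
    row = [str(float(v)) for v in features] + [str(label)]
    data = {
        "Inputs": {"input1": {"ColumnNames": COLUMN_NAMES, "Values": [row]}},
        "GlobalParameters": {},
    }
    return json.dumps(data).encode("utf-8")
```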
Testing
Let's record an actual Clappy clap (data/9.wav), extract its MFCCs, and send them to the web service to check the result.
```python
# Python 3 (urllib.request) version of the sample code from the API HELP PAGE
import json
import urllib.error
import urllib.request

data = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "label"],
            # Non-clap sample (label 0); swap it in to compare:
            # "Values": [["2.367817766435614", "-1.7707634273097193", "-0.4382961790154758", "-2.001739866336805", "-0.49371792201065967", "-1.0420541157859537", "1.032755991318499", "0.47477291184196796", "0.9391550569634358", "0.2666925456582178", "0.008618102959970703", "-0.48544872498404157", "0"]],
            # MFCCs of the test clap (data/9.wav, label 1):
            "Values": [["-2.260111202570416", "-2.3438376498712006", "0.24659812867111622", "0.25804594443598305", "0.07715554746674536", "0.17240330514212862", "-0.0020692953469511116", "0.050617681598404796", "-0.0269192424244684", "0.022134494442011866", "0.00017100456394491428", "0.12138839606409804", "1"]],
        },
    },
    "GlobalParameters": {},
}

body = json.dumps(data).encode("utf-8")
url = "<insert your workspace url>"
api_key = "<insert your api key>"  # Replace this with the API key for the web service
headers = {"Content-Type": "application/json", "Authorization": "Bearer " + api_key}

req = urllib.request.Request(url, body, headers)
try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result.decode("utf-8"))
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers: they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read()))
```
The clip was recognized as a clap (Scored Labels = 1) with a probability of about 81% (Scored Probabilities = 0.8065):
{"Results":{"output1":{"type":"table","value":{"ColumnNames":["1","2","3","4","5","6","7","8","9","10","11","12","label","Scored Labels","Scored Probabilities"],"ColumnTypes":["Double","Double","Double","Double","Double","Double","Double","Double","Double","Double","Double","Double","Int32","Int32","Double"],"Values":[["-2.26011120257042","-2.3438376498712","0.246598128671116","0.258045944435983","0.0771555474667454","0.172403305142129","-0.00206929534695111","0.0506176815984048","-0.0269192424244684","0.0221344944420119","0.000171004563944914","0.121388396064098","1","1","0.806501805782318"]]}}}}
Summary
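- Recorded one-second clips of Clappy's claps, extracted MFCCs, and built a training dataset.
- Trained an SVM in Azure Machine Learning Studio and used it to recognize claps.
- Deployed the model as a web service.
- Next, I'd like to do this in real time.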