
Trying Out Dancing To Music

Posted at 2020-12-14

Overview

I tried out the source code for the paper Dancing To Music, published in 2020.
https://github.com/NVlabs/Dancing2Music

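None of the module versions were published, so it took a very long time to find a combination that works. I wish the authors had also released a requirements.txt with pinned versions. This time I verified things up to running the demo; I did not try training.

Environment setup

The requirements.txt looks like this. I ran everything in Docker on a local Ubuntu 18.04 machine (GTX 1080 Ti). With the modules below, I confirmed that the demo works on Python 3.6.3. Note that it does not work on Python 3.6.0.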

numpy==1.18.5
matplotlib
torch==1.7.0
torchvision==0.8.1
librosa==0.8.0
jupyter==1.0.0
opencv-python==4.4.0.*
tensorflow==2.3.1
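
Since the working combination was found by trial and error, it can help to check the interpreter before installing anything. The following is a minimal sketch and not part of the Dancing2Music repository: `parse_pins` and `classify_python` are hypothetical helper names, and the classification encodes only what this article reports (3.6.3 works, 3.6.0 does not). Every other version is treated as untested.

```python
import sys

# A few lines in the same format as the list above (sample only).
SAMPLE_REQUIREMENTS = """\
numpy==1.18.5
matplotlib
torch==1.7.0
"""


def parse_pins(text):
    """Map package name -> pinned version (None when the line has no '==' pin)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


def classify_python(version_info=None):
    """Classify an interpreter version using only the results reported in this article."""
    major, minor, micro = tuple(version_info or sys.version_info)[:3]
    if (major, minor, micro) == (3, 6, 3):
        return "confirmed"   # the demo ran on 3.6.3
    if (major, minor, micro) == (3, 6, 0):
        return "known-bad"   # reported not to work
    return "untested"        # nothing in the article covers this version


if __name__ == "__main__":
    print(parse_pins(SAMPLE_REQUIREMENTS))
    print("Python", sys.version.split()[0], "->", classify_python())
```

Unpinned entries such as matplotlib come back as `None`, which makes it easy to see which packages still depend on whatever version pip happens to resolve.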

ffmpeg is also required.

apt install ffmpeg
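
The demo log further down shows ffmpeg being invoked both to convert the input audio and to encode the final video, so a missing binary only surfaces once the demo is already running. As a small sketch (the helper name `find_tool` is my own, not something from the repository), the standard library can confirm that the executable is on PATH beforehand:

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("ffmpeg")
    if path is None:
        print("ffmpeg not found on PATH; install it first (e.g. apt install ffmpeg)")
    else:
        print("ffmpeg found at", path)
```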

Downloading the data and models

It is quite hard to find, but the link is in the Project section of readme.md.
http://vllab.ucmerced.edu/hylee/Dancing2Music/script.txt

## Dataset
### Content
#### 3 zip files containing data of three dancing categories: Zumba, ballet, and hiphop.
#### 1 zip files containing data statistics and data path lists for trainint usage.

URL=http://vllab.ucmerced.edu/hylee/Dancing2Music/ballet.zip
wget -N $URL -O ./ballet.zip
unzip ./ballet.zip -d .
rm ./ballet.zip

... (rest omitted)

Running this in a shell starts the download.

(beginning omitted)
./data.zip                                                            100%[=========================================================================================================================================================================>]   1.33M   541KB/s    in 2.5s

2020-12-06 15:06:55 (541 KB/s) - './data.zip' saved [1394787/1394787]

Archive:  ./data.zip
  inflating: ./stats/all_aud_mean.npy
  inflating: ./stats/all_aud_std.npy
  inflating: ./stats/all_onbeat_mean.npy
  inflating: ./stats/all_onbeat_std.npy
  inflating: ./stats/aud_3cls.ckpt
  inflating: ./unitList/ballet_unitseq3.txt
  inflating: ./unitList/ballet_unitseq4.txt
  inflating: ./unitList/ballet_unit.txt
  inflating: ./unitList/hiphop_unitseq3.txt
  inflating: ./unitList/hiphop_unitseq4.txt
  inflating: ./unitList/hiphop_unit.txt
  inflating: ./unitList/zumba_unitseq3.txt
  inflating: ./unitList/zumba_unitseq4.txt
  inflating: ./unitList/zumba_unit.txt
--2020-12-06 15:06:55--  http://vllab.ucmerced.edu/hylee/Dancing2Music/Stage1.ckpt
Resolving vllab.ucmerced.edu (vllab.ucmerced.edu)... 169.236.184.69
Connecting to vllab.ucmerced.edu (vllab.ucmerced.edu)|169.236.184.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 185511583 (177M) [text/plain]
Saving to: 'Stage1.ckpt'

(rest omitted)
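
For environments without wget and unzip, the per-archive steps of script.txt (download, unpack, delete the archive) could also be written in Python. This is only a sketch under assumptions: the function names are hypothetical, the base URL is the one shown in the script excerpt above, and only archive names that actually appear in this article (ballet.zip, data.zip) are mentioned. The rest of the file list should be taken from script.txt itself.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL as it appears in the script excerpt above.
BASE_URL = "http://vllab.ucmerced.edu/hylee/Dancing2Music/"


def archive_url(filename, base_url=BASE_URL):
    """Build the download URL for one file hosted on the project page."""
    return base_url + filename


def extract_and_remove(zip_path, dest="."):
    """Unzip an archive into dest, delete the archive, and return the member names."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest)
    zip_path.unlink()  # mirrors the `rm ./ballet.zip` step
    return names


def fetch_archive(filename, dest="."):
    """Download one zip from the project page, then unpack and delete it."""
    target = Path(dest) / filename
    urllib.request.urlretrieve(archive_url(filename), str(target))
    return extract_and_remove(target, dest)


# Example (needs network access, so it is left commented out):
# fetch_archive("ballet.zip")
# fetch_archive("data.zip")
```

Checkpoint files such as Stage1.ckpt are not archives, so for those only the download step applies.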

Downloading and running the source code

The source you get by running git clone on the repository below is missing files and does not run!
https://github.com/NVlabs/Dancing2Music

Instead, download the version the author publishes on their own page (this is terrible).
http://vllab.ucmerced.edu/hylee/Dancing2Music/demo.zip

Create a checkpoint folder inside the demo folder and put the downloaded checkpoint files in it.

%mkdir demo/checkpoint
%cp Stage1.ckpt demo/checkpoint
%cp Stage2.ckpt demo/checkpoint
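
The same step can be made repeatable and a little more defensive in Python. This is a sketch with a hypothetical helper name (`install_checkpoints`); the directory layout and file names are the ones used in the commands above. Unlike a plain mkdir, it does not fail if the folder already exists, and it stops with an error when a checkpoint has not been downloaded yet.

```python
import shutil
from pathlib import Path


def install_checkpoints(src_dir=".", demo_dir="demo",
                        names=("Stage1.ckpt", "Stage2.ckpt")):
    """Copy the downloaded checkpoints into <demo_dir>/checkpoint and return their paths."""
    src_dir = Path(src_dir)
    target_dir = Path(demo_dir) / "checkpoint"
    target_dir.mkdir(parents=True, exist_ok=True)  # safe to re-run
    copied = []
    for name in names:
        source = src_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"missing checkpoint: {source}")
        copied.append(Path(shutil.copy2(str(source), str(target_dir / name))))
    return copied
```

Calling `install_checkpoints()` from the directory that holds the downloads corresponds to the three shell commands above.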

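Run demo.py. Specify the input audio file with --aud_path and the file name of the output dance video with --out_file. The second checkpoint is specified with --resume (note that this differs from the documentation on GitHub).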

%Dancing2Music/demo# cat demo.sh
python demo.py --decomp_snapshot checkpoint/Stage1.ckpt --resume checkpoint/Stage2.ckpt --aud_path demo/ChillingMusic.wav --out_file demo/output.mp4 --out_dir demo/out_frame
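
To try several audio files without editing demo.sh each time, the command line can be assembled in Python. This is a sketch: `build_demo_command` is a hypothetical helper, and the flag names are copied from demo.sh above rather than from any documented API of demo.py.

```python
import shlex


def build_demo_command(aud_path, out_file, out_dir,
                       stage1="checkpoint/Stage1.ckpt",
                       stage2="checkpoint/Stage2.ckpt"):
    """Return the demo.py invocation from demo.sh as an argument list."""
    return [
        "python", "demo.py",
        "--decomp_snapshot", stage1,  # first checkpoint
        "--resume", stage2,           # second checkpoint
        "--aud_path", aud_path,       # input audio file
        "--out_file", out_file,       # output dance video
        "--out_dir", out_dir,         # directory for intermediate frames
    ]


if __name__ == "__main__":
    cmd = build_demo_command("demo/ChillingMusic.wav",
                             "demo/output.mp4",
                             "demo/out_frame")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it from the demo directory:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list to `subprocess.run` avoids shell quoting problems when an audio file name contains spaces.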

If all goes well, output.mp4 is written after output like the following.

%sh demo.sh
2020-12-14 12:12:08.409682: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-12-14 12:12:08.409714: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'demo/ChillingMusic.wav':
  Duration: 00:00:27.41, bitrate: 1411 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'demo/ChillingMusic-formatted.wav':
  Metadata:
    ISFT            : Lavf57.83.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc57.107.100 pcm_s16le
size=    1180kB time=00:00:27.40 bitrate= 352.8kbits/s speed=1.26e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.006453%
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
Loading Done
process 0/5
process 1/5
process 2/5
process 3/5
process 4/5
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, image2, from 'demo_output/frame%03d.png':
  Duration: 00:00:19.20, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: png, rgb24(pc), 500x256, 25 fps, 25 tbr, 25 tbn, 25 tbc
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, wav, from 'demo/ChillingMusic.wav':
  Duration: 00:00:27.41, bitrate: 1411 kb/s
    Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[image2 @ 0x561b882ddb20] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[libx264 @ 0x561b883bb060] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x561b883bb060] profile High, level 2.1
[libx264 @ 0x561b883bb060] 264 - core 152 r2854 e9a5903 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=8 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'demo/output.mp4':
  Metadata:
    encoder         : Lavf57.83.100
    Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 500x256, q=-1--1, 30 fps, 15360 tbn, 30 tbc
    Metadata:
      encoder         : Lavc57.107.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc57.107.100 aac
frame=  959 fps=671 q=-1.0 Lsize=     747kB time=00:00:31.86 bitrate= 192.0kbits/s dup=479 drop=0 speed=22.3x
video:277kB audio:438kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 4.482303%
[libx264 @ 0x561b883bb060] frame I:4     Avg QP: 6.57  size:  2596
[libx264 @ 0x561b883bb060] frame P:251   Avg QP:22.27  size:   736
[libx264 @ 0x561b883bb060] frame B:704   Avg QP:18.87  size:   125
[libx264 @ 0x561b883bb060] consecutive B-frames:  0.5%  4.6%  0.6% 94.3%
[libx264 @ 0x561b883bb060] mb I  I16..4: 90.4%  1.6%  8.1%
[libx264 @ 0x561b883bb060] mb P  I16..4:  0.4%  0.9%  0.2%  P16..4:  3.2%  3.0%  2.5%  0.0%  0.0%    skip:89.9%
[libx264 @ 0x561b883bb060] mb B  I16..4:  0.1%  0.0%  0.0%  B16..8:  5.5%  0.9%  0.3%  direct: 0.1%  skip:93.0%  L0:28.4% L1:69.4% BI: 2.2%
[libx264 @ 0x561b883bb060] 8x8 transform intra:28.0% inter:6.3%
[libx264 @ 0x561b883bb060] coded y,uvDC,uvAC intra: 4.4% 16.2% 13.6% inter: 0.7% 2.2% 2.0%
[libx264 @ 0x561b883bb060] i16 v,h,dc,p: 89%  5%  6%  0%
[libx264 @ 0x561b883bb060] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu:  4%  2% 94%  0%  0%  0%  0%  0%  0%
[libx264 @ 0x561b883bb060] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 11% 41%  3%  2%  4%  2%  5%  1%
[libx264 @ 0x561b883bb060] i8c dc,h,v,p: 71%  9% 19%  1%
[libx264 @ 0x561b883bb060] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x561b883bb060] ref P L0: 58.2%  3.0% 20.7% 18.1%
[libx264 @ 0x561b883bb060] ref B L0: 69.6% 20.0% 10.4%
[libx264 @ 0x561b883bb060] ref B L1: 97.4%  2.6%
[libx264 @ 0x561b883bb060] kb/s:70.94
[aac @ 0x561b883bea60] Qavg: 397.796

Results

A video like the one below is produced (it is shown as a GIF here because of Qiita's limitations, but the actual output is an mp4 with sound).
ezgif.com-gif-maker.gif

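As for whether the dance matches the music: it sort of looks like it does, but it is a close call. With papers like this, there really is no substitute for running the code yourself.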
