
WebRTC in an iOS App

Last updated at 2021-11-27, posted at 2020-07-10


About me

iOS app engineer.
I cover a fairly broad range: iOS apps, Android apps, and web (Java/Ruby/PHP).

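Motivation

At work I had to implement video calling with WebRTC.
However, it was built on the Twilio Video SDK (a video-call service that includes the server side), so I was able to implement it just by following the samples and the Getting Started guide, without really understanding WebRTC.
I wanted to know how WebRTC itself is put together, so I looked into it.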

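What is WebRTC?

From Wikipedia:

A free and open-source project that provides web browsers and mobile applications with real-time communication (RTC) via simple APIs. Direct peer-to-peer communication within web pages enables voice chat, video chat, and file sharing between web browsers without installing plugins or downloading native apps.
My impression is that the technology became popular around 2017.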

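How P2P communication is achieved

Although WebRTC is described as P2P, servers are still involved. A signaling server is needed to set up the connection, and media flows directly between peers only when NAT traversal succeeds. In multi-party calls in particular, media is often routed through a server in one of the following ways.
SFU (Selective Forwarding Unit) = the server receives each client's video/audio and forwards it to the other clients on their behalf.
MCU (Multipoint Control Unit) = the server mixes the video/audio into a single stream and delivers that. Because the server composes the delivered video, its CPU load is high.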
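
To make the difference between SFU and MCU concrete, here is a rough sketch that counts how many media streams one client sends and receives in each setup. The `Topology` type and `streamsPerClient` function are hypothetical names for this illustration, audio and video are counted together as one stream, and the full-mesh case (every client sending directly to every other client) is included only as a baseline for comparison.

```swift
import Foundation

/// How media is routed in a multi-party call.
enum Topology {
    case mesh  // every client sends directly to every other client
    case sfu   // the server forwards each client's stream to the others
    case mcu   // the server mixes all streams into a single one
}

/// Number of streams one client uploads and downloads when `participants` people are in the call.
func streamsPerClient(_ topology: Topology, participants: Int) -> (up: Int, down: Int) {
    let others = max(participants - 1, 0)
    switch topology {
    case .mesh: return (up: others, down: others)
    case .sfu:  return (up: min(others, 1), down: others)
    case .mcu:  return (up: min(others, 1), down: min(others, 1))
    }
}

for topology in [Topology.mesh, .sfu, .mcu] {
    let load = streamsPerClient(topology, participants: 4)
    print("\(topology): up \(load.up), down \(load.down)")
}
```

With an MCU the client's load stays constant as the call grows, which is the flip side of the higher CPU cost on the server.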

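Current device and browser support

iOS: supported in Safari 11 (iOS 11) and later. Not yet supported in WKWebView at the time of writing. SFSafariViewController reportedly works on iOS 13.
Android: supported from around Android 6 or 7 (around Android 7 for Chrome on Android). On some devices, such as Huawei models, H.264 video may be unsupported, so video may not display correctly.
PC browsers: Chrome, Firefox, Safari, Edge, and so on mostly work.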

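WebRTC services

Companies such as SkyWay, Vonage, and Twilio offer WebRTC as a video-call service that includes the server side, and they also publish SDKs for it.
This time, however, I want to deepen my understanding by looking at how things are implemented with GoogleWebRTC (probably the official library).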
- https://webrtc.github.io/webrtc-org/native-code/ios/

Sample

This time I use this excellent sample to walk through the implementation.
- https://github.com/stasel/WebRTC-iOS
The official WebRTC site also has web sample code that uses Firebase.
- https://webrtc.org/getting-started/firebase-rtc-codelab

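How it is implemented

Connection

The peers exchange what they need to establish a connection through a signaling server, which relays ICE candidates (information about possible network paths) and SDP.
The sample includes a Node.js server for this.
The sample uses the STUN servers published by Google (STUN is a protocol for NAT traversal).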

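Connection (continued)

Three communication lines are created: an AudioTrack, a VideoTrack, and a Data Channel.
The clients exchange an Offer SDP and an Answer SDP (this is called signaling).
The SDP contains things like which video/audio codecs can be used, stream IDs, port numbers, and IP addresses.
In the sample, the SDP exchange happens over WebSocket.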
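
As a rough illustration of what travels over the WebSocket, the sketch below wraps an SDP or an ICE candidate in a small JSON envelope. All type names, the `kind`/`payload` keys, and the helper functions are assumptions made up for this example; the actual message format is whatever the signaling server and clients agree on.

```swift
import Foundation

/// Hypothetical signaling payloads; the field names are illustrative.
struct SessionDescriptionPayload: Codable, Equatable {
    let type: String   // "offer" or "answer"
    let sdp: String
}

struct IceCandidatePayload: Codable, Equatable {
    let candidate: String
    let sdpMid: String?
    let sdpMLineIndex: Int32
}

/// One message relayed by the signaling server.
enum SignalingMessage: Equatable {
    case sdp(SessionDescriptionPayload)
    case candidate(IceCandidatePayload)
}

extension SignalingMessage: Codable {
    private enum CodingKeys: String, CodingKey { case kind, payload }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let kind = try container.decode(String.self, forKey: .kind)
        switch kind {
        case "sdp":
            self = .sdp(try container.decode(SessionDescriptionPayload.self, forKey: .payload))
        case "candidate":
            self = .candidate(try container.decode(IceCandidatePayload.self, forKey: .payload))
        default:
            throw DecodingError.dataCorruptedError(forKey: .kind, in: container,
                                                   debugDescription: "Unknown kind: \(kind)")
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .sdp(let payload):
            try container.encode("sdp", forKey: .kind)
            try container.encode(payload, forKey: .payload)
        case .candidate(let payload):
            try container.encode("candidate", forKey: .kind)
            try container.encode(payload, forKey: .payload)
        }
    }
}

/// Serializes a message into the bytes written to the socket.
func encodeForSocket(_ message: SignalingMessage) throws -> Data {
    try JSONEncoder().encode(message)
}

/// Parses bytes read from the socket back into a message.
func decodeFromSocket(_ data: Data) throws -> SignalingMessage {
    try JSONDecoder().decode(SignalingMessage.self, from: data)
}
```

The signaling server does not need to understand the payload; it only has to pass these bytes to the other client.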

SDP

v=0
o=- 7063950325015941208 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0 1 2
a=msid-semantic: WMS stream
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 102 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:Agzu
a=ice-pwd:mef2jhxanVddiHDLNvw7BMl9
a=ice-options:trickle renomination
a=fingerprint:sha-256 D7:E8:97:ED:96:EC:6D:8D:44:35:E8:51:A0:07:A0:EC:57:67:B9:76:16:31:1C:3E:6A:DF:A7:9D:E5:98:80:3A
a=setup:actpass
a=mid:0
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:stream audio0
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:102 ILBC/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:2493853505 cname:UcicltjXPgMMZ/9S
a=ssrc:2493853505 msid:stream audio0
a=ssrc:2493853505 mslabel:stream
a=ssrc:2493853505 label:audio0
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 127 124 125
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:Agzu
a=ice-pwd:mef2jhxanVddiHDLNvw7BMl9
a=ice-options:trickle renomination
a=fingerprint:sha-256 D7:E8:97:ED:96:EC:6D:8D:44:35:E8:51:A0:07:A0:EC:57:67:B9:76:16:31:1C:3E:6A:DF:A7:9D:E5:98:80:3A
a=setup:actpass
a=mid:1
a=extmap:14 urn:ietf:params:rtp-hdrext:toffset
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:13 urn:3gpp:video-orientation
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:11 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:8 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07
a=extmap:9 http://www.webrtc.org/experiments/rtp-hdrext/color-space
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:stream video0
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 H264/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=fmtp:96 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=640c34
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=rtpmap:98 H264/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=fmtp:98 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e034
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
a=rtpmap:127 red/90000
a=rtpmap:124 rtx/90000
a=fmtp:124 apt=127
a=rtpmap:125 ulpfec/90000
a=ssrc-group:FID 2153314113 822817334
a=ssrc:2153314113 cname:UcicltjXPgMMZ/9S
a=ssrc:2153314113 msid:stream video0
a=ssrc:2153314113 mslabel:stream
a=ssrc:2153314113 label:video0
a=ssrc:822817334 cname:UcicltjXPgMMZ/9S
a=ssrc:822817334 msid:stream video0
a=ssrc:822817334 mslabel:stream
a=ssrc:822817334 label:video0
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=ice-ufrag:Agzu
a=ice-pwd:mef2jhxanVddiHDLNvw7BMl9
a=ice-options:trickle renomination
a=fingerprint:sha-256 D7:E8:97:ED:96:EC:6D:8D:44:35:E8:51:A0:07:A0:EC:57:67:B9:76:16:31:1C:3E:6A:DF:A7:9D:E5:98:80:3A
a=setup:actpass
a=mid:2
a=sctp-port:5000
a=max-message-size:262144
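
To see which codecs an SDP like the one above advertises, one option is to scan its `a=rtpmap:` lines. The following is a minimal sketch using only Foundation; `CodecEntry` and `codecs(inSDP:)` are hypothetical names, and it deliberately ignores everything else in the SDP (ICE credentials, `fmtp` parameters, and so on).

```swift
import Foundation

/// One codec entry taken from an `a=rtpmap:` line.
struct CodecEntry: Equatable {
    let media: String       // "audio" or "video", taken from the enclosing m= line
    let payloadType: Int
    let name: String
    let clockRate: Int
}

/// Collects the codecs listed in an SDP, tagged with the m= section they belong to.
func codecs(inSDP sdp: String) -> [CodecEntry] {
    var currentMedia = ""
    var result: [CodecEntry] = []
    for rawLine in sdp.components(separatedBy: .newlines) {
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        if line.hasPrefix("m=") {
            // "m=audio 9 UDP/TLS/RTP/SAVPF 111 ..." -> "audio"
            currentMedia = String(line.dropFirst(2).prefix(while: { $0 != " " }))
        } else if line.hasPrefix("a=rtpmap:") {
            // "a=rtpmap:111 opus/48000/2" -> payload type 111, name "opus", clock rate 48000
            let parts = line.dropFirst("a=rtpmap:".count).split(separator: " ", maxSplits: 1)
            guard parts.count == 2, let payloadType = Int(parts[0]) else { continue }
            let encoding = parts[1].split(separator: "/")
            guard encoding.count >= 2, let clockRate = Int(encoding[1]) else { continue }
            result.append(CodecEntry(media: currentMedia, payloadType: payloadType,
                                     name: String(encoding[0]), clockRate: clockRate))
        }
    }
    return result
}
```

Applied to the SDP above, this would list opus, ISAC, G722 and others under audio, and H264, VP8 plus the rtx/red/ulpfec helper formats under video.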

Code

        let config = RTCConfiguration()
        config.iceServers = [RTCIceServer(urlStrings: iceServers)]
        
        // Unified Plan is the standard SDP format and is preferred over the legacy Plan B
        config.sdpSemantics = .unifiedPlan
        
        // gatherContinually makes WebRTC keep listening for network changes and send any new candidates to the other client
        config.continualGatheringPolicy = .gatherContinually
        
        let constraints = RTCMediaConstraints(mandatoryConstraints: nil,
                                              optionalConstraints: ["DtlsSrtpKeyAgreement":kRTCMediaConstraintsValueTrue])
        self.peerConnection = WebRTCClient.factory.peerConnection(with: config, constraints: constraints, delegate: nil)
        
        super.init()
        self.createMediaSenders()
        self.configureAudioSession()
        self.peerConnection.delegate = self

Offer SDP

The SDP that is sent first.
At this point the sender announces things such as "I can use the H264 and VP8 video codecs".

    func offer(completion: @escaping (_ sdp: RTCSessionDescription) -> Void) {
        let constrains = RTCMediaConstraints(mandatoryConstraints: self.mediaConstrains,
                                             optionalConstraints: nil)
        self.peerConnection.offer(for: constrains) { (sdp, error) in
            guard let sdp = sdp else {
                return
            }
            
            self.peerConnection.setLocalDescription(sdp, completionHandler: { (error) in
                completion(sdp)
            })
        }
    }

Answer SDP

The SDP returned in response to the Offer SDP.
Based on the offered video codecs, it replies with things such as "I support VP8".

    func answer(completion: @escaping (_ sdp: RTCSessionDescription) -> Void)  {
        let constrains = RTCMediaConstraints(mandatoryConstraints: self.mediaConstrains,
                                             optionalConstraints: nil)
        self.peerConnection.answer(for: constrains) { (sdp, error) in
            guard let sdp = sdp else {
                return
            }
            
            self.peerConnection.setLocalDescription(sdp, completionHandler: { (error) in
                completion(sdp)
            })
        }
    }
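
The codec part of the offer/answer exchange can be pictured as taking the intersection of two lists. This is a deliberately simplified model, not what the library does internally: `negotiateCodecs` is a hypothetical helper, and real negotiation also compares payload types, profiles, and other parameters.

```swift
import Foundation

/// Simplified model: keep the offered codecs that the answerer also supports, in the offerer's order.
func negotiateCodecs(offered: [String], supportedByAnswerer: Set<String>) -> [String] {
    offered.filter { supportedByAnswerer.contains($0) }
}

let offered = ["H264", "VP8"]
let answer = negotiateCodecs(offered: offered, supportedByAnswerer: ["VP8", "VP9"])
print(answer)  // prints ["VP8"]
```

If the result is empty, the two sides have no video codec in common, which is roughly the situation on devices without H.264 support when the other side offers only H.264.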

Starting video capture and sending

    func startCaptureLocalVideo(renderer: RTCVideoRenderer) {
        // The instance prepared at connection time that reads the camera and produces video frames
        guard let capturer = self.videoCapturer as? RTCCameraVideoCapturer else {
            return
        }

        // Get a device that supports video capture (the front camera)
        guard
            let frontCamera = (RTCCameraVideoCapturer.captureDevices().first { $0.position == .front }),
        
            // Get the front camera format with the highest resolution (the largest width)
            let format = (RTCCameraVideoCapturer.supportedFormats(for: frontCamera).sorted { (f1, f2) -> Bool in
                let width1 = CMVideoFormatDescriptionGetDimensions(f1.formatDescription).width
                let width2 = CMVideoFormatDescriptionGetDimensions(f2.formatDescription).width
                return width1 < width2
            }).last,
        
            // Get the highest frame rate that format supports
            let fps = (format.videoSupportedFrameRateRanges.sorted { return $0.maxFrameRate < $1.maxFrameRate }.last) else {
            return
        }

        // Start capturing from the front camera, which also starts sending to the other party
        capturer.startCapture(with: frontCamera,
                              format: format,
                              fps: Int(fps.maxFrameRate))
        
        // Render our own video.
        // localVideoTrack is the local video track prepared at connection time, so this routes the data arriving on it into the renderer.
        self.localVideoTrack?.add(renderer)
    }
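
The selection logic above (widest format, then its highest frame rate) can also be written with `max` instead of sorting and taking the last element. The sketch below uses a stand-in `CameraFormat` struct, which is an assumption for illustration so the logic can run without a camera; the real code reads these values from `AVCaptureDevice.Format`. Note that with several formats of equal width, `max(by:)` may pick a different one than `sorted` followed by `last`.

```swift
import Foundation

/// Stand-in for a camera format; hypothetical type used only in this sketch.
struct CameraFormat: Equatable {
    let width: Int
    let height: Int
    let maxFrameRates: [Double]
}

/// Picks the widest format and its highest frame rate, or nil if nothing is available.
func bestCaptureSetting(from formats: [CameraFormat]) -> (format: CameraFormat, fps: Int)? {
    guard let format = formats.max(by: { $0.width < $1.width }),
          let fps = format.maxFrameRates.max() else {
        return nil
    }
    return (format, Int(fps))
}
```

Always taking the maximum is the simplest policy; an app that cares about bandwidth or battery might prefer to cap the width or frame rate instead.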

Setting the remote video renderer

    func renderRemoteVideo(to renderer: RTCVideoRenderer) {
        // Render the other party's video.
        // remoteVideoTrack is the incoming video track prepared at connection time, so this routes the data arriving on it into the renderer.
        self.remoteVideoTrack?.add(renderer)
    }

When you want to show an image in the video

There are approaches like the following.

Turning the video frames you send into an image
Using the DataChannel

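Turning the video frames you send into an image

By default, GoogleWebRTC is implemented to stream the front or back camera's video to the other party.
However, a delegate (RTCVideoCapturerDelegate) is provided that lets you hook each frame of the outgoing video, so we use that.
In that hook, feed an RTCVideoFrame into the video source through RTCVideoSource's capturer(_:didCapture:).
The implementation is shown later. This part is not in the sample; it is my own implementation.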

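Using the DataChannel

The data channel can send and receive RTCDataBuffer (a class that wraps binary data), so use that.
The sender and receiver agree on a protocol, for example "show an image when this data arrives", and handle it accordingly.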
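
A protocol like that can be as small as one opcode byte followed by a payload. The sketch below shows one possible encoding; `OverlayCommand`, its cases, and the opcode values are all made up for this example and are not part of WebRTC or the sample.

```swift
import Foundation

/// Hypothetical commands sent over the data channel; the one-byte opcodes are invented for this sketch.
enum OverlayCommand: Equatable {
    case showImage(name: String)
    case hideImage

    /// Serializes the command as one opcode byte followed by an optional UTF-8 payload.
    func encoded() -> Data {
        switch self {
        case .showImage(let name):
            return Data([0x01]) + Data(name.utf8)
        case .hideImage:
            return Data([0x02])
        }
    }

    /// Parses bytes received from the peer; returns nil for unknown or malformed input.
    init?(data: Data) {
        guard let opcode = data.first else { return nil }
        switch opcode {
        case 0x01:
            guard let name = String(data: data.dropFirst(), encoding: .utf8), !name.isEmpty else {
                return nil
            }
            self = .showImage(name: name)
        case 0x02:
            self = .hideImage
        default:
            return nil
        }
    }
}
```

On the sending side the resulting `Data` would be wrapped in an RTCDataBuffer before it is handed to the data channel, and the receiver would run the bytes it gets through `OverlayCommand(data:)` and show or hide the image.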

Turning the video frames you send into an image

Originally the code looked like this:

self.videoSource = WebRTCClient.factory.videoSource()
self.videoCapturer = RTCCameraVideoCapturer(delegate: self.videoSource)

Change it to this:


self.videoSource = WebRTCClient.factory.videoSource()

// self.videoCapturer = RTCCameraVideoCapturer(delegate: self.videoSource)
self.videoCapturer = RTCCameraVideoCapturer(delegate: self) // self conforms to RTCVideoCapturerDelegate

// WebRTCClient is the class that handles the WebRTC processing
extension WebRTCClient: RTCVideoCapturerDelegate {
    // Called for every captured frame, before it is sent.
    func capturer(_ capturer: RTCVideoCapturer, didCapture frame: RTCVideoFrame) {
        // The camera frame is ignored; a frame built from an image is sent in its place.
        self.captureVideoFrameChannel(videoSource: self.videoSource, videoCapturer: capturer)
    }
}

Creating a frame from an image and sending it

    private func captureVideoFrameChannel(videoSource: RTCVideoSource, videoCapturer: RTCVideoCapturer) {
        let image = UIImage(named: "gundam")!

        func cvPixelBuffer(image: UIImage) -> CVPixelBuffer?
        {
            let width = image.cgImage!.width
            let height = image.cgImage!.height
            let options: [NSObject: Any] = [
                            kCVPixelBufferCGImageCompatibilityKey: true,
                            kCVPixelBufferCGBitmapContextCompatibilityKey: true,
                            ]
            var pxbufferTemp: CVPixelBuffer? = nil
            let status = CVPixelBufferCreate(kCFAllocatorDefault, width,
                                             height, kCVPixelFormatType_32ARGB, options as CFDictionary,
                &pxbufferTemp);
            guard let pxbuffer = pxbufferTemp, status == kCVReturnSuccess else {
                fatalError()
            }

            CVPixelBufferLockBaseAddress(pxbuffer, [])
            let pxdataTmp = CVPixelBufferGetBaseAddress(pxbuffer)
            guard let pxdata = pxdataTmp else {
                fatalError()
            }

            let rgbColorSpace = CGColorSpaceCreateDeviceRGB();
            guard let  context = CGContext(data: pxdata, width: width,
                                           height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pxbuffer), space: rgbColorSpace,
                                           bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
                    fatalError()
            }
            context.draw(image.cgImage!, in: CGRect(origin: .zero, size: CGSize.init(width: width, height: height)))
            CVPixelBufferUnlockBaseAddress(pxbuffer, CVPixelBufferLockFlags(rawValue: 0))

            return pxbuffer
        }

        func cmSampleBuffer(image: UIImage) -> CMSampleBuffer {
            let pixelBuffer = cvPixelBuffer(image: image)
            var newSampleBuffer: CMSampleBuffer? = nil
            var timimgInfo: CMSampleTimingInfo = CMSampleTimingInfo.invalid
            var videoInfo: CMVideoFormatDescription? = nil
            CMVideoFormatDescriptionCreateForImageBuffer(allocator: nil, imageBuffer: pixelBuffer!, formatDescriptionOut: &videoInfo)
            CMSampleBufferCreateForImageBuffer(allocator: kCFAllocatorDefault, imageBuffer: pixelBuffer!, dataReady: true, makeDataReadyCallback: nil, refcon: nil, formatDescription: videoInfo!, sampleTiming: &timimgInfo, sampleBufferOut: &newSampleBuffer)
            return newSampleBuffer!
        }

        let pixelBuffer = CMSampleBufferGetImageBuffer(cmSampleBuffer(image: image))!
        let rtcpixelBuffer = RTCCVPixelBuffer(pixelBuffer: pixelBuffer)
        // This is the video frame data
        let videoFrame = RTCVideoFrame(
            buffer: rtcpixelBuffer,
            rotation: RTCVideoRotation._0,
            timeStampNs: Int64(Date().timeIntervalSince1970 * 1_000_000_000)
        )
        // The video data is sent here
        videoSource.capturer(videoCapturer, didCapture: videoFrame)
    }

Creating a frame from an image and sending it (2)

    private func captureVideoFrameChannel(videoSource: RTCVideoSource, videoCapturer: RTCVideoCapturer) {
        // Randomly switch between two images
        let image = Bool.random() ? UIImage(named: "gundam")! : UIImage(named: "gp01")!

        func cvPixelBuffer(image: UIImage) -> CVPixelBuffer?
        {
            let width = image.cgImage!.width
            let height = image.cgImage!.height
            let options: [NSObject: Any] = [
                            kCVPixelBufferCGImageCompatibilityKey: true,
                            kCVPixelBufferCGBitmapContextCompatibilityKey: true,
                            ]
            var pxbufferTemp: CVPixelBuffer? = nil
            let status = CVPixelBufferCreate(kCFAllocatorDefault, width,
                                             height, kCVPixelFormatType_32ARGB, options as CFDictionary,
                &pxbufferTemp);
            guard let pxbuffer = pxbufferTemp, status == kCVReturnSuccess else {
                fatalError()
            }

            CVPixelBufferLockBaseAddress(pxbuffer, [])
            let pxdataTmp = CVPixelBufferGetBaseAddress(pxbuffer)
            guard let pxdata = pxdataTmp else {
                fatalError()
            }

            let rgbColorSpace = CGColorSpaceCreateDeviceRGB();
            guard let  context = CGContext(data: pxdata, width: width,
                                           height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pxbuffer), space: rgbColorSpace,
                                           bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
                    fatalError()
            }
            context.draw(image.cgImage!, in: CGRect(origin: .zero, size: CGSize.init(width: width, height: height)))
            CVPixelBufferUnlockBaseAddress(pxbuffer, CVPixelBufferLockFlags(rawValue: 0))

            return pxbuffer
        }

        func cmSampleBuffer(image: UIImage) -> CMSampleBuffer {
            let pixelBuffer = cvPixelBuffer(image: image)
            var newSampleBuffer: CMSampleBuffer? = nil
            var timimgInfo: CMSampleTimingInfo = CMSampleTimingInfo.invalid
            var videoInfo: CMVideoFormatDescription? = nil
            CMVideoFormatDescriptionCreateForImageBuffer(allocator: nil, imageBuffer: pixelBuffer!, formatDescriptionOut: &videoInfo)
            CMSampleBufferCreateForImageBuffer(allocator: kCFAllocatorDefault, imageBuffer: pixelBuffer!, dataReady: true, makeDataReadyCallback: nil, refcon: nil, formatDescription: videoInfo!, sampleTiming: &timimgInfo, sampleBufferOut: &newSampleBuffer)
            return newSampleBuffer!
        }

        let pixelBuffer = CMSampleBufferGetImageBuffer(cmSampleBuffer(image: image))!
        let rtcpixelBuffer = RTCCVPixelBuffer(pixelBuffer: pixelBuffer)
        let videoFrame = RTCVideoFrame(
            buffer: rtcpixelBuffer,
            rotation: RTCVideoRotation._0,
            timeStampNs: Int64(Date().timeIntervalSince1970 * 1_000_000_000)
        )
        videoSource.capturer(videoCapturer, didCapture: videoFrame)
    }

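Video on the receiving side

When sending a video frame, the sender also sends a timestamp, and the receiver renders according to that timestamp (probably).
So if the RTCVideoFrame timestamp is wrong, the frame is not displayed on the receiving side.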
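
One way to guard against bad timestamps is to generate them from a small helper that never goes backwards or repeats. This is a sketch under the assumption that strictly increasing nanosecond values are what the receiver needs; `FrameClock` is a hypothetical helper, not part of the library, and the choice of system uptime as the time source is also an assumption.

```swift
import Foundation

/// Produces strictly increasing frame timestamps in nanoseconds.
struct FrameClock {
    private var lastTimestampNs: Int64 = 0

    /// Converts a time in seconds to nanoseconds, nudging it forward if it would not increase.
    mutating func nextTimestampNs(atSeconds seconds: TimeInterval) -> Int64 {
        let candidate = Int64(seconds * 1_000_000_000)
        lastTimestampNs = max(candidate, lastTimestampNs + 1)
        return lastTimestampNs
    }
}

var clock = FrameClock()
let first = clock.nextTimestampNs(atSeconds: ProcessInfo.processInfo.systemUptime)
let second = clock.nextTimestampNs(atSeconds: ProcessInfo.processInfo.systemUptime)
print(second > first)
```

The value returned by `nextTimestampNs` could then be passed as `timeStampNs` when building the RTCVideoFrame.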

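Closing

There is probably still no client library that makes this easy to use in an iOS app (I think).
There are many services that include the server side, and using one of those makes it fairly easy.
I would have liked to look into the audio implementation a bit more.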

Source code used this time

References (many thanks!)
