More than 5 years have passed since last update.

Fortran 2003/08 による mpeg audio layer 1 encoder

Last updated at 2017-10-31Posted at 2017-03-07

Fortran 2003/08 による mpeg audio layer 1 encoder

大昔に Fortran 90/95 で書いた mpeg-1 audio layer 1 のエンコーダを、Fortran 2003/08 の勉強のために書き直してみました。

おおむね Fortran 2003 ですが、微妙に Fortran 2008 の命令も使っています。簡単な
オブジェクト指向の機能を使ってみましたが、セッター・ゲッターを書くのがだるいので、限定的です。

コンパイラは intel fortran ver.17.02 を用いました。コンパイル時のオプションとして「F2003セマンティクスを有効にする」必要があります。そうしないとコンパイルは出来ますが、実行時エラーが起きます。~~gfortran ではまだチェックしてません。~~(gfortran に無いParameterized derived types などの使用は避けました。)

gfortran ver.5.4.0 でも動作確認できました。

gfortran -std=f2008 kind.f90 mpg.f90 crc.f90 mpg_io.f90 wav_io.f90 filter.f90 psycho.f90 layer1.f90 uzura1win.f90

mpeg audio layer 1 (mp1) とは

mpeg とは、Moving Picture Expert Group の略号で、mpeg-1 は 1992 年に制定された音声付き動画の ISO 国際規格を指します。その動画規格のうちの音声記録部分のみを指したものが、audio layer となります。音声規格には３つの種類があり、それぞれ Layer I, II, III とローマ数字の番号で区別され、番号が増えるほど複雑で高圧縮率になります。Layer I はデジタル・コンパクト・カセットに使われた規格で、CD の 1/4 ほどの 384kbps の圧縮率に適するようになっています。

mp1 は今もよく使われる mp3 に比べてフォーマットなどが極めて単純で素直な形式になっているので、プログラムとして実装しやすくなっています。

参考文献:

D. Pan, "A Tutorial on MPEG/Audio Compression"
https://www.ee.columbia.edu/~dpwe/papers/Pan95-mpeg-ieeemm.pdf

dist10: ISO による参照実装
dist10.tgz で検索してください。

Fortran 2003/08 による実装

Fortran 2003 は Fortran 90/95 にオブジェクト指向 (OO:Object Oriented) の要素を取り入れたもので、それに伴って動的な機能が増えています。Fortran 2008 は、coarray の機能を除けば、Fortran 2003 の不便なところを修正した規格になっています。ここでは主に Fortran 2003 規格にのっとってプログラムを書きましたが、Fortran 2008 で微修正された機能も取り入れてゆくことにしました。

以下では個々のプログラムについて、mpeg 面と Fortran 面の両方について簡単にふれてゆこうかと考えています。

ソース・プログラム

ソースは github に上げましたが、色々おかしいです？メールアドレス設定が変になっていたせい？
https://github.com/cure-honey/uzura1

命名法その他
モジュール名は m_ 、派生型は t_ 、変数名は暗黙型と同じ接頭文字から始めるようにしました。最近の流行ではモジュール名や派生型はそれぞれ _m、_t の接尾文字をつけ、変数名の暗黙型には捕らわれないのが主流のようですが、昔に書いたものなのでこうなっています。

メインプログラムの stop 文、サブルーチンや関数の return 文は、末尾に来る場合に省略しました。private や public の宣言は面倒でもいちいちすることにしました。副プログラムの pure/elemental 等の属性については、気まぐれに付けました。impure は使わないことにしました。OO がらみのクラス変数はスカラーでも動的に割り付けるようにしました。

データ隠匿には無理にこだわらず、セッター/ゲッターを書くのが面倒な時には直接読み書きをすることにしました。陰徳有れば陽報ありといえども、尾生の信じゃあるまいに OO に義理もないので。

配列の多くを規格書に則って 0 から始まるように書き直しましたが、色々失敗だった気がします。理由は（忘れなければｗ）後述します。

kind.f90

module m_kind

mpeg の規格では有効数字９桁が要求されてるので、倍精度実数を用いる必要があります。

    module m_kind
      use, intrinsic :: iso_fortran_env
      public
      integer, parameter :: kd = real64
    end module m_kind

Fortran 90 規格制定の時点では、計算機の語長、バイト長、浮動小数点数のフォーマットなどがベンダー依存で統一されていなかったため、数値の型の指定をする kind が何を意味するかは規格では定められませんでした。

最近は 1語=32bit/64bit、1byte=8bit、数値フォーマットは IEEE754 で統一された感があるので、Fortran 2008 では iso_fortran_env というお仕着せのモジュールに変数の型が用意されました。

module m_file_io

mpeg audio encoder は、wav 形式の音声ファイルを入力とし、mpeg 形式の音声ファイルを出力とするので、二つの形式のファイルの I/O が必要となります。

    module m_file_io ! abstract interface for file 
      implicit none
      type, abstract :: t_file
      contains  
        procedure(p_open ), deferred :: open_file
        procedure(p_close), deferred :: close_file
      end type t_file   
      
      abstract interface
        subroutine p_open(this, fn)
          import 
          class(t_file), intent(in out) :: this
          character(len = *), intent(in) :: fn
        end subroutine p_open
      
        subroutine p_close(this)
          import
          class(t_file), intent(in) :: this
        end subroutine p_close
      end interface
    end module m_file_io

ここでは open/close という共通動作を、抽象インターフェースとそれをメソッドとして含む抽象派生型としてまとめてみました。こうすることで、そのインターフェースを持つルーチンの実装が強制されます。

普通のクラスは、はじめに変数を束ねた派生型があって、それを引数とする副プログラムを従属させるという発想だと思いますが、抽象クラスは、はじめに共通動作が束ねてあって、それに変数が従属するという発想なのではないかと思います。そういうわけで、ここでの使い方に適しているように思えたのですが、別に定義した mpg 派生型を合体させることが出来ず、抽象クラスのご利益がよく分からぬままでした。

抽象インターフェースだけ module で受け渡せばいいのかもしれません。

mpg.f90

m_mpg

mpeg/audio ファイルはフレームという独立した単位の集合から成り、１フレームあたり 10ミリ秒程度相当の音声データを持ちます。各フレームはヘッダとボディから成ります。そのヘッダ部分の情報と、それが意味する内容を一括して定義しておきます。

    module m_mpg
      use m_kind
      implicit none
      public
      type :: t_mpg
        integer :: mtype           =  3 ! 0:mpeg2.5, 1:---, 2:mpeg2, 3:mpeg1
        integer :: layer           =  3 ! layer { 1 = III, 2 = II, 3 = I }
        integer :: ibit_rate       = 12 ! 192 * 2 = 384kbps 
        integer :: isample_rate         ! 0:44.1kHz 1:48kHz 2:32kHz 3:reserved
        integer :: ipsychoacoustic =  0
        integer :: iemphasis       =  0
        integer :: ipadding        =  0 
        integer :: icrc            =  1 ! CRC16  0 enabled / 1 disabled
        integer :: mode            =  0 ! 0:stereo 1:joint_stereo 2:dual_channel 3:single_channel
        integer :: iprivate_bit    =  0 ! unused 
        integer :: mode_extension  =  0 ! 0:4-31 1:8-31 2:12-31 3:16-31 intensity stereo
        integer :: icopyright      =  0 ! 0:     1:copyright protected
        integer :: ioriginal       =  1 ! 0:copy 1:original
      end type t_mpg
      !mpeg1 / audio
      integer, parameter :: mpeg_frame_size(3)      = [1152, 1152, 384]
      integer, parameter :: mpeg_sample_rates(0:3)  = [44100, 48000, 32000, 0]
      integer, parameter :: mpeg_bit_rates(0:14, 3) = &
               reshape( [ 0, 32, 40, 48,  56,  64,  80,  96, 112, 128, 160, 192, 224, 256, 320,   &
                          0, 32, 48, 56,  64,  80,  96, 112, 128, 160, 192, 224, 256, 320, 384,   &
                          0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 414, 448 ], &
                        [15, 3] )
      character (len = *), parameter :: mpeg_mode_names(0:*)    = [character(len = 8)::'stereo', 'j-stereo', 'dual-ch', 'mono']
      character (len = *), parameter :: mpeg_layer_names(0:*)   = [character(len = 3)::'', 'III', 'II', 'I']
      character (len = *), parameter :: mpeg_version_names(0:*) = [character(len = 7)::'MPEG2.5', '', 'MPEG-II', 'MPEG-I']
      character (len = *), parameter :: mpeg_psy_names(0:*)     = [character(len = 6)::'', '', '', '']
      character (len = *), parameter :: mpeg_demp_names(0:*)    = [character(len = 7)::'none', '50/15us', '', 'CITT']
    end module m_mpg

Fortran 2008 では、parameter の数を * 指定で自動で数えてくれます。また 2003 の配列構成子を使うことにより、文字列長を自動でそろえてくれます。parameter は文字列長を * 指定で自動で数えてくれます。

crc.f90

m_crc

mpeg では、CRC16 を埋め込むことによって、情報のうち重要度の高い部分の伝送エラーをチェックすることができます。ただし、エラーの復元のような機能は持ちません。ほとんどのプレーヤーは CRC16 の検出をしません。検出するプレーヤもそのフレームの音を消すだけです。元々はストリーミング通信などの場合の再送信要求などを想定したものと思われます。mpeg では crc の初期値は Z'FFFF' 生成多項式は x^16+x^15+x^2+1 = Z'8005' と定義されています。

    module m_crc
      use :: m_kind
      implicit none
      private 
      public :: crc16
      integer, parameter :: igenerator =  32773  !Z'8005' ! 32773  !B'1000000000000101'  ! x^16+x^15+x^2+1
    contains
      subroutine crc16(n, inp, icrc)
        integer, intent(in    ) :: n, inp
        integer, intent(in out) :: icrc
        integer :: i, ibit1, ibit2
        do i = n - 1, 0, -1
          ibit1 = ibits(inp,   i, 1)
          ibit2 = ibits(icrc, 15, 1)
          icrc  = iand(int(Z'0000FFFF'), ishft(icrc, 1)) ! trim to 16bit  
          if ( ieor(ibit1, ibit2) == 1 ) icrc = ieor(icrc, igenerator)
        end do
      end subroutine crc16  
    end module m_crc

CRC は九去法のように剰余類を用いて誤りを検出するものです。Numerical Recipes 第二版の "less-numerical algorithms" の章に説明があります。CRC はフレームごとに初期値から累積してゆかなければなりません。つまり履歴に依存する状態を持ちます。ここでは昔のプログラムを bit 演算に書きかえただけにしましたが、ついでに OO により内部状態を持つクラスにした方が良かったかもしれません。

mpg_io.f90

m_mpg_io

mp1 ファイルはフレームのビット列が順に連続的に書き出された集合体になっています。フレームのサイズはビットレートから決まります。

フレームのフォーマットはヘッダ・[CRC16]・ボディとなっており、CRC16 をつけるかどうかは任意です。付けることによりデータ用のボディ領域が 16 ビット分減ります。ボディはビット割り付け情報・スケール係数・割り付けビット列の３部分からなります。

各フレーム末尾には、使われずに余った半端なビットの領域があっても構いません。

    module m_mpg_io
      use m_kind
      use m_file_io
      use m_mpg
      use m_crc
      implicit none
      private 
      public :: t_mpgfile
      type, extends(t_file) :: t_mpgfile
        private
        integer :: iunit 
        integer :: ipos
        character(:), allocatable :: fn
        character (len = 20000) :: bit_string
      contains
        procedure :: open_file  => open_mpg_file
        procedure :: close_file => close_mpg_file
        final     :: destroy_file
        procedure :: write_bits_1frame
        procedure :: clear_bit_buff
        procedure :: encode_header
        procedure :: encode_crc
        procedure :: encode_body
        procedure :: put_bits
        procedure :: put_bits_c
      end type t_mpgfile    
    contains
      subroutine open_mpg_file(this, fn)
        class(t_mpgfile),   intent(in out) :: this
        character(len = *), intent(in) :: fn
        integer :: io
        open(newunit = this%iunit, file = fn, iostat = io, status = 'unknown', access = 'stream')
        if (io /= 0) then
          write(*, *) 'i/o error ', io, ' occuerred. file =', this%iunit, 'file name ', fn
          stop 'check output file!'
        end if
      end subroutine open_mpg_file
      
      subroutine close_mpg_file(this)
        class(t_mpgfile), intent(in) :: this
        close(this%iunit)
      end subroutine close_mpg_file
      
      subroutine destroy_file(this)
        type(t_mpgfile), intent(in) :: this
        call this%close_file()
      end subroutine destroy_file  
     
      subroutine write_bits_1frame(this, n)
        class(t_mpgfile), intent(in) :: this
        integer, intent(in) :: n
        integer :: i, j, ipos
        integer(int8) :: m
        ipos = 0
        do i = 1, n, 8
          m = 0
          do j = 7, 0, -1
            ipos = ipos + 1
            if (ipos > len(this%bit_string)) exit
            if (this%bit_string(ipos:ipos) == '1')  m = ibset(m, j) 
          end do
          write(this%iunit) m 
        end do
      end subroutine write_bits_1frame
    
      subroutine clear_bit_buff(this, n)
        class(t_mpgfile), intent(in out) :: this
        integer, intent(in) :: n
        this%bit_string = repeat(' ', n)
      end subroutine clear_bit_buff
    
      subroutine encode_header(this, mpg)
        class(t_mpgfile), intent(in out) :: this
        type (t_mpg)    , intent(in    ) :: mpg
        this%ipos = 1 ! reset position to the first bit
        call this%put_bits_c('11111111111'      )  !sync word
        call this%put_bits(2, mpg%mtype         )  !mpeg1
        call this%put_bits(2, mpg%layer         )  !layer 1
        call this%put_bits(1, mpg%icrc          )  !crc check no
        call this%put_bits(4, mpg%ibit_rate     )  !bitrate 
        call this%put_bits(2, mpg%isample_rate  )  !sampling frequency 44.1
        call this%put_bits(1, mpg%ipadding      )  !ipadding
        call this%put_bits(1, mpg%iprivate_bit  )  !private bit : unused
        call this%put_bits(2, mpg%mode          )  !stereo
        call this%put_bits(2, mpg%mode_extension)  !mode
        call this%put_bits(1, mpg%icopyright    )
        call this%put_bits(1, mpg%ioriginal     )
        call this%put_bits(2, mpg%iemphasis     )
      end subroutine encode_header
   
      subroutine encode_crc(this, mpg, ialloc_bits)
        class(t_mpgfile), intent(in out) :: this
        type (t_mpg)    , intent(in    ) :: mpg
        integer         , intent(in    ) :: ialloc_bits(:, :)
        integer :: iband, ichannel, icrc
        icrc = int(Z'0000FFFF') ! initialize crc 
        call crc16(4, mpg%ibit_rate     , icrc)
        call crc16(2, mpg%isample_rate  , icrc)
        call crc16(1, mpg%ipadding      , icrc)
        call crc16(1, mpg%iprivate_bit  , icrc)
        call crc16(2, mpg%mode          , icrc)
        call crc16(2, mpg%mode_extension, icrc)
        call crc16(1, mpg%icopyright    , icrc)
        call crc16(1, mpg%ioriginal     , icrc)
        call crc16(2, mpg%iemphasis     , icrc)
        do iband = 1, 32
          do ichannel = 1, size(ialloc_bits, 2)
            call crc16(4, ialloc_bits(iband, ichannel), icrc)
          end do
        end do
        call this%put_bits(16, icrc)
      end subroutine encode_crc
 
      subroutine encode_body(this, ialloc_bits, iscale_factor, isubband)
        class(t_mpgfile), intent(in out) :: this
        integer, intent(in) :: ialloc_bits(:, :), iscale_factor(:, :), isubband(:, :, :)
        integer :: iband, ichannel, j, k
        do iband = 1, 32
          do ichannel = 1, size(ialloc_bits, 2)
            call this%put_bits(4, ialloc_bits(iband, ichannel) )
          end do
        end do
        do iband = 1, 32
          do ichannel = 1, size(ialloc_bits, 2)
            if (ialloc_bits(iband, ichannel) /= 0) call this%put_bits(6, iscale_factor(iband, ichannel) )
          end do
        end do
        do j = 1, 12
          do iband = 1, 32
            do ichannel = 1, size(ialloc_bits, 2)
              k = ialloc_bits(iband, ichannel)
              if (k /= 0) call this%put_bits(k + 1, isubband(iband, j, ichannel) )
            end do
          end do
        end do
      end subroutine encode_body
 
      subroutine put_bits(this, n, inp)
        class(t_mpgfile), intent(in out) :: this
        integer, intent(in) :: n, inp
        integer :: i, m
        associate (ipos => this%ipos, bit_string => this%bit_string)
          do i = 1, n
            m = 2**(n - i)
            if (mod(inp / m, 2) == 1) then
              this%bit_string(ipos:ipos) = '1'
            else
              this%bit_string(ipos:ipos) = '0'
            end if
            ipos = ipos + 1
          end do
        end associate  
      end subroutine put_bits
 
      subroutine put_bits_c(this, str)
        class(t_mpgfile), intent(in out) :: this
        character(len = *), intent(in) :: str
        integer :: i
        associate (ipos => this%ipos, bit_string => this%bit_string)
          do i = 1, len_trim(str)
            if (str(i:i) /= '0' .and. str(i:i) /= '1') stop 'invalid string: subroutine put_bit_c'
            this%bit_string(ipos:ipos) = str(i:i)
            ipos = ipos + 1
          end do
        end associate      
      end subroutine put_bits_c
    end module m_mpg_io

mp1 ファイル I/O には、先に定義した抽象クラスを継承したクラスを用いました。
Fortran のオブジェクトにはコンストラクタルーチンを定義できないので、代入以外の初期化処理は別に初期化ルーチンを用意して呼び出す必要があります。ファイナライザ（デストラクタ）はあるので、ファイルのクローズをさせています。

ビット列は０，１からなる文字列として作り上げてゆき、最後にビット列として書き出します。

ファイルをオープンするときのファイル番号は、Fortran 2008 の newunit によって、処理系側にファイル番号を用意させています。これでファイル番号の選択に頭を使う必要がなくなります。バイナリデータは stream モードを使うことで書いたままのデータを垂れ流しで出力できます。

wav_io.f90

入力となる wav ファイルは MS-RIFF (Microsoft Resource Interchange File Format) と呼ばれる構造の一種をしています。これは chunk（塊）から構成されます。chunk は、文字 4 バイトの chunk_id 、4 バイトリトルエンディアン整数の chunk_size、そして chunk_size のバイト数のデータから成ります。

特に wav ファイルは、'RIFF' chunk からなります。 'RIFF' chunk は 'fmt ' chunk, 'data' chunk の２つの chunk から構成されます。（このほかに付加的な情報の chunk が加わることがあります。）

'fmt ' chunk にはチャンネル数、サンプリングレート等の情報が入っています。

'data' chunk の中には PCM 音声データが入っています。CD からリッピングした音声データの場合　サンプリングレート 44.1kHz で符号付 16ビット・リトルエンディアン整数列がステレオ 2ch (L,R順) で並んでいます。

    module m_wav
      use m_kind
      use m_file_io
      implicit none
      private
      public :: t_wavfile, t_fmt
      type :: t_fmt
        sequence
        character(4):: chunk_id
        integer(int32) :: chunk_size
        integer(int16) :: format_id, channels
        integer(int32) :: sampling_rate
        integer(int32) :: bytes_per_sec
        integer(int16) :: block_size, bits_per_sample
      end type t_fmt

      type :: t_data
        sequence
        character(4) :: chunk_id
        integer(int32)  :: chunk_size
        !! integer(int16), allocatable :: pcm16(:)
      end type t_data

      type :: t_riffwav
        sequence
        character(4)  :: chunk_id
        integer(int32):: chunk_size
        character(4)  :: formattag
        type (t_fmt ) :: fmt
        type (t_data) :: dat
      end type t_riffwav

      type, extends(t_file) :: t_wavfile
        private
        integer :: iunit 
        integer :: ipos
        character(:), allocatable :: fn
        type (t_riffwav) :: riff
        integer(int16), allocatable :: pcm16(:) ! <- data of data_chunk
      contains
        procedure, public :: open_file  => open_wav_file
        procedure, public :: close_file => close_wav_file
        final             :: destroy_file
        procedure, public :: pcm1frame
        procedure, public :: get_channel
        procedure, public :: get_sampling_rate
        procedure, public :: get_data_size
        procedure, public :: get_play_time
      end type t_wavfile
    contains
      subroutine open_wav_file(this, fn)
        class(t_wavfile), intent(in out) :: this
        character(len = *), intent(in) :: fn
        integer :: io
        this%fn = fn
        open(newunit = this%iunit, file = trim(this%fn), access = 'stream', iostat = io, status = 'old', form = 'unformatted')
        if (io /= 0) then
          write(*, *) 'i/o error ', io, ' occuerred. file =', this%iunit, 'file name ', fn
          stop 'check output file!'
        end if
        ! read wave file little endian assumed
        associate (riff => this%riff, fmt => this%riff%fmt, dat => this%riff%dat)  
          ! riff-wave chunk
          read(this%iunit) riff 
          if ( riff%chunk_id  /= 'RIFF' ) stop 'this is not RIFF file'
          if ( riff%formattag /= 'WAVE' ) stop 'this RIFF file is not in WAVE format'
          ! fmt chunk
          if ( fmt%chunk_id   /= 'fmt ' ) stop 'fmt chunk not found'
          if ( fmt%format_id  /=  1     ) stop 'unknown wave format' ! 1 linear pcm
          if ( fmt%bits_per_sample /= 16) stop 'not 16bit data'
          select case ( fmt%channels )
            case (1)
              write(*, '(a, i3, a, i6, a)') 'monoral', fmt%bits_per_sample, 'bit sampling rate', fmt%sampling_rate, 'hz '
            case (2)
              write(*, '(a, i3, a, i6, a)') 'stereo' , fmt%bits_per_sample, 'bit sampling rate', fmt%sampling_rate, 'hz '
            case default
              stop 'wave channels must be 1 or 2'
          end select
          ! data chunk
          if (dat%chunk_id /= 'data') then
            do      
              inquire(this%iunit, pos = this%ipos)
              this%ipos = this%ipos + dat%chunk_size  ! skip non-data chunk
              read(this%iunit, pos = this%ipos, iostat = io) dat
              if (io == -1) stop 'end of file encounterd while searching for a data chunk'
              if (dat%chunk_id == 'data') exit
            end do
          end if
          ! now file position is at the beginning of pcm data
          allocate( this%pcm16( dat%chunk_size / 2 ) )
          ! read whole pcm data
          read(this%iunit) this%pcm16 
        end associate
        return
      end subroutine open_wav_file 

      subroutine close_wav_file(this)           
        class(t_wavfile), intent(in) :: this
        close(this%iunit)
      end subroutine close_wav_file
  
      subroutine destroy_file(this) ! finalization routine
        type(t_wavfile), intent(in) :: this
        call this%close_file()
      end subroutine destroy_file  
  
      subroutine pcm1frame(this, pcm) ! copy from array
        class(t_wavfile), intent(in) :: this
        real(kd), intent(in out) :: pcm(:, :)
        integer, save :: ipos = 1
        pcm = eoshift(pcm, -384, 0.0_kd, 1)
        select case (this%riff%fmt%channels)
          case (1) !mono
            pcm(384:1:-1, 1) = real( this%pcm16(ipos:ipos + 384 - 1), kind = kd ) / 2**15 
            ipos = ipos + 384
          case (2) !stereo
            pcm(384:1:-1, 1) = real( this%pcm16(ipos    :ipos + 2 * 384 - 1:2), kind = kd ) / 2**15 ! L 16bit int
            pcm(384:1:-1, 2) = real( this%pcm16(ipos + 1:ipos + 2 * 384 - 1:2), kind = kd ) / 2**15 ! R
            ipos = ipos + 2 * 384
          case default
            stop 'ichannel must be 1 or 2: subroutine wav_get' 
        end select
      end subroutine pcm1frame
     ! getters
      integer function get_channel(this)
        class(t_wavfile), intent(in) :: this
        get_channel = this%riff%fmt%channels ! mono:1 stereo:2
      end function get_channel
 
      integer function get_sampling_rate(this)
        class(t_wavfile), intent(in) :: this
        get_sampling_rate = this%riff%fmt%sampling_rate ! in Hz
      end function get_sampling_rate
 
      integer function get_data_size(this)
        class(t_wavfile), intent(in) :: this
        get_data_size = this%riff%dat%chunk_size ! in bytes
      end function get_data_size
    
      integer function get_play_time(this) result(isec)
        class(t_wavfile), intent(in) :: this
        isec = this%riff%dat%chunk_size / this%riff%fmt%bytes_per_sec ! play time in sec
      end function get_play_time
    end module m_wav

wav ファイルを読み込む派生型は、抽象クラス t_file を継承したものとしました。riff wave 形式のchunk 構造はそのまま派生型として定義通りに並べました。sequence 属性をつけることで定義順にメモリーマップされます。こうすることで派生型の読み込みでファイルから chunk 構造を一気に読み取れます。しかし、sequence 属性を持つ派生型は継承できなくなります。

wav ファイルは一気にすべてのデータを配列内に読み込みます。本来は mpeg １フレーム分に必要なだけ読み取りを繰り返せば十分です。chunk 構造をそのまま派生型にするためにこのようにしました。メモリーの無駄遣いですが単純化されます。またリトルエンディアンも仮定しました。Intel 系ならば問題ありませんが、IBM や SPARC はビッグエンディアンなので若干問題があります。最近はエンディアンをコンパイル時に選べる場合もありますが、16bit のデータでもちゃんと働くのかよく分かりません。

filter.f90

m_polyphase

mpeg/audio では、まず時系列データである wav ファイルの pcm データをバンドパス・フィルターを使って周波数領域にうつします。バンドパス・フィルターはプロトタイプ・フィルターが一つ数値的に定義されています。そしてこのプロトタイプを周波数領域でずらして（時間領域で三角関数を掛けて）各周波数領域ごとのバンドパス・フィルターを得ます。これを polyphase filter bank と呼びます。

このバンドパス・フィルターを掛けることで、入力された mpeg １フレーム分の pcm データは周波数方向に 32 個のサブバンドに、時間方向に 12 個に切り分けられたデータとなって出力されます。

     module m_polyphase
       use m_kind
       implicit none
       private
       public :: t_subband
       real(kd), parameter :: pi = 4 * atan(1.0_kd), pi64 = pi / 64
       integer :: i, j
       ! ISO section C.1.3 Analysis subband filter    (rounding at 9th decimal) 
       real(kd), parameter :: coeff(0:31, 0:63) = &   
       reshape([((1.0e-9_kd * anint(1.0e9_kd * cos(mod( (2 * i + 1) * (j - 16), 128) * pi64)), i = 0, 31), j = 0, 63)], [32, 64])
       ! ISO Table C.1 -- Coefficients Ci of the Analysis Window 
       real(kd), parameter :: prototype(0:511) = &         
[0.000000000_kd,-0.000000477_kd,-0.000000477_kd,-0.000000477_kd,-0.000000477_kd,-0.000000477_kd,-0.000000477_kd,-0.000000954_kd, & 
-0.000000954_kd,-0.000000954_kd,-0.000000954_kd,-0.000001431_kd,-0.000001431_kd,-0.000001907_kd,-0.000001907_kd,-0.000002384_kd, & 
-0.000002384_kd,-0.000002861_kd,-0.000003338_kd,-0.000003338_kd,-0.000003815_kd,-0.000004292_kd,-0.000004768_kd,-0.000005245_kd, & 
-0.000006199_kd,-0.000006676_kd,-0.000007629_kd,-0.000008106_kd,-0.000009060_kd,-0.000010014_kd,-0.000011444_kd,-0.000012398_kd, & 
-0.000013828_kd,-0.000014782_kd,-0.000016689_kd,-0.000018120_kd,-0.000019550_kd,-0.000021458_kd,-0.000023365_kd,-0.000025272_kd, & 
-0.000027657_kd,-0.000030041_kd,-0.000032425_kd,-0.000034809_kd,-0.000037670_kd,-0.000040531_kd,-0.000043392_kd,-0.000046253_kd, & 
-0.000049591_kd,-0.000052929_kd,-0.000055790_kd,-0.000059605_kd,-0.000062943_kd,-0.000066280_kd,-0.000070095_kd,-0.000073433_kd, & 
-0.000076771_kd,-0.000080585_kd,-0.000083923_kd,-0.000087261_kd,-0.000090599_kd,-0.000093460_kd,-0.000096321_kd,-0.000099182_kd, & 
 0.000101566_kd, 0.000103951_kd, 0.000105858_kd, 0.000107288_kd, 0.000108242_kd, 0.000108719_kd, 0.000108719_kd, 0.000108242_kd, & 
 0.000106812_kd, 0.000105381_kd, 0.000102520_kd, 0.000099182_kd, 0.000095367_kd, 0.000090122_kd, 0.000084400_kd, 0.000077724_kd, & 
 0.000069618_kd, 0.000060558_kd, 0.000050545_kd, 0.000039577_kd, 0.000027180_kd, 0.000013828_kd,-0.000000954_kd,-0.000017166_kd, & 
-0.000034332_kd,-0.000052929_kd,-0.000072956_kd,-0.000093937_kd,-0.000116348_kd,-0.000140190_kd,-0.000165462_kd,-0.000191212_kd, & 
-0.000218868_kd,-0.000247478_kd,-0.000277042_kd,-0.000307560_kd,-0.000339031_kd,-0.000371456_kd,-0.000404358_kd,-0.000438213_kd, & 
-0.000472546_kd,-0.000507355_kd,-0.000542164_kd,-0.000576973_kd,-0.000611782_kd,-0.000646591_kd,-0.000680923_kd,-0.000714302_kd, & 
-0.000747204_kd,-0.000779152_kd,-0.000809669_kd,-0.000838757_kd,-0.000866413_kd,-0.000891685_kd,-0.000915051_kd,-0.000935555_kd, & 
-0.000954151_kd,-0.000968933_kd,-0.000980854_kd,-0.000989437_kd,-0.000994205_kd,-0.000995159_kd,-0.000991821_kd,-0.000983715_kd, & 
 0.000971317_kd, 0.000953674_kd, 0.000930786_kd, 0.000902653_kd, 0.000868797_kd, 0.000829220_kd, 0.000783920_kd, 0.000731945_kd, & 
 0.000674248_kd, 0.000610352_kd, 0.000539303_kd, 0.000462532_kd, 0.000378609_kd, 0.000288486_kd, 0.000191689_kd, 0.000088215_kd, & 
-0.000021458_kd,-0.000137329_kd,-0.000259876_kd,-0.000388145_kd,-0.000522137_kd,-0.000661850_kd,-0.000806808_kd,-0.000956535_kd, & 
-0.001111031_kd,-0.001269817_kd,-0.001432419_kd,-0.001597881_kd,-0.001766682_kd,-0.001937389_kd,-0.002110004_kd,-0.002283096_kd, & 
-0.002457142_kd,-0.002630711_kd,-0.002803326_kd,-0.002974033_kd,-0.003141880_kd,-0.003306866_kd,-0.003467083_kd,-0.003622532_kd, & 
-0.003771782_kd,-0.003914356_kd,-0.004048824_kd,-0.004174709_kd,-0.004290581_kd,-0.004395962_kd,-0.004489899_kd,-0.004570484_kd, & 
-0.004638195_kd,-0.004691124_kd,-0.004728317_kd,-0.004748821_kd,-0.004752159_kd,-0.004737377_kd,-0.004703045_kd,-0.004649162_kd, & 
-0.004573822_kd,-0.004477024_kd,-0.004357815_kd,-0.004215240_kd,-0.004049301_kd,-0.003858566_kd,-0.003643036_kd,-0.003401756_kd, & 
 0.003134727_kd, 0.002841473_kd, 0.002521515_kd, 0.002174854_kd, 0.001800537_kd, 0.001399517_kd, 0.000971317_kd, 0.000515938_kd, & 
 0.000033379_kd,-0.000475883_kd,-0.001011848_kd,-0.001573563_kd,-0.002161503_kd,-0.002774239_kd,-0.003411293_kd,-0.004072189_kd, & 
-0.004756451_kd,-0.005462170_kd,-0.006189346_kd,-0.006937027_kd,-0.007703304_kd,-0.008487225_kd,-0.009287834_kd,-0.010103703_kd, & 
-0.010933399_kd,-0.011775017_kd,-0.012627602_kd,-0.013489246_kd,-0.014358521_kd,-0.015233517_kd,-0.016112804_kd,-0.016994476_kd, & 
-0.017876148_kd,-0.018756866_kd,-0.019634247_kd,-0.020506859_kd,-0.021372318_kd,-0.022228718_kd,-0.023074150_kd,-0.023907185_kd, & 
-0.024725437_kd,-0.025527000_kd,-0.026310921_kd,-0.027073860_kd,-0.027815342_kd,-0.028532982_kd,-0.029224873_kd,-0.029890060_kd, & 
-0.030526638_kd,-0.031132698_kd,-0.031706810_kd,-0.032248020_kd,-0.032754898_kd,-0.033225536_kd,-0.033659935_kd,-0.034055710_kd, & 
-0.034412861_kd,-0.034730434_kd,-0.035007000_kd,-0.035242081_kd,-0.035435200_kd,-0.035586357_kd,-0.035694122_kd,-0.035758972_kd, & 
 0.035780907_kd, 0.035758972_kd, 0.035694122_kd, 0.035586357_kd, 0.035435200_kd, 0.035242081_kd, 0.035007000_kd, 0.034730434_kd, & 
 0.034412861_kd, 0.034055710_kd, 0.033659935_kd, 0.033225536_kd, 0.032754898_kd, 0.032248020_kd, 0.031706810_kd, 0.031132698_kd, & 
 0.030526638_kd, 0.029890060_kd, 0.029224873_kd, 0.028532982_kd, 0.027815342_kd, 0.027073860_kd, 0.026310921_kd, 0.025527000_kd, & 
 0.024725437_kd, 0.023907185_kd, 0.023074150_kd, 0.022228718_kd, 0.021372318_kd, 0.020506859_kd, 0.019634247_kd, 0.018756866_kd, & 
 0.017876148_kd, 0.016994476_kd, 0.016112804_kd, 0.015233517_kd, 0.014358521_kd, 0.013489246_kd, 0.012627602_kd, 0.011775017_kd, & 
 0.010933399_kd, 0.010103703_kd, 0.009287834_kd, 0.008487225_kd, 0.007703304_kd, 0.006937027_kd, 0.006189346_kd, 0.005462170_kd, & 
 0.004756451_kd, 0.004072189_kd, 0.003411293_kd, 0.002774239_kd, 0.002161503_kd, 0.001573563_kd, 0.001011848_kd, 0.000475883_kd, & 
-0.000033379_kd,-0.000515938_kd,-0.000971317_kd,-0.001399517_kd,-0.001800537_kd,-0.002174854_kd,-0.002521515_kd,-0.002841473_kd, & 
 0.003134727_kd, 0.003401756_kd, 0.003643036_kd, 0.003858566_kd, 0.004049301_kd, 0.004215240_kd, 0.004357815_kd, 0.004477024_kd, & 
 0.004573822_kd, 0.004649162_kd, 0.004703045_kd, 0.004737377_kd, 0.004752159_kd, 0.004748821_kd, 0.004728317_kd, 0.004691124_kd, & 
 0.004638195_kd, 0.004570484_kd, 0.004489899_kd, 0.004395962_kd, 0.004290581_kd, 0.004174709_kd, 0.004048824_kd, 0.003914356_kd, & 
 0.003771782_kd, 0.003622532_kd, 0.003467083_kd, 0.003306866_kd, 0.003141880_kd, 0.002974033_kd, 0.002803326_kd, 0.002630711_kd, & 
 0.002457142_kd, 0.002283096_kd, 0.002110004_kd, 0.001937389_kd, 0.001766682_kd, 0.001597881_kd, 0.001432419_kd, 0.001269817_kd, & 
 0.001111031_kd, 0.000956535_kd, 0.000806808_kd, 0.000661850_kd, 0.000522137_kd, 0.000388145_kd, 0.000259876_kd, 0.000137329_kd, & 
 0.000021458_kd,-0.000088215_kd,-0.000191689_kd,-0.000288486_kd,-0.000378609_kd,-0.000462532_kd,-0.000539303_kd,-0.000610352_kd, & 
-0.000674248_kd,-0.000731945_kd,-0.000783920_kd,-0.000829220_kd,-0.000868797_kd,-0.000902653_kd,-0.000930786_kd,-0.000953674_kd, & 
 0.000971317_kd, 0.000983715_kd, 0.000991821_kd, 0.000995159_kd, 0.000994205_kd, 0.000989437_kd, 0.000980854_kd, 0.000968933_kd, & 
 0.000954151_kd, 0.000935555_kd, 0.000915051_kd, 0.000891685_kd, 0.000866413_kd, 0.000838757_kd, 0.000809669_kd, 0.000779152_kd, & 
 0.000747204_kd, 0.000714302_kd, 0.000680923_kd, 0.000646591_kd, 0.000611782_kd, 0.000576973_kd, 0.000542164_kd, 0.000507355_kd, & 
 0.000472546_kd, 0.000438213_kd, 0.000404358_kd, 0.000371456_kd, 0.000339031_kd, 0.000307560_kd, 0.000277042_kd, 0.000247478_kd, & 
 0.000218868_kd, 0.000191212_kd, 0.000165462_kd, 0.000140190_kd, 0.000116348_kd, 0.000093937_kd, 0.000072956_kd, 0.000052929_kd, & 
 0.000034332_kd, 0.000017166_kd, 0.000000954_kd,-0.000013828_kd,-0.000027180_kd,-0.000039577_kd,-0.000050545_kd,-0.000060558_kd, & 
-0.000069618_kd,-0.000077724_kd,-0.000084400_kd,-0.000090122_kd,-0.000095367_kd,-0.000099182_kd,-0.000102520_kd,-0.000105381_kd, & 
-0.000106812_kd,-0.000108242_kd,-0.000108719_kd,-0.000108719_kd,-0.000108242_kd,-0.000107288_kd,-0.000105858_kd,-0.000103951_kd, & 
 0.000101566_kd, 0.000099182_kd, 0.000096321_kd, 0.000093460_kd, 0.000090599_kd, 0.000087261_kd, 0.000083923_kd, 0.000080585_kd, & 
 0.000076771_kd, 0.000073433_kd, 0.000070095_kd, 0.000066280_kd, 0.000062943_kd, 0.000059605_kd, 0.000055790_kd, 0.000052929_kd, & 
 0.000049591_kd, 0.000046253_kd, 0.000043392_kd, 0.000040531_kd, 0.000037670_kd, 0.000034809_kd, 0.000032425_kd, 0.000030041_kd, & 
 0.000027657_kd, 0.000025272_kd, 0.000023365_kd, 0.000021458_kd, 0.000019550_kd, 0.000018120_kd, 0.000016689_kd, 0.000014782_kd, & 
 0.000013828_kd, 0.000012398_kd, 0.000011444_kd, 0.000010014_kd, 0.000009060_kd, 0.000008106_kd, 0.000007629_kd, 0.000006676_kd, & 
 0.000006199_kd, 0.000005245_kd, 0.000004768_kd, 0.000004292_kd, 0.000003815_kd, 0.000003338_kd, 0.000003338_kd, 0.000002861_kd, & 
 0.000002384_kd, 0.000002384_kd, 0.000001907_kd, 0.000001907_kd, 0.000001431_kd, 0.000001431_kd, 0.000000954_kd, 0.000000954_kd, & 
 0.000000954_kd, 0.000000954_kd, 0.000000477_kd, 0.000000477_kd, 0.000000477_kd, 0.000000477_kd, 0.000000477_kd, 0.000000477_kd  ]
      
       type :: t_subband
         real(kd), public, allocatable :: subband(:, :, :) 
       contains
         procedure, public :: init
         procedure, public :: polyphase_filter12
         final             :: fin_subband
       end type t_subband  
     contains
       subroutine init(this, nchannel)
         class(t_subband), intent(in out) :: this
         integer         , intent(in ) :: nchannel
         allocate( this%subband(0:31, 0:11, 0:nchannel - 1) )
       end subroutine init

       subroutine fin_subband(this)
         type(t_subband), intent(in out) :: this
         deallocate( this%subband )
       end subroutine fin_subband

       subroutine polyphase_filter12(this, pcm)
         class(t_subband), intent(in out) :: this
         real(kd), intent(in) :: pcm(0:, 0:) ! shift to start from 0
         integer :: i, ichannel, istart, iend
         do i = 0, 11 
           iend   = 863 - 32 * i
           istart = iend - 512 + 1
           do ichannel = 0, ubound(pcm, 2)
             call polyphase_filter( prototype * pcm(istart:iend, ichannel), this%subband(:, i, ichannel) )
           end do
         end do
       contains 
         subroutine polyphase_filter(z, s)
           real(kd), intent(in ) :: z(0:)
           real(kd), intent(in out) :: s(:)
           real(kd) ::  y(0:63)
           integer :: j
           forall(j = 0:63) y(j) = sum(z(j:511:64)) 
           s = matmul(coeff, y)
         end subroutine polyphase_filter
       end subroutine polyphase_filter12
     end module m_polyphase

Fortran 90 以降では、１行の許される長さは 132 文字までなので、醜くなりますが表を詰めました。プロトタイプ・フィルターの定義より pcm データは配列の添え字の小さい方に時間的に後ろのデータを来るように並べなければなりませんでした。

psycho.f90

心理音響解析は、客観的な音波がどのように主観的な音像ととして認知されるかを解析するものです。
耳の周波数分解能と強度分解能以下であれば元の音波から変化しても、主観的な音像は変わらないことを利用して音声データの圧縮を図るために必要です。

mp1 に関しては、あるフレームに属する各サブバンドごとに許される誤差を求めます。先に述べたように各フレームは 12 個の時間領域に分割されているので、32 個のサブバンド毎にそれぞれの時間領域で許容誤差を求め、12 個中での最小値がそのフレームでの許容誤差となります。

この許容誤差はマスキングしきい値と同一視されます。マスキングしきい値はある音に他の周波数と強度を持つ音を加えた時に認知されるかどうかを表すの閾値の関数です。とくに静音時に対して求めた閾値は絶対閾値と呼ばれます。

心理音響解析は非常に複雑で難解なものですが、ネットなどで拾い集めた情報を総合すると、周波数方向には bark 単位、強度方向には $I^\alpha$$(\alpha\simeq0.3)$ の座標系に変換すると線形重ね合わせが成立するようです。

bark 単位系は人間の周波数分解能の非線形性を、おおむね線形に換算するもので、ここでは周波数方向の座標変換を統一するため、はじめに周波数分解能関数を定義しその逆数の積分で pseud-bark 単位系を定義しました。

また絶対閾値は音に対する認知の感受性を表すもので、一種のばね定数のようなものと考えられるので、マスキング閾値を求める時には、音の強度は絶対閾値で規格化するのが有効なようです。マスキングは bark 単位系で規格化された左右非対称な三角形で近似されます(spreading function)。

正しく理解しているか分かりませんが mp1 に対してはこの程度の解析で最低限成り立つのではないかと思います。

m_fft

２のべき乗の高速フーリエ変換を行なうルーチンです。ハニング窓も定義して用いています。

polyphase filter bank の出力は 32 個のバンド間に重なりがあるように定義されているので、
pcm データを FFT で周波数領域に変換します。polyphase filter bank で変換される pcm データ数は、２のべき乗ではないのですが大きな問題にはなりません。

        module fft_module
          use m_kind
          implicit none
          private
          public :: fft_window
          integer , parameter :: np2 = 9, nn = 2**np2, nn_2 = nn / 2   ! 2^9 = 512
          real(kd), parameter :: pi = 4 * atan(1.0_kd), pi2 = 2 * pi
          real(kd), parameter :: pi2_n = pi2 / nn
          integer, save :: indx(nn) 
          integer, private :: i_
          complex(kd), parameter :: omega(0:*) = [(exp(cmplx(0.0_kd, pi2_n * i_, kind = kd)), i_ = 0, nn - 1)] 
          real(kd)   , parameter :: hann_window(*) = [(0.5_kd * (1.0_kd - cos(pi2_n * i_))  , i_ = 0, nn - 1)]
        contains
          subroutine fft_initialize()
            integer :: i, j2, k, n
            do i = 1, nn
              n = 0
              k = i - 1
              do j2 = np2 - 1, 0, -1
                n = n + mod(k, 2) * 2**j2
                k = k / 2
              end do
              indx(i) = n + 1 ! indx = 1..n
            end do
          end subroutine fft_initialize
 
          subroutine fft_window(x, y)
            real    (kd), intent(in ) :: x(:)
            complex (kd), intent(out) :: y(:)
            logical :: qfirst = .true.
            if (qfirst) then
              qfirst = .false.
              call fft_initialize()
            end if
            y = hann_window * x / nn
            call fft2(y)
          end subroutine fft_window

          subroutine fft2(fft) ! fft_2^np2
            complex (kd), intent(in out) :: fft(:)
            complex (kd) :: tmp1, tmp2, c1
            integer :: i, j, k2, m1, m2, iphase, kn2
            fft = fft(indx)
            do k2 = 1, np2
              kn2 = 2**(k2 - 1)
              do i = 1, nn, 2* kn2
                do j = 1, kn2
                  iphase = 2**(np2 - k2)*(j - 1)
                  c1 = omega( mod(iphase, nn) )
                  m1 = i + j - 1
                  m2 = m1 + kn2
                  tmp1 =      fft(m1)
                  tmp2 = c1 * fft(m2)
                  fft(m1) = tmp1 + tmp2 ! 1 = x^2 ; x = 1 or -1
                  fft(m2) = tmp1 - tmp2
                end do
              end do
            end do
          end subroutine fft2
        end module fft_module

m_psycho

最終的な出力は、音声信号強度とマスクの比の対数 SMR となります。これが許容される相対的誤差を表します。

・以下で用いた定式化
周波数分解能バンド幅(これを周波数方向の基本的な基準とします。)

\mathrm{bw}(f)=25+75(1+1.4(f/1000)^2)^{0.69}

$f$ は Hz 単位。
上で定義したバンド幅とつじつまが合うように定義した pseud-bark

b(f_0)=\int_0^{f_0}{df\over \mathrm{bw}(f)}

ある周波数 $f_0$ での主観的な音強度関数 $I(f_0)$ はバーク座標系で、αノルムを取る。

I(b_0)=\left(\int_{b_0-0.5}^{b_0+0.5}|\mathrm{FFT}(b)|^{2\alpha}db\right)^{1\over\alpha}

周波数で書き直すと、

I(f_0)=\left(\int_{f_0-\mathrm{bw(f_0)/2}}^{f_0+\mathrm{bw(f_0)/2}}|\mathrm{FFT}(f)|^{2\alpha}{df\over\mathrm{bw(f)}}\right)^{1\over\alpha}

ここで $\mathrm{FFT(f)}$ は周波数 $f$ でのフーリエ成分、$\mathrm{bw(f)}$ は周波数 $f$ での周波数分解能バンド幅、$\alpha$ は音の認知に関わる定数をそれぞれ表します。

ある周波数 $f_0$ での音マスキング関数 $\mathrm{Mask}(b_0)$ はバーク座標系で、

\mathrm{Mask}(b_0)=\left\{\int^{\infty}_{0}\left( \mathrm{sp}(b_0-b){|\mathrm{FFT}(b)|^2\over\mathrm{ath}(b)}\right)^{\alpha}{db} \right\}^{1\over\alpha}

周波数で書き直すと、

\mathrm{Mask}(f_0)=\left\{\int^{\infty}_{0}\left( \mathrm{sp}(b(f_0)-b(f)){|\mathrm{FFT}(f)|^2\over\mathrm{ath}(f)}\right)^{\alpha}{df\over\mathrm{bw(f)}} \right\}^{1\over\alpha}

ここで $\mathrm{sp}(b)$ は spreading function、$\mathrm{b}(f)$ はバーク関数、$\mathrm{ath}(f)$ は絶対マスキング閾値関数をそれぞれ表します。

これらを離散化すると、それぞれ以下のようになります。

I_j=\left(\sum_i|\mathrm{FFT}_i|^{2\alpha}{1\over\mathrm{bw}_i}\cdot\mathrm{scale}\right)^{1\over\alpha}

\mathrm{Mask}_j=\left\{\sum_i\left(\mathrm{sp}_{j,i}{|\mathrm{FFT}_i|^2\over\mathrm{ath}_i}\right)^{\alpha}{1\over\mathrm{bw_i}}\cdot\mathrm{scale}\right\}^{1\over\alpha}

ここで $\mathrm{scale}$ は、サンプリング周波数/２＊フーリエ変換の分割数。測度に相当。

（昔のノートを丸写ししているのでおかしかったらゴメンｗ　前に書いた式はおかしかった。連続を離散にするところで無理に規格化したかったようだ。FFTの本数が荒いと式の上では１になる所がならないので、いじくって意味不明に。）

        module m_psycho
          use m_kind
          implicit none
          private
          public :: psychoacoustics
          real(kd), save :: freq_fft(0:256), pseud_bark(0:256), ath_fft(0:256)
          real(kd), save :: crbw_fft(0:256), cbwl_fft(0:256)
          real(kd), save :: sp(256, 256) ! spreading function for masking
          real(kd), save :: scale
          real(kd), parameter :: alpha = 0.27d0 ! non-linear factor
        contains
          subroutine psychoacoustics(pcm, isample_rate, smr)
            use fft_module
            real (kd), intent(in ) :: pcm(:, :) 
            integer       , intent(in ) :: isample_rate
            real (kd), intent(out) :: smr(:, :)
            complex(kd) :: cfft(512)
            integer :: ichannel, i0, i1
            logical, save :: qfirst = .true.
            if (qfirst) then
              qfirst = .false.
              call init_absolute_threshold(isample_rate)
            end if
            i0 = 161 
            i1 = i0 + 512 - 1
            do ichannel = 1, size(pcm, 2)
              call fft_window(pcm(i0:i1, ichannel), cfft)
              call calc_smr(cfft, smr(:, ichannel))
            end do
          end subroutine
 
          subroutine init_absolute_threshold(isample_rate)
            integer, intent(in) :: isample_rate
            real (kd) :: freq, tmp
            integer :: i, m, i0, i1
            do i = 0, size(freq_fft) - 1
              freq = real(isample_rate, kind = kd) / 2.0_kd / 1000.0_kd &
                   * real(i, kind = kd) / real(size(freq_fft), kind = kd)
              ath_fft(i) = 3.64_kd * freq**(-0.8_kd)  & ! 
                         - 6.5_kd * exp(-0.6_kd * (freq - 3.3_kd)**2) + 0.001_kd * freq**4.0_kd   
              freq = freq * 1000.0_kd ! khz -> hz
              freq_fft(i) = freq
            end do
            scale =  real(isample_rate, kind = kd) / 2 / 256 ! freq to fft-line scale
            crbw_fft = critical_band_width( freq_fft )       ! critical band width in hz
            cbwl_fft = decibel( crbw_fft )                   ! critical band width in log
            tmp = 0.0_kd ! pseud bark: integrate critical band  
            do m = 1, 256
              tmp = tmp + 1.0_kd / crbw_fft(m) ! integration
              pseud_bark(m) = tmp * scale 
            end do
            ! spreading function
            forall(i = 1:256, m = 1:256) sp(i, m) = spreading( pseud_bark( i ) - pseud_bark( m ) ) ! top normalized to 1.0
          end subroutine init_absolute_threshold

          pure elemental real(kd) function spreading(z)
            real(kd), intent(in) :: z ! pseud-bark
            if ( z > 0.0d0 ) then
              spreading = -25.0_kd * z 
            else
              spreading =  75.0_kd * z 
            end if
            spreading = max(-160.0_kd, spreading)
          end function spreading

          pure elemental real(kd) function critical_band_width(f)
            real(kd), intent(in) :: f ! hz
            real(kd), parameter :: gamma = 0.69_kd
            critical_band_width = 25.0_kd + 75.0_kd * ( 1.0_kd + 1.4_kd * (f / 1000.0_kd)**2 )**gamma 
          !  critical_band_width = 100.0d0 * ( 1.0d0 + 1.0d0 * (f / 1000.0d0)**2 )**gamma 
          end function critical_band_width

          pure elemental real(kd) function decibel(x) 
            real (kd), intent(in) :: x 
            real (kd) :: tmp
            tmp = max(1.0e-100_kd, x)
            decibel = 10.0e0_kd * log10(tmp)
          end function decibel
      
          subroutine calc_smr(cfft, smr)
            complex(kd), intent(in) :: cfft(0:)
            real   (kd), intent(out) :: smr(:)
            real   (kd), parameter :: pi = 4 * atan(1.0_kd), pi2 = 2 * pi ! fortran2003
            real   (kd) :: snr(32), rmnr(32)
            real   (kd) :: xa(0:256), ya(0:256), za(0:256)
            integer :: iband, i, m, i0, i1
            xa = 2 * decibel( abs(cfft(0:256)) )
            ya = 0.0_kd
            do i = 1, 256 ! convolution of spreading function 
              do m = 1, 256                                                                                    ! i maskee, m masker
                ya(i) = ya(i) + 10.0_kd**( ((sp(i, m) + xa(m) - ath_fft(m)) * alpha - cbwl_fft(m)) / 10.0_kd ) ! non-linear sum
              end do   
            end do   
            ya = max(decibel(ya * scale) / alpha - 11.5_kd, ath_fft - 90.3_kd) ! 11.5 mask factor, 90.3dB = 2^15  ATH shift empirical 
        ! effective spl
            do i = 1, 256
              m = nint( crbw_fft(i) / scale )
              i0 = max(i  - m / 2,   1)       !f0 - bw(f0) / 2 
              i1 = min(i0 + m - 1, 256)       !f0 + bw(f0) / 2
              za(i) = sum( 10.0_kd**( (xa(i0:i1) * alpha  - cbwl_fft(i)) / 10.0_kd ) ) 
            end do
            za = decibel(za * scale) / alpha
            ! smr = snr' - mnr'
            do iband = 1, 32
              m = (iband - 1) * 8 + 1
              i0 = m - nint( crbw_fft(m) / 2 / scale ) ! fl - bw(fl) / 2  ; subband [fl..fh] 
              i0 = max(i0,   1)
              m = m + 7
              i1 = m + nint( crbw_fft(m) / 2 / scale ) ! fh + bw(fh) / 2 
              i1 = min(i1, 256)
              snr(iband)  = maxval( za(i0:i1) )
              rmnr(iband) = minval( ya(i0:i1) ) 
            end do
            smr = snr - rmnr 
           end subroutine calc_smr
        end module m_psycho

この心理音響解析が一番時間のかかる部分です。特に spreading function を畳み込む部分に時間がかかっています。絶対マスキング関数（静音閾値）やバンド幅等は文献値をできるだけそのまま用いることにしました。

layer1.f90

バンドパスフィルターを通した信号情報と心理音響解析で得られた相対許容誤差から、各サブバンドごとにビットを割り振って、許容誤差内に量子化誤差が収まるように量子化してゆきます。相対的量子化誤差は SNR 比として与えられます。

m_layer1

    module m_layer1
      use m_kind
      implicit none
      private
      public :: isubband_normalization, bit_allocation, iquantization
      integer, private :: i_
      ! ISO Table B.1 -- Layer I,II scalefactors
      real(kd), parameter :: scale_factor(0:63) = &
                              [(1.0e-14_kd * anint(1.0e14_kd * 2.0_kd * 2.0_kd**(-real(i_, kd) / 3.0_kd )), i_ = 0, 63)]
      
      ! ISO Table C.2 -- Layer I  Signal-to-Noise Ratios
      !real(kd), parameter :: snr(0:*) =[ 0.00_kd,  7.00_kd, 16.00_kd, 25.28_kd, 31.59_kd, 37.75_kd, 43.84_kd, & 
      !                                  49.89_kd, 55.93_kd, 61.96_kd, 67.98_kd, 74.01_kd, 80.03_kd, 86.05_kd, 92.01_kd]
      ! empirically better SNR = 20 * log(2^n)
      real(kd), parameter :: snr(0:14) = [(20.0_kd * LOG10(2.0_kd) * i_, i_ = 0, 14)]
            
      ! ISO Table C.3 -- Layer I Quantization Coefficients
      real(kd), parameter :: tmp(14) = [( 1.0_kd / 2.0_kd**real(i_ + 1, kd), i_ = 1, 14)]
      real(kd), parameter :: a(14) = [ 1.0e-9_kd * anint(1.0e9_kd * (1.0e0_kd - tmp))]
      real(kd), parameter :: b(14) = [-1.0e-9_kd * anint(1.0e9_kd * tmp)]
    contains
      pure function isubband_normalization(subband) result(iscale_factor)
        real(kd), intent(in) :: subband(0:, 0:, 0:)
        integer :: iscale_factor(0:31, 0:size(subband, 3))
        integer :: ichannel, iband
        do ichannel = 0, size(subband, 3)
          do iband = 0, 31
            iscale_factor(iband, ichannel) = iscale_fac_sub(subband(iband, :, ichannel)) 
          end do
        end do
      contains
        pure integer function iscale_fac_sub(x) result(ires)
          real(kd), intent(in) :: x(0:)
          do ires = 62, 0, -1
            if (maxval(abs(x), dim = 1) < scale_factor(ires)) exit  
          end do 
        end function iscale_fac_sub
      end function isubband_normalization

      subroutine bit_allocation(smr, max_bits, itot_bits, ialloc_bits)
        real(kd), intent(in) :: smr(:, :)
        integer , intent(in) :: max_bits
        integer , intent(   out) :: ialloc_bits(:, :)
        integer , intent(in out) :: itot_bits
        integer  :: max_pos(2), k
        real(kd) :: rmnr(32, 2)
        ialloc_bits = 0
        rmnr = smr - snr(0)
        do 
          select case (max_bits - itot_bits) 
            case (0:11) ! no more bits left 
              exit  
            case (12:29) ! cannot allocate to new place
              max_pos = maxloc(rmnr, mask = ialloc_bits < 13 .AND. ialloc_bits /= 0)
            case (30:)   !
              max_pos = maxloc(rmnr, mask = ialloc_bits < 13)
              if (rmnr( max_pos(1), max_pos(2) ) < -0.44d0) then ! threshold margin 10% -0.46 20% -1.0
                max_pos = maxloc(rmnr, mask = ialloc_bits == 0)
                if (max_pos(1) == 0) max_pos = maxloc(rmnr, mask = ialloc_bits < 13)
              end if
          case default 
            stop 'impossible!: bit_allocation'
          end select
          if (max_pos(1) == 0) exit
          k = ialloc_bits( max_pos(1), max_pos(2) )
          if (k == 0) then  
           itot_bits = itot_bits + 6 + 12 * 2 ! first time: scalefactor 6 bits + 12 * 2 bits   
          else
           itot_bits = itot_bits + 12         ! 12 * 1 bit 
          end if
          ialloc_bits( max_pos(1), max_pos(2) ) = k + 1
          rmnr       ( max_pos(1), max_pos(2) ) = smr( max_pos(1), max_pos(2) ) - snr(k + 1)
        end do
      end subroutine bit_allocation

      pure function iquantization(ialloc_bits, subband, iscale_factor) result(isubband)
        integer , intent(in) :: ialloc_bits(0:, 0:), iscale_factor(0:, 0:)
        real(kd), intent(in) :: subband(0:, 0:, 0:)
        integer :: isubband(0:31, 0:11, 0:ubound(subband, 3))
        integer :: ichannel, iband, k, m
        real(kd) :: s(12)
        do ichannel = 0, ubound(subband, 3)
          do iband = 0, 31  
            k = ialloc_bits(iband, ichannel)
            m = iscale_factor(iband, ichannel)
            if (k /= 0) THEN 
              s = a(k) * subband(iband, :, ichannel)  / scale_factor(m) + b(k) 
              isubband(iband, :, ichannel) = floor( (s + 1.0_kd) * 2**k )
            else
              isubband(iband, :, ichannel) = 0
            end if          
          end do
        end do  
      end function iquantization
    end module m_layer1

ここで bit 分配位置を決めるのに maxloc 関数を用いましたが、この関数は入力配列の何番目かという値を返すので、１から始まる配列でない場合は先頭のオフセット値をずらしてやる必要があります。これが面倒なので、ここでは配列を０から始まるように取りませんでした。

uzura1win.f90

メインプログラムでは、ファイルサイズなどから総フレーム数などを求め、その回数分だけ反復処理を行います。mpeg/audio はバンドパスフィルターで処理・復号することで時間的に遅延が生じるので、初期データ位置の調整などをすべきなのですが、簡単のため省略しました。

名目上のビットレートを合わせるために、時々フレーム長を付け足さねばならないことがります。これは padding と呼ばれる処理です。CD の 44.1KHz のような半端なサンプリングレートの wav ファイルを処理するときに必要となります。やらなくても問題はありません。

main program

      program uzura1
        use m_kind
        use m_mpg
        use m_wav
        use m_mpg_io
        use m_polyphase
        use m_psycho
        use m_layer1
        implicit none
        integer , allocatable :: iscale_factor(:, :), isubband(:, :, :), ialloc_bits(:, :)
        real(kd), allocatable :: pcm(:, :), smr(:, :)
        integer :: max_bits, itot_bits
        integer :: nchannel, iframe = 1, ntotal_frames
        character(len = :), allocatable :: file_name, fn_in, fn_out
        type(t_mpg) :: mpg
        type(t_subband), allocatable :: subb
        type(t_wavfile), allocatable :: wav
        type(t_mpgfile), allocatable :: mp1
        call get_option(mpg, file_name)
        fn_in  = trim(file_name) // '.wav'
        fn_out = trim(file_name) // '.mp1'
        allocate(wav, mp1)
        call wav%open_file(fn_in ) ! read whole wav file
        call mp1%open_file(fn_out)
        call pr_info(mpg)
        call pr_play_time( wav%get_play_time() )
        nchannel = wav%get_channel()  
        allocate( pcm(864, nchannel), smr(32, nchannel) )
        allocate( iscale_factor(32, nchannel), isubband(32, 12, nchannel), ialloc_bits(32, nchannel) )
        allocate(subb)
        call subb%init(nchannel)
        ntotal_frames = wav%get_data_size() / (mpeg_frame_size(mpg%layer) * nchannel * 2)
        pcm = 0.0d0
        do while(iframe <= ntotal_frames )
          call get_maxbits(mpg, max_bits)
          call mp1%clear_bit_buff(max_bits)
          call mp1%encode_header(mpg)
          itot_bits = 32
          call wav%pcm1frame(pcm) 
          call subb%polyphase_filter12(pcm(:, :)) 
          smr = 0
          call psychoacoustics(pcm, wav%get_sampling_rate(), smr)
          iscale_factor = isubband_normalization(subb%subband)
          itot_bits = itot_bits + 4 * 32 * nchannel               ! 4*32*nch bits required for the scale factor bits : 
          if (mpg%icrc == 0) itot_bits = itot_bits + 16 ! 16bits required for the crc
          call bit_allocation(smr, max_bits, itot_bits, ialloc_bits)
          IF (mpg%icrc == 0) call mp1%encode_crc(mpg, ialloc_bits)
          isubband = iquantization(ialloc_bits, subb%subband, iscale_factor)
          CALL mp1%encode_body(ialloc_bits, iscale_factor, isubband)
          CALL mp1%write_bits_1frame(max_bits)
          iframe = iframe + 1
          if (mod(iframe, 200) == 0) call update_status(iframe, ntotal_frames) 
        end do
        write(*, *) 'total frames', iframe - 1, '/', ntotal_frames
        deallocate(subb)
        deallocate(wav) ! close file & deallocte
        deallocate(mp1) ! close file & deallocte
      contains
        subroutine pr_info(mpg)
          type (t_mpg), intent(in) :: mpg
          write(*, *) 'uzura1 (mpeg-1 audio/layer-I encoder) ver.0.4 '
          write(*, *) 'psychoacoustic model ', mpeg_psy_names(mpg%ipsychoacoustic), &
                      ' bit rate (kbps)', mpeg_bit_rates(mpg%ibit_rate, mpg%layer)
          if (mpg%icrc == 0) write(*, *) 'crc16 error protection enabled'
        end subroutine pr_info

        subroutine pr_play_time(itot_sec)
          integer, intent(in) :: itot_sec
          integer :: ihour, imin, isec
          ihour =          itot_sec / 3600
          imin  =      mod(itot_sec, 3600) / 60
          isec  = mod( mod(itot_sec, 3600) , 60 )
          write(*, '(a, i3, a, i2, a, i2)') ' playtime ', ihour, ':', imin, ':', isec
        end subroutine pr_play_time
        
        subroutine update_status(iframe, itot_frames)
          integer, intent(in) :: iframe, itot_frames
          integer :: it(8), ielapsed, iel_min, iel_sec
          integer, save :: istart
          logical, save :: qfirst = .true.
          real    :: percent
          character (len = 10) :: time, date, zone
          call date_and_time(date, time, zone, it)
          if (qfirst) then
           istart   = it(5) * 3600 + it(6) * 60 + it(7)
           qfirst   = .false.
          end if
          ielapsed = it(5) * 3600 + it(6) * 60 + it(7) - istart
          iel_min  =     ielapsed / 60
          iel_sec  = mod(ielapsed , 60)
          percent = real(100 * iframe) / real(itot_frames)
          write(*, '(a, f6.2, a, i4, 2(a, i2), 3(a, i2.2), a, i4.2, a, i2.2, a)')  &
                '+processed...', percent, '%  ', &
                it(1), '/', it(2), '/', it(3), ' ', it(5), ':', it(6), ':', it(7), &
                ' time elapsed ', iel_min, 'min ', iel_sec, 'sec'
        end subroutine update_status

        subroutine get_maxbits(mpg, max_bits)
          type(t_mpg), intent(in out) :: mpg
          integer , intent(out) :: max_bits
          integer        , save :: islot_size
          real (kind = 8), save :: padding, fslot_size
          logical        , save :: qfirst = .true.
          if (qfirst) then
            qfirst = .false.
            padding = 0.0d0
            call calc_slot_size(mpg, islot_size, fslot_size)
          end if
          padding = padding + fslot_size
          if (padding > 1) then
            mpg%ipadding = 1
            padding = padding - 1.0d0
          else
            mpg%ipadding = 0
          end if
          max_bits = ( islot_size + mpg%ipadding ) * 32
        end subroutine get_maxbits  

        subroutine calc_slot_size(mpg, islot_size, fslot_size)
          type(t_mpg), intent(in) :: mpg
          integer , intent(out) :: islot_size
          real(kd), intent(out) :: fslot_size
          real(kd) :: aslot_size
          aslot_size = 12.0d0 * 1000.0d0 * real(mpeg_bit_rates(mpg%ibit_rate, mpg%layer), kind = kd) &
                     / real(mpeg_sample_rates(mpg%isample_rate), kind = kd)
          aslot_size = 1.0d-3 * anint(1.0d3 * aslot_size) 
          islot_size = int(aslot_size)
          fslot_size = aslot_size - islot_size
        end subroutine calc_slot_size
        
        subroutine get_option(mpg, fn_in)
          type(t_mpg), intent(in out) :: mpg
          character (len = :), allocatable, intent(out) :: fn_in
          character (len = 80) :: buffer
          character (len =  6) :: fmt
          integer :: narg, iarg, length
          iarg = 0
          narg = command_argument_count() + 1
          do
            iarg = iarg + 1
            if (iarg >= narg) call print_option()
            call get_command_argument(iarg, buffer)
            if (buffer(1:1) /= '-') exit  
            select case(trim(buffer))
              case ('-b') 
                iarg = iarg + 1
                if ( iarg >= narg ) call print_option()
                call get_command_argument(iarg, buffer, length)
                write(fmt, '(a, i1, a)') '(i', length, ')' 
                read(buffer, fmt) mpg%ibit_rate
                if (mpg%ibit_rate < 1 .or. mpg%ibit_rate > 14) call print_option()
              case ('-crc') 
                mpg%icrc = 0       ! crc16 on 
              case ('-c') 
                mpg%icopyright = 1 ! copyrigt on
              case ('-o') 
                mpg%ioriginal = 1  ! original on
              case default
                call print_option()
            end select
          end do
          fn_in = trim(buffer)
        end subroutine get_option   
 
       subroutine print_option()
         write(*, *) 'Usage : uzura -option file_name '
         write(*, *) '      : file_name.wav -> file_name.mp1'
         write(*, *) 'Option: -b 1..14  bitrate '
         write(*, *) '        -crc      CRC16 error protection on'
         write(*, *) '        -c        copyright flag on'
         write(*, *) '        -o        original  flag on'
         stop
       end subroutine print_option
     end program uzura1

メインプログラムでは、入力ファイル名をコマンドライン引数としてうけとり、同じ名前の mp1 拡張子を持つファイルを出力します。

mp1 エンコーダの最小の実装としては、CRC は省略しても問題ありません。また心理音響解析も SMR に０か何か適当な値を入れておけば省略しても問題ありません。（単にビットの割り振りが非効率になるだけです。）

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up