動機

Haskellでゲームを作っていたらGCのせい(?)でやたらとupdate処理が遅くなりました。
何が原因か分からず、色んなモナドなどを使った時にパフォーマンスにどれだけ影響が出るのかを調べました。

主に GCのために遅くなるのはどういうコードか だけを調べます。

やっていることは「たくさんの Obj 型のデータを用意し、それぞれをupdate」を何度も繰り返しているだけです。

以下がコード。

{-# LANGUAGE TemplateHaskell #-}
import System.Environment
import Control.Monad.State
import Control.Lens

data Obj = Obj { _pos :: (Integer, Integer) } deriving Show
makeLenses ''Obj

benchmark1 :: [Obj] -> IO [Obj]
benchmark1 g = mapM (\e -> update `execStateT` e) g
  where
    update :: StateT Obj IO ()
    update = pos %= (\(x,y) -> (x+10, y+10))

benchmark2 :: [Obj] -> IO [Obj]
benchmark2 g = mapM update g
  where
    update :: Obj -> IO Obj
    update e = return $ pos %~ (\(x,y) -> (x+10, y+10)) $ e

benchmark3 :: [Obj] -> [Obj]
benchmark3 = map update
  where
    update :: Obj -> Obj
    update e = pos %~ (\(x,y) -> (x+10, y+10)) $ e

benchmark4 :: [Obj] -> IO [Obj]
benchmark4 = execStateT $ do
  g <- get
  let g' = map update g
  put g'
  where
    update :: Obj -> Obj
    update e = pos %~ (\(x,y) -> (x+10, y+10)) $ e

benchmark5 :: [Obj] -> IO [Obj]
benchmark5 = execStateT $ do
  g <- get
  g' <- mapM (\e -> lift $ update `execStateT` e) g
  put g'
  where
    update :: StateT Obj IO ()
    update = pos %= (\(x,y) -> (x+10, y+10))

benchmark6 :: [Obj] -> IO [Obj]
benchmark6 = execStateT $ do
  g <- get
  let g' = map (\e -> update `execState` e) g
  put g'
  where
    update :: State Obj ()
    update = pos %= (\(x,y) -> (x+10, y+10))

mainloop :: Integer -> IO [Obj] -> ([Obj] -> IO [Obj]) -> IO Integer
mainloop 0 g _ = (sum . map (\e -> snd $ e^.pos)) `fmap` g
mainloop n g update = mainloop (n-1) (update =<< g) update

mainloop' :: Integer -> [Obj] -> ([Obj] -> [Obj]) -> Integer
mainloop' 0 g _ = sum . map (\e -> snd $ e^.pos) $ g
mainloop' n g update = mainloop' (n-1) (update g) update

main = do
  u <- fmap (read . head) getArgs
  let loopN = 5000
  let objN = 10000
  let g = map (\i -> Obj (i,i^2)) [1..objN]

  print =<< case (u::Integer) of
    1 -> mainloop loopN (return g) benchmark1 
    2 -> mainloop loopN (return g) benchmark2 
    3 -> return $ mainloop' loopN g benchmark3
    4 -> mainloop loopN (return g) benchmark4
    5 -> mainloop loopN (return g) benchmark5
    6 -> mainloop loopN (return g) benchmark6

比較としては以下のように書き方を変えました。

函数	update処理	全体の処理
benchmark1	StateT * IO	IO
benchmark2	IO	IO
benchmark3	純粋	純粋
benchmark4	純粋	StateT * IO
benchmark5	StateT * IO	StateT * IO
benchmark6	State *	StateT * IO

実行結果

函数	real	user	sys
benchmark1	real 0m33.673s	user 0m32.232s	sys 0m1.356s
benchmark2	real 0m33.959s	user 0m32.660s	sys 0m1.232s
benchmark3	real 0m9.574s	user 0m9.524s	sys 0m0.032s
benchmark4	real 0m9.948s	user 0m9.888s	sys 0m0.044s
benchmark5	real 0m37.714s	user 0m36.432s	sys 0m1.208s
benchmark6	real 0m9.319s	user 0m9.272s	sys 0m0.024s

実行は一度しかしていないので1秒程度は誤差として考えても、圧倒的に1,2,5が遅くそれ以外は速いです。
また、1,2,5は実行するとメモリを2G近く消費していました。

考察

わかったことは以下です。

やっぱり純粋な処理だけだと速い
けどStateT別に遅くない
そもそもどのモナドを使うとか関係ないのでは？？？
(上の比較表ほとんど意味なかった)

やたらと遅い1,2,5で共通しているのは mapM 函数を用いている ことで、この処理が大量のコピー&ゴミ生産をしているのではないかと推測しています。
が、これ以上はよく分からないので誰か教えてくださるとありがたいです。

いずれにせよ、実行時間を気にする場合は mapM は避けるほうが無難かもしれないです(?)。
あとStateT全然遅くないです、疑って申し訳ない限り。

GCがパフォーマンス低下を引き起こす？

動機

実行結果

考察