Avoid database access from inside loops wherever you can #Java

Exactly what the title says.

Something like this:

// Stop doing this
List<Person> people = new ArrayList<>();
for (int id : ids) {
    Person person = repository.findOne(id);
    people.add(person);
}

// Do this instead
List<Person> people = repository.findAll(ids);

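This is obvious advice, but I get the feeling this kind of code is common among beginners.
In my limited experience, it gets written with an "as long as it works" mindset: "there is already SQL that fetches one record, so I'll just use that," or "fetching one record at a time keeps the processing logic simpler." Performance gets little attention at that point, and I have seen it lead to performance problems later on.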

Let's measure the speed with some simple sample code.
The sample code uses Spring Boot 2.0 + Spring Data JPA.

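I prepared two operations and call both: one loops 10,000 times and hits the database on every iteration to fetch 10,000 records, and the other fetches 10,000 records in one go using an IN clause.
Building a screen was too much hassle, so I made it a RestController and simply return the timings in the response body (the map's toString(), so it is JSON-like rather than strict JSON).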

Benchmark code

Controller

package com.example.web;

import com.example.domain.service.BenchmarkService;
import lombok.RequiredArgsConstructor;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.HashMap;
import java.util.Map;

@RestController
@RequestMapping("/benchmark")
@RequiredArgsConstructor
public class SandboxRestController {

    private final BenchmarkService benchmarkService;

    @RequestMapping("/sql/loop")
    public ResponseEntity<String> benchmarkDatabaseAccess() {
        // Fetch 10,000 records in a single query using an IN clause
        long beforeOneTime = System.currentTimeMillis();
        benchmarkService.oneTimesDatabaseAccess();
        long afterOneTime = System.currentTimeMillis();

        // Fetch the records one at a time in a 10,000-iteration loop
        long beforeLoopTime = System.currentTimeMillis();
        benchmarkService.tenThousandTimesDatabaseAccess();
        long afterLoopTime = System.currentTimeMillis();

        Map<String, Long> result = new HashMap<>();
        result.put("oneTimeFirst", afterOneTime - beforeOneTime);
        result.put("loopTimeFirst", afterLoopTime - beforeLoopTime);

        // Run the same two measurements a second time
        beforeOneTime = System.currentTimeMillis();
        benchmarkService.oneTimesDatabaseAccess();
        afterOneTime = System.currentTimeMillis();

        beforeLoopTime = System.currentTimeMillis();
        benchmarkService.tenThousandTimesDatabaseAccess();
        afterLoopTime = System.currentTimeMillis();

        result.put("oneTimeSecond", afterOneTime - beforeOneTime);
        result.put("loopTimeSecond", afterLoopTime - beforeLoopTime);
        return ResponseEntity.status(200).body(result.toString());
    }
}

Service

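package com.example.domain.service;

import com.example.domain.repository.BenchmarkRepository;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.toList;

@Service
@RequiredArgsConstructor
public class BenchmarkServiceImpl implements BenchmarkService {

    private final BenchmarkRepository benchmarkRepository;

    // Issues 10,000 single-row queries, one per loop iteration.
    public void tenThousandTimesDatabaseAccess() {
        for (int i = 1; i <= 10000; i++) {
            benchmarkRepository.findById(i);
        }
    }

    // Issues one IN query covering the same IDs, 1 through 10000.
    public void oneTimesDatabaseAccess() {
        // rangeClosed includes the upper bound; range(1, 10000) would stop at 9999.
        List<Integer> ids = IntStream.rangeClosed(1, 10000).boxed().collect(toList());
        benchmarkRepository.findByIdIn(ids);
    }
}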

For the database, I use a table set up the easy way with Spring Data JPA.

Entity

package com.example.domain.model;

import lombok.Data;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.validation.constraints.NotNull;

@Entity
@Data
@Table(name = "benchmark")
public class BenchmarkEntity {

    // No-argument constructor required by JPA
    public BenchmarkEntity() {}

    public BenchmarkEntity(int id, int num) {
        this.id = id;
        this.num = num;
    }

    @Id
    @GeneratedValue
    private Integer id;

    @NotNull
    private int num;
}

Repository

package com.example.domain.repository;

import com.example.domain.model.BenchmarkEntity;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

import java.util.List;

@Repository
public interface BenchmarkRepository extends JpaRepository<BenchmarkEntity, Integer> {

    // SELECT * FROM benchmark WHERE id = #{id}
    BenchmarkEntity findById(int id);

    // SELECT * FROM benchmark WHERE id IN (#{...ids})
    List<BenchmarkEntity> findByIdIn(List<Integer> ids);
}

Results

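Running against h2database in in-memory mode, I got:

10,000-iteration loop: 711 ms on the first run, 438 ms on the second
Bulk fetch: 393 ms on the first run, 24 ms on the second

The gap is this large even in memory, so as the latency of each round trip grows, for example when accessing a database in a separate environment, the difference should become even bigger.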
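To make the effect of per-query latency concrete, here is a back-of-the-envelope sketch. It is a deliberately simple cost model and not a measurement: total time is assumed to be (number of queries x round-trip time) + (rows x per-row cost). The latency figures, `perRowMicros`, and the class and method names are all hypothetical, chosen only for illustration.

```java
public class RoundTripEstimate {

    // Simplified cost model (an assumption, not a measurement):
    // every query pays one round trip, every fetched row pays a fixed cost.
    static long estimateMicros(long queries, long rows, long roundTripMicros, long perRowMicros) {
        return queries * roundTripMicros + rows * perRowMicros;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        long perRowMicros = 2;                      // hypothetical per-row cost
        long[] roundTripsMicros = {50, 1_000, 5_000}; // hypothetical latencies

        for (long roundTrip : roundTripsMicros) {
            // Loop: one query per row. Bulk: a single query for all rows.
            long loop = estimateMicros(rows, rows, roundTrip, perRowMicros);
            long bulk = estimateMicros(1, rows, roundTrip, perRowMicros);
            System.out.printf("roundTrip=%d us  loop=%.2f ms  bulk=%.2f ms%n",
                    roundTrip, loop / 1000.0, bulk / 1000.0);
        }
    }
}
```

Under this model the loop's cost scales directly with the round-trip time, while the bulk fetch pays it only once, which is why the gap would be expected to widen against a remote database.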

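Fetching one record at a time can be easier to implement. With a bulk fetch, the subsequent processing may become somewhat more complex, or you may end up looping over the results, which also takes time. Even so, in most cases (unless you are issuing SQL that performs extremely badly) fetching in bulk should be faster.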
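The extra complexity after a bulk fetch is often just a lookup step. As a minimal sketch, assuming the rows come back in arbitrary order and some requested IDs may be missing, the results can be indexed by ID once and then processed in memory. `Person`, `indexById`, and `namesInRequestedOrder` are hypothetical names for illustration, and the hard-coded list stands in for the result of a single bulk query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BulkLookupSketch {

    // Hypothetical stand-in for an entity returned by the bulk query.
    static final class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Index the bulk result by ID so each later lookup is an in-memory map access.
    static Map<Integer, Person> indexById(List<Person> people) {
        Map<Integer, Person> byId = new HashMap<>();
        for (Person person : people) {
            byId.put(person.id, person);
        }
        return byId;
    }

    // Walk the requested IDs in their original order; no database call inside the loop.
    static List<String> namesInRequestedOrder(List<Integer> ids, List<Person> fetched) {
        Map<Integer, Person> byId = indexById(fetched);
        List<String> names = new ArrayList<>();
        for (Integer id : ids) {
            Person person = byId.get(id);
            names.add(person == null ? "(missing)" : person.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(3, 1, 2, 99);
        // Stand-in for the rows returned by one bulk query.
        List<Person> fetched = Arrays.asList(
                new Person(1, "Alice"), new Person(2, "Bob"), new Person(3, "Carol"));
        System.out.println(namesInRequestedOrder(ids, fetched));
        // prints [Carol, Alice, Bob, (missing)]
    }
}
```

The loop is still there, but it runs against a map in memory rather than against the database.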

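I won't go so far as to say that every SQL call inside a loop must be eliminated, but keep this in mind for any processing where the requirements suggest the loop count will grow under high load.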
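One possible middle ground, which is my own illustration rather than something measured in the benchmark above, is to keep a loop but issue one query per chunk of IDs instead of one per ID. The sketch below shows only the chunking step; `partition` is a hypothetical helper, and the chunk size of 1,000 is an arbitrary assumption that would need tuning for a real database.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedFetchSketch {

    // Split a list into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) {
            ids.add(i);
        }
        // Each chunk would be passed to one bulk query (for example, an IN-clause lookup).
        List<List<Integer>> chunks = partition(ids, 1000);
        System.out.println(chunks.size() + " queries instead of " + ids.size());
        // prints 10 queries instead of 10000
    }
}
```

With this shape the number of round trips grows with the number of chunks rather than with the number of records.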

Reference: https://github.com/tnemotox/sandbox