Progaku Advent Calendar 2024

Progaku

Drizzle ORMでの初期データ投入戦略案

Last updated at 2024-12-18Posted at 2024-12-18

はじめに

本記事は Progaku Advent Calendar 2024 19日目の記事になります。

趣味でDrizzle ORMを触っていて、Djangoにある fixture のような仕組みが欲しくなりました。
seedデータを作る機能は提供されていますが、私が欲しいのは固定値が投入される機能です。

初回に流す用ファイルは愚直に書くでもよかったのですが、面白そうなのとせっかくアドベントカレンダーのネタになりそうな機能だったので自作してみました。

ただ考慮事項がまったく足りていないので実運用に耐えられなさそうです。あくまで戦略案として参考程度に留めていただきたいです。

環境

docker-compose

http://localhost:8080/ でphpMyAdminに繋がるようにしています。

version: '3'

services:
  db:
    image: mysql
    container_name: mysql_container
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: ${MYSQL_DATABASE}
      MYSQL_USER: ${MYSQL_USER}
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    ports:
      - ${MYSQL_PORT}:${MYSQL_PORT}
    volumes:
      - db_data:/var/lib/mysql

  phpmyadmin:
    image: phpmyadmin/phpmyadmin
    container_name: phpmyadmin_container
    restart: always
    environment:
      PMA_HOST: db
      PMA_USER: ${MYSQL_USER}
      PMA_PASSWORD: ${MYSQL_PASSWORD}
      PMA_ABSOLUTE_URI: http://localhost:8080/
    ports:
      - 8080:80
    depends_on:
      - db

volumes:
  db_data:

env

MYSQL_HOST=localhost
MYSQL_DATABASE=my_database
MYSQL_PORT=3306
MYSQL_USER=my_user
MYSQL_PASSWORD=password0000

戦略案

fixtures/ 配下に投入したい初期データを下記のようにJSON形式で記載します。
そして、ファイル名はテーブル名と同じ名前にします。

[
  {
    "id": 1,
    "createdAt": "1970-01-01T00:00:00Z",
    "createdBy": 1,
    "name": "JavaScript"
  },
  {
    "id": 2,
    "createdAt": "1970-01-01T00:00:00Z",
    "createdBy": 1,
    "name": "TypeScript"
  },
  {
    "id": 3,
    "createdAt": "1970-01-01T00:00:00Z",
    "createdBy": 1,
    "name": "CoffeeScript"
  }
]

fixtures配下にある初期データを、初期データ投入スクリプトから呼び出し、テーブル間のリレーションを考慮した順番で、ファイル名にしているテーブルに初期データを投入します。

例

文章に自信がないので具体例を出します。

まずはディレクトリ構造は下記を想定します。

src/
├── fixtures/
│   ├── tag.json
│   └── user.json
├── init.ts
└── schema.ts

そして、テーブルは下記の通りに定義されているものとします。

CREATE TABLE `user` (
  `id` INT AUTO_INCREMENT PRIMARY KEY,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `name` VARCHAR(256) NOT NULL,
  `code` VARCHAR(256) NOT NULL UNIQUE KEY `uniq_code`
);

CREATE TABLE `tag` (
  `id` INT AUTO_INCREMENT PRIMARY KEY,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `created_by` INT NOT NULL,
  `name` VARCHAR(256) NOT NULL UNIQUE KEY `uniq_name`,
  FOREIGN KEY (`created_by`) REFERENCES `user`(`id`) ON DELETE CASCADE
);

各jsonファイルの中身は下記のようになっています。。

tag.json

[
  {
    "id": 1,
    "createdAt": "1970-01-01T00:00:00Z",
    "createdBy": 1,
    "name": "JavaScript"
  },
  {
    "id": 2,
    "createdAt": "1970-01-01T00:00:00Z",
    "createdBy": 1,
    "name": "TypeScript"
  },
  {
    "id": 3,
    "createdAt": "1970-01-01T00:00:00Z",
    "createdBy": 1,
    "name": "CoffeeScript"
  }
]

user.json

[
  {
    "id": 1,
    "createdAt": "1970-01-01T09:00:00",
    "name": "管理者",
    "code": "admin"
  }
]

本記事で作成するプログラム init.ts を流すと、下記のように、依存関係順 user -> tag へ fixtures 配下に定義されているjsonファイルのレコードがDBに投入されます。 (tagはuserへの外部キーを持っている)

INSERT INTO user (id, createdAt, name, code)
VALUES
    (1, '1970-01-01T00:00:00Z', '管理者', 'admin');

INSERT INTO tag (id, createdAt, createdBy, name)
VALUES
    (1, '1970-01-01T00:00:00Z', 1, 'JavaScript'),
    (2, '1970-01-01T00:00:00Z', 1, 'TypeScript'),
    (3, '1970-01-01T00:00:00Z', 1, 'CoffeeScript');

これができるようになることを目指します！

未考慮事項

現在私が見つけている未考慮事項を列挙します。

複数schemeへの投入
テーブルの数が増えた時
fixtureのレコード数が膨大な場合
閉路がある場合
トポロジカルソートの妥当性

init

今回はMySQLで行いたいので、

に従い設定していきます。

install

npm i drizzle-orm mysql2 dotenv
npm i -D drizzle-kit tsx

また、オブジェクトをイジイジするためRemedaも入れます。

npm i remeda

ディレクトリ構成

.
├── .env
├── README.md
├── docker-compose.yml
├── drizzle.config.ts
├── node_modules/
├── package-lock.json
├── package.json
├── src
│   └── db
│       ├── drizzle
│       ├── fixtures
│       │   ├── article.json
│       │   ├── tag.json
│       │   ├── user.json
│       │   └── user_tag_relation.json
│       ├── init.ts
│       └── schema.ts
└── tsconfig.json

drizzle.config.ts

import 'dotenv/config';
import { defineConfig } from 'drizzle-kit';

export default defineConfig({
  out: "./src/db/drizzle",
  schema: './src/db/schema.ts',
  dialect: 'mysql',
  dbCredentials: {
    user: process.env.MYSQL_USER || "root",
    password: process.env.MYSQL_PASSWORD || "password",
    host: process.env.MYSQL_HOST || "localhost",
    port: Number(process.env.MYSQL_PORT) || 3306,
    database: process.env.MYSQL_DATABASE || "my_database"
  },
});

schema.ts

テーブルを定義します。考慮外なのでschemaは作らないです。

import { boolean, int, mysqlTable, text, timestamp, uniqueIndex, varchar } from 'drizzle-orm/mysql-core';

export const User = mysqlTable('user', {
  id: int('id').autoincrement().primaryKey(),
  createdAt: timestamp('created_at', { mode: 'string' }).notNull().defaultNow(),
  name: varchar('name', { length: 256 }).notNull(),
  code: varchar('code', { length: 256 }).notNull().unique('uniq_code'),
});

export const Tag = mysqlTable('tag', {
  id: int('id').autoincrement().primaryKey(),
  createdAt: timestamp('created_at', { mode: 'string' }).notNull().defaultNow(),
  createdBy: int('created_by')
    .notNull()
    .references(() => User.id, { onDelete: 'cascade' }),
  name: varchar('name', { length: 256 }).notNull().unique('uniq_name'),
  isOfficial: boolean('is_official').notNull().default(false),
});

export const UserTagRelation = mysqlTable(
  'user_tag_relation',
  {
    id: int('id').autoincrement().primaryKey(),
    createdAt: timestamp('created_at', { mode: 'string' }).notNull().defaultNow(),
    userId: int('user_id')
      .notNull()
      .references(() => User.id, { onDelete: 'cascade' }),
    tagId: int('tag_id')
      .notNull()
      .references(() => Tag.id, { onDelete: 'cascade' }),
  },
  (table) => ({
    uniqUserAndTag: uniqueIndex('user_and_tag_uniq').on(table.userId, table.tagId),
  }),
);

export const Article = mysqlTable(
  'article',
  {
    id: int('id').autoincrement().primaryKey(),
    createdAt: timestamp('created_at', { mode: 'string' }).notNull().defaultNow(),
    updatedAt: timestamp('updated_at', { mode: 'string' })
      .notNull()
      .defaultNow()
      .$onUpdate(() => new Date().toISOString()),
    title: varchar('title', { length: 256 }).notNull(),
    articlePath: text('article_path').notNull(),
    userId: int('user_id')
      .notNull()
      .references(() => User.id, { onDelete: 'cascade' }),
  },
  (table) => ({
    uniqUserAndArticleTitle: uniqueIndex('article_title_and_user_uniq').on(table.userId, table.title),
  }),
);

export const ArticleTagRelation = mysqlTable(
  'article_tag_relation',
  {
    id: int('id').autoincrement().primaryKey(),
    createdAt: timestamp('created_at', { mode: 'string' }).notNull().defaultNow(),
    articleId: int('article_id')
      .notNull()
      .references(() => Article.id, { onDelete: 'cascade' }),
    tagId: int('tag_id')
      .notNull()
      .references(() => Tag.id, { onDelete: 'cascade' }),
  },
  (table) => ({
    uniqArticleAndTag: uniqueIndex('article_and_tag_uniq').on(table.articleId, table.tagId),
  }),
);

ER図

ER図にすると下記の通り。

マイグレーション

これでMySQLにテーブルが作成されます。

# container start
docker-compose up -d
# generate migrate file
npx drizzle-kit generate
# migrate
npx drizzle-kit migrate

docker

終わったら下記のコマンドでお掃除できます。

# container down
docker-compose down
# remove volumes
docker-compose down -v
# remove images
docker images -qa | xargs docker rmi

fixtures

fixtureは下記の通りに定義します。
テーブル名をファイル名として、各カラムに入れたいレコードをオブジェクトで表現します。
ファイル名はDBに合わせているのにレコードのカラムはdrizzleで定義したschemaのプロパティ名になっていて、ブレブレなのが気に食わないのですが、こだわるには面倒だったのでそのままにしてあります。

article.json

[
  {
    "id": 1,
    "createdAt": "1970-01-01T09:00:00",
    "updatedAt": "1970-01-01T09:00:00",
    "title": "はじめまして",
    "articlePath": "path/to/initial",
    "userId": 1
  }
]

tag.json

[
  {
    "id": 1,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "JavaScript",
    "isOfficial": true
  },
  {
    "id": 2,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "TypeScript",
    "isOfficial": true
  },
  {
    "id": 3,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "CoffeeScript",
    "isOfficial": true
  },
  {
    "id": 4,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "React",
    "isOfficial": true
  },
  {
    "id": 5,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "Angular",
    "isOfficial": true
  },
  {
    "id": 6,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "Vue",
    "isOfficial": true
  },
  {
    "id": 7,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "AWS",
    "isOfficial": true
  },
  {
    "id": 8,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "AWS lambda",
    "isOfficial": true
  },
  {
    "id": 9,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "EC2",
    "isOfficial": true
  },
  {
    "id": 10,
    "createdAt": "1970-01-01T09:00:00",
    "createdBy": 1,
    "name": "S3",
    "isOfficial": true
  }
]

user.json

[
  {
    "id": 1,
    "createdAt": "1970-01-01T09:00:00",
    "name": "管理者",
    "code": "admin"
  },
  {
    "id": 2,
    "createdAt": "1970-01-01T09:00:00",
    "name": "フロント",
    "code": "front"
  },
  {
    "id": 3,
    "createdAt": "1970-01-01T09:00:00",
    "name": "AWS",
    "code": "aws"
  }
]

user_tag_relation.json

[
  {
    "id": 1,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 2,
    "tagId": 1
  },
  {
    "id": 2,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 2,
    "tagId": 2
  },
  {
    "id": 3,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 2,
    "tagId": 3
  },
  {
    "id": 4,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 2,
    "tagId": 4
  },
  {
    "id": 5,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 2,
    "tagId": 5
  },
  {
    "id": 6,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 2,
    "tagId": 6
  },
  {
    "id": 7,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 3,
    "tagId": 7
  },
  {
    "id": 8,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 3,
    "tagId": 8
  },
  {
    "id": 9,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 3,
    "tagId": 9
  },
  {
    "id": 10,
    "createdAt": "1970-01-01T09:00:00",
    "userId": 3,
    "tagId": 10
  }
]

タイムスタンプが9時間ズレている件について

JST 1970-01-01 09:00:00 よりも前はUNIX Timestampでは表現できないので、 1970-01-01 09:00:00 としています。
今回はこれで影響がないし、考慮も面倒だったので、 1970-01-01 09:00:00 で登録しています。

初期データ投入

下記の通りに init.ts を作成します。

import 'dotenv/config';
import * as fs from 'node:fs';
import * as path from 'node:path';
import { drizzle } from 'drizzle-orm/mysql2';
import mysql, { type RowDataPacket } from 'mysql2/promise';
import * as R from 'remeda';

import { Table, getTableName } from 'drizzle-orm';
import * as Schema from './schema';

const QUERY = `SELECT
  TABLE_NAME AS table_name,
  REFERENCED_TABLE_NAME AS reference_table_name
FROM
  information_schema.KEY_COLUMN_USAGE`;

const fixturesDir = path.join(__dirname, 'fixtures');

function topologicalSort(dependencyMap: Map<string, Set<string>>): string[] {
  const sorted: string[] = [];
  const visited: Set<string> = new Set();
  const tempMarked: Set<string> = new Set();

  const visit = (node: string): void => {
    if (tempMarked.has(node)) {
      throw new Error(`${node}で閉路を検出`);
    }

    if (!visited.has(node)) {
      tempMarked.add(node);

      const dependencies = dependencyMap.get(node) || new Set();
      for (const dep of dependencies) {
        visit(dep);
      }

      tempMarked.delete(node);
      visited.add(node);
      sorted.push(node);
    }
  };

  for (const node of dependencyMap.keys()) {
    if (!visited.has(node)) {
      visit(node);
    }
  }

  return sorted;
}

async function main(): Promise<void> {
  const poolConnection = mysql.createPool({
    host: process.env.MYSQL_HOST,
    user: process.env.MYSQL_USER,
    password: process.env.MYSQL_PASSWORD,
    database: process.env.MYSQL_DATABASE,
    port: Number(process.env.MYSQL_PORT),
  });

  const db = drizzle({ client: poolConnection });

  const tables = R.entries(Schema).filter(([_, v]) => v instanceof Table);
  const schemeMap = new Map(tables.map(([_, v]) => [getTableName(v) as string, v]));
  const dependencyMap = new Map<string, Set<string>>(tables.map(([_, v]) => [getTableName(v), new Set<string>()]));

  const [params, _fp] = await poolConnection.execute<RowDataPacket[]>(QUERY);
  const relations = params.filter((res) => res.reference_table_name !== null) as {
    table_name: string;
    reference_table_name: string;
  }[];

  for (const r of relations) {
    const s = dependencyMap.get(r.table_name);
    if (s) {
      s.add(r.reference_table_name);
      dependencyMap.set(r.table_name, s);
    } else {
      throw Error(`schemeに定義していないテーブル: ${r.table_name}`);
    }
  }

  const sorted = topologicalSort(dependencyMap);

  const files = await fs.promises.readdir(fixturesDir);
  const fixtureFileSet = new Set(files.filter((f) => path.extname(f) === '.json').map((f) => path.parse(f).name));

  await db.transaction(async (tx) => {
    for (const tableName of sorted) {
      if (fixtureFileSet.has(tableName)) {
        const filePath = path.join(fixturesDir, `${tableName}.json`);
        const fileContent = await fs.promises.readFile(filePath, 'utf-8');
        const fixture = JSON.parse(fileContent);
        const targetScheme = schemeMap.get(tableName);
        if (!targetScheme) {
          throw Error();
        }

        await tx.insert(targetScheme).values(fixture);
      }
    }
  });

  await poolConnection.end();
}

main().then();

npx tsx src/db/init.ts で実行すると、各テーブルにレコードが投入されます。

解説

じゃああとはコードを読んで理解してくださいは不親切なので説明を入れていきます。

テーブル名とモデルのマッピング

getTableName でテーブル名が取得できるので、 {テーブル名: モデル} のMapを作成します。
今回はMySQLでやっているのですが、モデルオブジェクトの MySqlTableWithColumns や PgTableWithColumns が Table を継承していたので、 instanceof でモデルのみに絞っています。
schemaの定義は mysqlSchema や pgSchema と各々で定義されていたので、多分これで弾けます(未検証)

const tables = R.entries(Schema).filter(([_, v]) => v instanceof Table);
const schemeMap = new Map(tables.map(([_, v]) => [getTableName(v) as string, v]));

グルーピング

SQLでの取得なので、各テーブルごとにグルーピングされていません。後続処理で楽にするため、グルーピングします。

for (const r of relations) {
  const s = dependencyMap.get(r.table_name);
  if (s) {
    s.add(r.reference_table_name);
    dependencyMap.set(r.table_name, s);
  } else {
    throw Error(`schemeに定義していないテーブル: ${r.table_name}`);
  }
}

実行すると dependencyMap は下記のようになります。

Map(5) {
  'article' => Set(1) { 'user' },
  'article_tag_relation' => Set(2) { 'article', 'tag' },
  'tag' => Set(1) { 'user' },
  'user' => Set(0) {},
  'user_tag_relation' => Set(2) { 'tag', 'user' }
}

依存関係順にソート

トポロジカルソートで依存関係順に並べ替えます。
トポロジカルソートではありますが、メモ化しながら深さ優先探索をしているだけです。
トポロジカルソート部分を生成AIに詳しく解説してと投げれば具体例を交えて説明してくれるはずなので、ここでの説明は省きます
fixtureを上から順に愚直に投入すると、先にある投入データが後続のテーブルに対して外部キー制約を貼っていた場合、そこでエラーになってしまうためです。

function topologicalSort(dependencyMap: Map<string, Set<string>>): string[] {
  const sorted: string[] = [];
  const visited: Set<string> = new Set();
  const tempMarked: Set<string> = new Set();

  const visit = (node: string): void => {
    if (tempMarked.has(node)) {
      throw new Error(`${node}で閉路を検出`);
    }

    if (!visited.has(node)) {
      tempMarked.add(node);

      const dependencies = dependencyMap.get(node) || new Set();
      for (const dep of dependencies) {
        visit(dep);
      }

      tempMarked.delete(node);
      visited.add(node);
      sorted.push(node);
    }
  };

  for (const node of dependencyMap.keys()) {
    if (!visited.has(node)) {
      visit(node);
    }
  }

  return sorted;
}

const sorted = topologicalSort(dependencyMap);

依存関係順に並べ替えた結果がこちら。

[
  'user',
  'article',
  'tag',
  'article_tag_relation',
  'user_tag_relation'
]

依存関係順にレコードを投入

ソートした順序に、fixtures/配下にファイルが存在するか確認し、存在すればファイルの中身を読み込んでbulk_insertしています。
ファイル名がテーブル名にしているのはこのためです。
また、実装をミスしていた場合やJSONファイルの作成でミスしていたときに備えてトランザクションを貼っています。

const files = await fs.promises.readdir(fixturesDir);
const fixtureFileSet = new Set(files.filter((f) => path.extname(f) === '.json').map((f) => path.parse(f).name));

await db.transaction(async (tx) => {
  for (const tableName of sorted) {
    if (fixtureFileSet.has(tableName)) {
      const filePath = path.join(fixturesDir, `${tableName}.json`);
      const fileContent = await fs.promises.readFile(filePath, 'utf-8');
      const fixture = JSON.parse(fileContent);
      const targetScheme = schemeMap.get(tableName);
      if (!targetScheme) {
        throw Error();
      }

      await tx.insert(targetScheme).values(fixture);
    }
  }
});

close

やっておきます。

await poolConnection.end();

おわりに

最初、Drizzle ORMのモデル定義から依存関係を取ろうとしていたのですが上手く取れず、抽象的に考えてSQLで取るようにしたら上手くいきました。
schemaspyを知っていたのがフックになりました。
schemaspyについては私の過去記事schemaspyでER図を生成をご覧ください。

繰り返しにはなりますが、あくまで本記事は初期データ投入の "戦略案" であり、未考慮が多い内容になっています。
参考程度に留めていただきたいです。自作するだけでなく、他ツールも検討ください。

作成してみてやっぱり面白かったです。
以上、Progaku Advent Calendar 2024 19日目でした。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up