前置き
雑に継続率を知りたかったのでクエリを書いた。
SQLのパフォーマンスとかは特に考えない。
新規継続率
定義としては、X日に来たした人でY日後に来た人の数が継続率。
ここでは既存継続率ではなく、新規継続率を対象とする。
テーブル
installした日とログインした日が格納されているテーブルを想定する
CREATE TABLE user_login_log(
`user_id` int(10) unsigned NOT NULL,
`login_on` date NOT NULL,
`installed_on` date NOT NULL,
PRIMARY KEY (`user_id`, `login_on`)
);
Query
DATEDIFFを使えば、ログインした日とインストールした日の差分が取れる。
countの条件にY日の値を入れてあげれば、その日の継続日数がでる.
SELECT
installed_on,
COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day0,
COUNT(DATEDIFF(login_on, installed_on) = 1 or NULL) AS day1,
COUNT(DATEDIFF(login_on, installed_on) = 2 or NULL) AS day2,
COUNT(DATEDIFF(login_on, installed_on) = 3 or NULL) AS day3,
COUNT(DATEDIFF(login_on, installed_on) = 4 or NULL) AS day4,
COUNT(DATEDIFF(login_on, installed_on) = 5 or NULL) AS day5
FROM user_login_log
GROUP BY installed_on;
継続率が出したければ、初日の日数で割ってあげればよい。
SELECT
installed_on,
COUNT(DATEDIFF(login_on, installed_on) = 1 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day1,
COUNT(DATEDIFF(login_on, installed_on) = 2 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day2,
COUNT(DATEDIFF(login_on, installed_on) = 3 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day3,
COUNT(DATEDIFF(login_on, installed_on) = 4 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day4,
COUNT(DATEDIFF(login_on, installed_on) = 5 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day5
FROM user_login_log
GROUP BY installed_on;
結果
4/1 ~ 4/5のデータ
- 毎日継続する人を各日に一人づつ.
- 初日だけ継続する人を各日に一人づつ.
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (1, '2017-04-01', '2017-04-01');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (1, '2017-04-02', '2017-04-01');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (1, '2017-04-03', '2017-04-01');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (1, '2017-04-04', '2017-04-01');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (1, '2017-04-05', '2017-04-01');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (2, '2017-04-02', '2017-04-02');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (2, '2017-04-03', '2017-04-02');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (2, '2017-04-04', '2017-04-02');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (2, '2017-04-05', '2017-04-02');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (3, '2017-04-03', '2017-04-03');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (3, '2017-04-04', '2017-04-03');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (3, '2017-04-05', '2017-04-03');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (4, '2017-04-04', '2017-04-04');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (4, '2017-04-05', '2017-04-04');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (5, '2017-04-05', '2017-04-05');
--
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (6, '2017-04-01', '2017-04-01');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (7, '2017-04-02', '2017-04-02');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (8, '2017-04-03', '2017-04-03');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (9, '2017-04-04', '2017-04-04');
INSERT INTO user_login_log (user_id, login_on, installed_on) VALUE (10, '2017-04-05', '2017-04-05');
継続日数
mysql> SELECT
-> installed_on,
-> COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day0,
-> COUNT(DATEDIFF(login_on, installed_on) = 1 or NULL) AS day1,
-> COUNT(DATEDIFF(login_on, installed_on) = 2 or NULL) AS day2,
-> COUNT(DATEDIFF(login_on, installed_on) = 3 or NULL) AS day3,
-> COUNT(DATEDIFF(login_on, installed_on) = 4 or NULL) AS day4,
-> COUNT(DATEDIFF(login_on, installed_on) = 5 or NULL) AS day5
-> FROM user_login_log
-> GROUP BY installed_on;
+--------------+------+------+------+------+------+------+
| installed_on | day0 | day1 | day2 | day3 | day4 | day5 |
+--------------+------+------+------+------+------+------+
| 2017-04-01 | 2 | 1 | 1 | 1 | 1 | 0 |
| 2017-04-02 | 2 | 1 | 1 | 1 | 0 | 0 |
| 2017-04-03 | 2 | 1 | 1 | 0 | 0 | 0 |
| 2017-04-04 | 2 | 1 | 0 | 0 | 0 | 0 |
| 2017-04-05 | 2 | 0 | 0 | 0 | 0 | 0 |
+--------------+------+------+------+------+------+------+
5 rows in set (0.00 sec)
継続率
mysql> SELECT
-> installed_on,
-> COUNT(DATEDIFF(login_on, installed_on) = 1 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day1,
-> COUNT(DATEDIFF(login_on, installed_on) = 2 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day2,
-> COUNT(DATEDIFF(login_on, installed_on) = 3 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day3,
-> COUNT(DATEDIFF(login_on, installed_on) = 4 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day4,
-> COUNT(DATEDIFF(login_on, installed_on) = 5 or NULL) / COUNT(DATEDIFF(login_on, installed_on) = 0 or NULL) AS day5
-> FROM user_login_log
-> GROUP BY installed_on;
+--------------+--------+--------+--------+--------+--------+
| installed_on | day1 | day2 | day3 | day4 | day5 |
+--------------+--------+--------+--------+--------+--------+
| 2017-04-01 | 0.5000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 |
| 2017-04-02 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 |
| 2017-04-03 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 |
| 2017-04-04 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 2017-04-05 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+--------------+--------+--------+--------+--------+--------+
5 rows in set (0.00 sec)