Beginning
This technique is based on SQL Puzzle and Answers
⇒ # 59 MERGING TIME PERIODS
It is better with 『SQL Puzzle and Answers』 to read this article.
It explains the idea of algorithm and
You can check what each puzzle is intended for ・・・
But there is almost no commentary about this puzzle
『Part 3』 will be the final episode
The idea is the same as 『Part 2』, but the algorithm is different
Read 『Part 3』after 『Part 2』
We will have a deeper understanding
Let's start !!
What we are going to do is to find
contiguous or overlapping time periods
from job table
See blue lines as job time
There are 10 jobs(task_id)
⇒ task_id 1 : Starts at 1 and ends at 3
⇒ task_id 2 : Starts at 2 and ends at 4
⇒ task_id 3 : Starts at 4 and ends at 5
⇒ task_id 4 : Starts at 6 and ends at 9
⇒ task_id 5 : Starts at 9 and ends at 9
・ ・ ・ ・ ・
・ ・ ・ ・ ・
⇒ task_id 10 : Starts at 17 and ends at 17
『contiguous or overlapping time periods』 will be orange lines on the bottom
We are going to get start and end date of these orange lines by SQL
I have checked that it works with PostgreSQL
Tablae and Data
CREATE TABLE Timesheets(
task_id CHAR(10) NOT NULL PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK(start_date <= end_date)
);
INSERT INTO Timesheets
VALUES (1, '1997-01-01', '1997-01-03'),
(2, '1997-01-02', '1997-01-04'),
(3, '1997-01-04', '1997-01-05'),
(4, '1997-01-06', '1997-01-09'),
(5, '1997-01-09', '1997-01-09'),
(6, '1997-01-09', '1997-01-09'),
(7, '1997-01-12', '1997-01-15'),
(8, '1997-01-13', '1997-01-14'),
(9, '1997-01-14', '1997-01-14'),
(10, '1997-01-17', '1997-01-17');
▼ Output(left side of image)
The image used above is displayed by side.
The blue line numbers are the same as the "day" of
start_date and end_date
Creating Original Data ① - 1
SELECT T1.*,T2.*
FROM Timesheets AS T1
LEFT JOIN Timesheets AS T2
ON T1.start_date > T2.start_date
LEFT JOIN Timesheet(T2) table to Timesheet(T1) table
to create original data
this condition is ・・・
⇒ T1.start_date > T2.start_date
⇒ take T2.start_date that is before T1.start_date
from Timesheet(T2), then join
The image below details start date(1997-01-09)
⇒ there are two rows of 1997-01-09
⇒ there are four rows of 'start date'
that fit the condition on T2 side
⇒ 8 rows of data are joined(2 x 4)
Other start dates are the same
T2.start_date before T1.start_date is joined
creating all combinations here but
combinations on the same day are not necessary
(e.g)
1997-01-01(T1.start_date) 1997-01-01(T2.start_date)
1997-01-02(T1.start_date) 1997-01-02(T2.start_date)
to exclude those combinations
comparison operator has not 『=』
⇒ T1.start_date > T2.start_date
We've done something similar before ・・・
Creating combinations using the same tablesPlease check the article below
Creating Original Data ① - 2
SELECT T1.*,T2.*
FROM Timesheets AS T1
LEFT JOIN Timesheets AS T2
ON T1.start_date > T2.start_date AND T1.start_date <= T2.end_date
Add a second condition to create original data
⇒ T1.start_date <= T2.end_date
The two conditions are written using AND operator
so if both conditions are not satisfied
data will not be joined from T2 to T1
⇒ Since using LEFT JOIN
the data remains on the T1 side
but a space is joined on the T2 side
Please check the image below
Analyze Original Data
What kind of data we extracted by LEFT JOIN?
We will check some of those
What we are doing here is similar to『part 2』
・・ i mean, we do the same thing
① 1st, decide T1.start_date
② find T2.start_date that come to the left of T1.start_date
⇒ T1.start_date > T2.start_date
③ check T1.start_date and T2.end_date then
confirming 『T1 & T2』are overlaped or not
⇒ T1.start_date <= T2.end_date
④ do these for all T1.start_date
we can say
the rows where T2 data is joined are ・・・
there is another 'start date'
that can cover wider range than T1.start_date
in other words ・・・
'T1.start_date' that is joined T2 data is unnecessary
the row with blank on the T2 side
contains 'start date' we want to retrive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[1] focus on T1.start_date(1997-01-04)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2] focus on T1.start_date(1997-01-09)
Creating Original Data ② - 1
SELECT T3.*,T4.*
FROM Timesheets AS T3
LEFT JOIN Timesheets AS T4
ON T3.end_date < T4.end_date
LEFT JOIN Timesheet(T4) table to Timesheet(T3) table
to create original data
this condition is ・・・
⇒ T3.end_date < T4.end_date
⇒ take T4.end_date that is after T3.end_date
from Timesheet(T4), then join
The image below details end date(1997-01-03)
⇒ there is one row of 1997-01-03
⇒ there are nine rows of 'end date'
that fit condition on T4 side
⇒ 9 rows of data are joined(1 x 9)
Other end dates are the same
T4.end_date after T3.end_date is joined
the same as before
creating all combinations here but
combinations on the same day are not necessary
(e.g)
1997-01-03(T3.end_date) 1997-01-03(T4.end_date)
1997-01-04(T3.end_date) 1997-01-04(T4.end_date)
to exclude those combinations
comparison operator has not 『=』
⇒ T3.end_date < T4.end_date
Creating Original Data ② - 2
SELECT T3.*,T4.*
FROM Timesheets AS T3
LEFT JOIN Timesheets AS T4
ON T3.end_date < T4.end_date AND T3.end_date >= T4.start_date
Add a second condition to create original data
⇒ T3.end_date >= T4.start_date
The two conditions are written using AND operator
so if both conditions are not satisfied
data will not be joined from T4 to T3
⇒ Since using LEFT JOIN
the data remains on the T3 side
but a space is joined on the T4 side
Please check the image below
Analyze Original Data
What kind of data we extracted by LEFT JOIN?
We will check some of those
What we are doing here is similar to『part 2』
the same as before
① 1st, decide T3.end_date
② find T4.end_date that come to the right of T3.end_date
⇒ T3.end_date < T4.end_date
③ check T3.end_date and T4.start_date then
confirming 『T1 & T2』are overlaped or not
⇒ T3.end_date >= T4.start_date
④ do these for all T3.end_date
we can say
the rows where T4 data is joined are ・・・
there is another 'end date'
that can cover wider range than T3.end_date
in other words ・・・
'T3.end_date' that is joined T4 data is unnecessary
the row with blank on the T4 side
contains 'end date' we want to retrive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[1] focus on T3.end_date(1997-01-04)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2] focus on T3.end_date(1997-01-14)
Creating Original Data ③
SELECT X.start_date,Y.end_date
FROM
( -- Start Date
SELECT T1.start_date ,COUNT(T2.start_date)
FROM Timesheets AS T1
LEFT JOIN Timesheets AS T2
ON T1.start_date > T2.start_date
AND T1.start_date <= T2.end_date
GROUP BY T1.start_date
HAVING COUNT(T2.start_date) = 0
) AS X(start_date)
INNER JOIN
( -- ENd Date
SELECT T3.end_date ,COUNT(T4.start_date)
FROM Timesheets AS T3
LEFT JOIN Timesheets AS T4
ON T3.end_date < T4.end_date
AND T3.end_date >= T4.start_date
GROUP BY T3.end_date
HAVING COUNT(T4.start_date) = 0
) AS Y(end_date) ON X.start_date <= Y.end_date
ORDER BY X.start_date,Y.end_date
▼ Output
X.start_date | Y.end_date |
---|---|
1997-01-01 | 1997-01-05 |
1997-01-01 | 1997-01-09 |
1997-01-01 | 1997-01-15 |
1997-01-01 | 1997-01-17 |
1997-01-06 | 1997-01-09 |
1997-01-06 | 1997-01-15 |
1997-01-06 | 1997-01-17 |
1997-01-12 | 1997-01-15 |
1997-01-12 | 1997-01-17 |
1997-01-17 | 1997-01-17 |
This SQL joins the output of the end date
to the output of the start date
Using INNER JOIN here but
LEFT JOIN is also the same
~~~~~~~~~~~~~~~~~~~~~~~~~
check the start date in detail
In the subquery of 'Start Date', doing [1][2][3] below
[1] 『GROUP BY T1.start_date』makes 'start date' unique
[2] Count T2.start_date that is joined to 'start date'
[3] pick rows that number of count is 0
⇒ HAVING COUNT(T2.start_date) = 0
the start dates below are the output
⇒ T1.start_date
⇒ 1997-01-01,1997-01-06,1997-01-12,1997-01-17
~~~~~~~~~~~~~~~~~~~~~~~~~
check the end date in detail
In the subquery of 'End Date', doing [1][2][3] below
[1] 『GROUP BY T3.end_date』makes 'end date' unique
[2] Count T4.start_date that is joined to 'end date'
[3] pick rows that number of count is 0
⇒ HAVING COUNT(T4.start_date) = 0
the end dates below are the output
⇒ T3.end_date
⇒ 1997-01-05,1997-01-09,1997-01-15,1997-01-17
~~~~~~~~~~~~~~~~~~~~~~~~~
extracted 『start & end date』are joined and
its condition is 『X.start_date <= Y.end_date』
⇒ 'end dates' after start date are joined
⇒ comparison operator has『=』
Create all combination including the same "start and end date"
1997-01-17(T1.start_date) 1997-01-17(T3.end_date)
⇒ please check right side of image below
Answer SQL
SELECT X.start_date, MIN(Y.end_date) AS end_date
FROM
( -- Start Date
SELECT T1.start_date --,COUNT(T2.start_date)
FROM Timesheets AS T1
LEFT JOIN Timesheets AS T2
ON T1.start_date > T2.start_date
AND T1.start_date <= T2.end_date
GROUP BY T1.start_date
HAVING COUNT(T2.start_date) = 0
) AS X(start_date)
INNER JOIN
( -- End Date
SELECT T3.end_date --,COUNT(T4.start_date)
FROM Timesheets AS T3
LEFT JOIN Timesheets AS T4
ON T3.end_date >= T4.start_date
AND T3.end_date < T4.end_date
GROUP BY T3.end_date
HAVING COUNT(T4.start_date) = 0
) AS Y(end_date) ON X.start_date <= Y.end_date
GROUP BY X.start_date
ORDER BY X.start_date
▼ Output(answer)
X.start_date | end_date |
---|---|
1997-01-01 | 1997-01-05 |
1997-01-06 | 1997-01-09 |
1997-01-12 | 1997-01-15 |
1997-01-17 | 1997-01-17 |
there are 'start date' and 'end date' we want to retrive
in the data that joined『start and end date』before
picking dates from the top in chronological order
we can retrive『contiguous or overlapping time』
⇒ the same as 『part 2』
check 『part 2』to see in detail
in the article 『part 2』・・
rows with a return valuee is 0 are taken
using HAVING and CASE statements
from the data that is aggregated with "GROUP BY"in the article 『part 3』・・
rows that are also 0 are taken usin JOINpersonally, 『part 3』is easier to read
but both are interesting solutions