Beginning
This technique is based on SQL Puzzle and Answers # 59 MERGING TIME PERIODS
It is better with 『SQL Puzzle and Answers』 to read this article.
It explains the idea of algorithm and
You can check what each puzzle is intended for ・・・
But there is almost no commentary about this puzzle
This puzzle handles time, but it's not easy.
I mean you might feel dizzy to read SQL even it's not so long
What we are going to do is to find
contiguous or overlapping time periods
from job table
Let's start !!
See blue lines as job time
There are 10 jobs(task_id)
⇒ task_id 1 : Starts at 1 and ends at 3
⇒ task_id 2 : Starts at 2 and ends at 4
⇒ task_id 3 : Starts at 4 and ends at 5
⇒ task_id 4 : Starts at 6 and ends at 9
⇒ task_id 5 : Starts at 9 and ends at 9
・ ・ ・ ・ ・
・ ・ ・ ・ ・
⇒ task_id 10 : Starts at 17 and ends at 17
『contiguous or overlapping time periods』 will be orange lines on the bottom
We are going to get start and end date of these orange lines by SQL
I have checked that it works with PostgreSQL
Table and data
CREATE TABLE Timesheets(
task_id CHAR(10) NOT NULL PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK(start_date <= end_date)
);
INSERT INTO Timesheets
VALUES (1, '1997-01-01', '1997-01-03'),
(2, '1997-01-02', '1997-01-04'),
(3, '1997-01-04', '1997-01-05'),
(4, '1997-01-06', '1997-01-09'),
(5, '1997-01-09', '1997-01-09'),
(6, '1997-01-09', '1997-01-09'),
(7, '1997-01-12', '1997-01-15'),
(8, '1997-01-13', '1997-01-14'),
(9, '1997-01-14', '1997-01-14'),
(10, '1997-01-17', '1997-01-17');
▼ Output(left side of image)
The image used above is displayed by side.
The blue line numbers are the same as the "day" of
start_date and end_date.
Creating Original Data ①
SELECT T1.*, T2.*
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
Creating original data to take out 『contiguous or overlapping time periods』
from CROSS JOIN Timesheets table.
⇒ Taking out 『start date』from T1 and 『end date』from T2
⇒ Add a condition of T1.start_date <= T2.end_date
because 『end date』should be later than 『start date』
⇒ Comparison operator is『<=』to cover jos which start_date and
end_date are the same like "task_id 6"
⇒ Combinations that fit this condition are left
▼ Output(Original Data ①)
This is the original data to take out『start date』and『end date』of
『contiguous or overlapping time periods』
⇒ the orange lines on the bottom
Creating Original Data ②
SELECT T3.*, T4.*
FROM Timesheets AS T3, Timesheets AS T4
WHERE T3.end_date < T4.start_date
Creating original data using CROSS JOIN Timesheets table
but a condition is not the same as 『Original Data ①』
⇒ T3.end_date < T4.start_date
⇒ 『start date』of T4 is later than『end date』 of T3
Combinations of T3 and T4 do not overlap
▼ image of 『Original Data ②』
▼ Output(Original Data ②)
Original Data ① : a set of times that have『start date』and『end date』
Original Data ② : a set of non-overlapping times
Our goal is to find 『start date』and『end date』 of
『contiguous or overlapping time periods』
So what we are going to do is ・・・
[1] Exclude 『Original Data ②』from『Original Data ①』
[2] Find start and end dates from the rest of the data
▼ Output(Original Data ① & Original Data ②)
Edit Original Data ①
SELECT T1.*, T2.*
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
AND NOT EXISTS
(
SELECT *
FROM Timesheets AS T3, Timesheets AS T4
WHERE
T3.end_date < T4.start_date AND
T1.start_date <= T3.start_date AND T3.end_date <= T2.end_date
AND
T1.start_date <= T4.start_date AND T4.end_date <= T2.end_date
)
We want to exclude『Original Data ②』from『Original Data ①』.
However『T3 & T4』and『T1 & T2』are different Timesheets table data.
So they cannot be excluded as they are.
Add『NOT EXISTS』to the WHERE clause of『Original Data ①』.
『Original data ②』is in the『NOT EXISTS』but there are other conditions
⇒ T1.start_date <= T3.start_date AND T3.end_date <= T2.end_date
AND
T1.start_date <= T4.start_date AND T4.end_date <= T2.end_date
⇒ This defines『T3 & T4』are a set of times
that are inside T1.start_date and T2.end_date
⇒ If 『T3 & T4』 which are inside T1.start_date and T2.end_date
fit condition of 『T3.end_date < T4.start_date』
they are excluded(NOT EXISTS)
⇒ 『contiguous or overlapping time periods』are left in『T1 & T2』
▼ (image of how this SQL works)
▼ Output
『T1 & T2』after removing the non-overlapping time
⇒ Edit this output then take out『start_date』and『end_date』
Edit Original Data ②
SELECT T1.start_date, MIN(T2.start_date), MAX(T2.end_date)
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
AND NOT EXISTS
(
SELECT *
FROM Timesheets AS T3, Timesheets AS T4
WHERE
T3.end_date < T4.start_date AND
T3.start_date >= T1.start_date AND T3.end_date <= T2.end_date
AND
T4.start_date >= T1.start_date AND T4.end_date <= T2.end_date
)
GROUP BY T1.start_date
T1.start_date
become unique because of『GROUP BY T1.start_date』
⇒ Some of these unique would be『start date』
MAX(T2.end_date)
is the latest T2.end_date among the combinations
that T1.star_date has.
⇒ Some of these unique would be『end date』
MIN(T2.start_date)
is the earliest T2.start_date among the combinations
that T1.star_date has.
⇒ Find 『start date』 using these
▼ Output(right side of image)
Edit Original Data ③
SELECT T1.start_date, MIN(T2.start_date), MAX(T2.end_date)
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
AND NOT EXISTS
(
SELECT *
FROM Timesheets AS T3, Timesheets AS T4
WHERE
T3.end_date < T4.start_date AND
T3.start_date >= T1.start_date AND T3.end_date <= T2.end_date
AND
T4.start_date >= T1.start_date AND T4.end_date <= T2.end_date
)
GROUP BY T1.start_date
HAVING T1.start_date = MIN(T2.start_date) -- Add
▼ Output(bottom of image)
Add T1.start_date = MIN(T2.start_date)
⇒ 『start_date』that we want to pick fits this condition
⇒ pick『start_date』come with 『end_date』
⇒ Why・・・? We will check in detail
▼ ie:Why T1.start_date(1997-01-02) can not be『start date』?
▼ ie:Why T1.start_date(1997-01-14) can not be『start date』?
▼ ie:Why T1.start_date(1997-01-12) can be『start date』?
▼ ie:Why T1.start_date(1997-01-06) can be『start date』?
This SQL has a missing condition
We've checked this SQL so far ・・・
『end date』cannot be taken correctly with this SQL
Fix the problem this SQL has and check why next
start date(T1.start_date) | end date(T2.end_date) | correct end date |
---|---|---|
1997-01-01 | 1997-01-04 | 1997-01-05 |
1997-01-06 | 1997-01-09 | ← OK |
1997-01-12 | 1997-01-15 | ← OK |
1997-01-17 | 1997-01-17 | ← OK |
Modify 『Edit Original Data①』
SELECT T1.*, T2.*
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
AND NOT EXISTS
(
SELECT *
FROM Timesheets AS T3, Timesheets AS T4
WHERE
T3.end_date < T4.start_date AND
T1.start_date <= T3.start_date AND T3.end_date <= T2.end_date
AND
T1.start_date <= T4.start_date AND T4.end_date <= T2.end_date
AND --- Add from here ↓ ↓ ↓ ↓ ↓ ↓
NOT EXISTS(
SELECT * FROM Timesheets AS T5
WHERE
T5.start_date BETWEEN T3.start_date AND T3.end_date
AND
T5.end_date BETWEEN T4.start_date AND T4.end_date
)
)
▼ Output(right side of image)
Add another 『NOT EXISTS』 inside of 『NOT EXISTS』
then add conditions to 『T3 & T4』 using a copy of Timesheets(T5)
⇒ This will output one more row
This condition is ・・・
Although『T3 & T4』are used in the outer "NOT EXISTS"
to exclude non-overlapping combinations,
leaving required combinations using the inner "NOT EXISTS"
leaving required combinations・・・means
If 『start date』and『end date』 in Timesheets(T5) table are
in the non-overlapping combinations defined using 『T3 & T4』,
they shuould not be in the non-overlapping combinations.
That is why they are left
▼ image of this process
Let's see in detail
Get the removed time combination
with 『NOT EXISTS』
⇒ Change 『NOT EXISTS』to 『EXISTS』then show removed combinations
⇒ It's the green dotted frame
Whether『start date』and『end date』in Timesheets(T5) table are
in the removed time combination or not?
Pick some 『start_date』and 『end_date』in Timesheets(T5)
then check the condition
⇒ 『task_id is 2』fits the condition
start_date(1997-01-02) and end_date(1997-01-04)
⇒ the red frame row is left
⇒ the rest are excluded
check below images
outer and inner NOT EXISTS are pair
Since we want to find『contiguous or overlapping time periods』
it makes sence to exclude time combinations where 『T3.end_date < T4.start_date』
but it is lack of consideration
In this case ・・・
when end date of 『job①』and star date of 『job②』do not overlap
we can see they are not 『contiguous or overlapping』
If 『job③』is inside of『job①』and 『job②』
we can see『job①』『job③』and『job②』are『contiguous or overlapping』
These process is done with the outer NOT EXISTS and
the inner NOT EXISTS
▼ below is the image of how this works
The output of this SQL is the original data
for retrieving『start date』and『end date』
We are going to do the same process using this output
Edit Original Data ④
SELECT T1.start_date, MIN(T2.start_date), MAX(T2.end_date)
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
AND NOT EXISTS
(
SELECT *
FROM Timesheets AS T3, Timesheets AS T4
WHERE
T3.end_date < T4.start_date AND
T3.start_date >= T1.start_date AND T3.end_date <= T2.end_date
AND
T4.start_date >= T1.start_date AND T4.end_date <= T2.end_date
AND NOT EXISTS
(
SELECT * FROM Timesheets AS T5
WHERE T5.start_date BETWEEN T3.start_date AND T3.end_date
AND T5.end_date BETWEEN T4.start_date AND T4.end_date
)
)
GROUP BY T1.start_date
--HAVING T1.start_date = MIN(T2.start_date)
▼ Output(right side of image)
I explained before ・・・please check there
Answer SQL
SELECT T1.start_date, MIN(T2.start_date), MAX(T2.end_date)
FROM Timesheets AS T1, Timesheets AS T2
WHERE T1.start_date <= T2.end_date
AND NOT EXISTS
(
SELECT *
FROM Timesheets AS T3, Timesheets AS T4
WHERE
T3.end_date < T4.start_date AND
T3.start_date >= T1.start_date AND T3.end_date <= T2.end_date
AND
T4.start_date >= T1.start_date AND T4.end_date <= T2.end_date
AND NOT EXISTS
(
SELECT * FROM Timesheets AS T5
WHERE T5.start_date BETWEEN T3.start_date AND T3.end_date
AND T5.end_date BETWEEN T4.start_date AND T4.end_date
)
)
GROUP BY T1.start_date
HAVING T1.start_date = MIN(T2.start_date)
▼ Output
I explained before ・・・please check there
Finally, we can get 『start date』and『end date』 of
『contiguous or overlapping time periods』
start date(T1.start_date) | end date MAX(T2.end_date) |
---|---|
1997-01-01 | 1997-01-05 |
1997-01-06 | 1997-01-09 |
1997-01-12 | 1997-01-15 |
1997-01-17 | 1997-01-17 |
It's been a little long ・・・
I explained this puzzle with incomplete SQL on purpose at the beginning
because I thought it would help to understand how to handle data with SQLI used many pictures and visualized date flow to explain
but still some explanations may be difficult to catchBecause my words are poor ・・・ or
takes some time to get used to understand this kind of SQLThis is a good SQL puzzle for treating data as a set