LoginSignup
0
0

More than 1 year has passed since last update.

Visualize SQL | Merging Time Periods(part 3)

Posted at

Beginning

This technique is based on SQL Puzzle and Answers
 ⇒ # 59 MERGING TIME PERIODS
It is better with 『SQL Puzzle and Answers』 to read this article.

It explains the idea of algorithm and
You can check what each puzzle is intended for ・・・
But there is almost no commentary about this puzzle

『Part 3』 will be the final episode
The idea is the same as 『Part 2』, but the algorithm is different

Read 『Part 3』after 『Part 2』
We will have a deeper understanding

Let's start !!

What we are going to do is to find
contiguous or overlapping time periods from job table

See blue lines as job time
There are 10 jobs(task_id)
 ⇒ task_id 1 : Starts at 1 and ends at 3
 ⇒ task_id 2 : Starts at 2 and ends at 4
 ⇒ task_id 3 : Starts at 4 and ends at 5
 ⇒ task_id 4 : Starts at 6 and ends at 9
 ⇒ task_id 5 : Starts at 9 and ends at 9
    ・ ・ ・ ・ ・
    ・ ・ ・ ・ ・
 ⇒ task_id 10 : Starts at 17 and ends at 17

『contiguous or overlapping time periods』 will be orange lines on the bottom
We are going to get start and end date of these orange lines by SQL

re_1.jpg

I have checked that it works with PostgreSQL

Tablae and Data

SQL
CREATE TABLE Timesheets(
	task_id CHAR(10) NOT NULL PRIMARY KEY,
	start_date DATE NOT NULL,
	end_date DATE NOT NULL,
	CHECK(start_date <= end_date)
);

INSERT INTO Timesheets
VALUES (1, '1997-01-01', '1997-01-03'),
       (2, '1997-01-02', '1997-01-04'),
       (3, '1997-01-04', '1997-01-05'),
       (4, '1997-01-06', '1997-01-09'),
       (5, '1997-01-09', '1997-01-09'),
       (6, '1997-01-09', '1997-01-09'),
       (7, '1997-01-12', '1997-01-15'),
       (8, '1997-01-13', '1997-01-14'),
       (9, '1997-01-14', '1997-01-14'),
       (10, '1997-01-17', '1997-01-17');

▼ Output(left side of image)

The image used above is displayed by side.
The blue line numbers are the same as the "day" of
start_date and end_date

re_2.jpg

Creating Original Data ① - 1

SQL
SELECT T1.*,T2.*
FROM Timesheets AS T1
LEFT JOIN Timesheets AS T2
  ON T1.start_date > T2.start_date

LEFT JOIN Timesheet(T2) table to Timesheet(T1) table
 to create original data
this condition is ・・・
 ⇒ T1.start_date > T2.start_date
 ⇒ take T2.start_date that is before T1.start_date
   from Timesheet(T2), then join

The image below details start date(1997-01-09)
 ⇒ there are two rows of 1997-01-09
 ⇒ there are four rows of 'start date'
    that fit the condition on T2 side
 ⇒ 8 rows of data are joined(2 x 4)

Other start dates are the same
T2.start_date before T1.start_date is joined

creating all combinations here but
combinations on the same day are not necessary

(e.g)
1997-01-01(T1.start_date) 1997-01-01(T2.start_date)
1997-01-02(T1.start_date) 1997-01-02(T2.start_date)

to exclude those combinations
comparison operator has not 『=』
 ⇒ T1.start_date > T2.start_date

We've done something similar before ・・・
Creating combinations using the same tables

Please check the article below

▼ Output
re_4.jpg

Creating Original Data ① - 2

SQL
SELECT T1.*,T2.*
FROM Timesheets AS T1
LEFT JOIN Timesheets AS T2
  ON T1.start_date > T2.start_date AND T1.start_date <= T2.end_date

Add a second condition to create original data
 ⇒ T1.start_date <= T2.end_date

The two conditions are written using AND operator
so if both conditions are not satisfied
data will not be joined from T2 to T1
 ⇒ Since using LEFT JOIN
   the data remains on the T1 side
   but a space is joined on the T2 side

Please check the image below

▼ Output(right side of image)
re_5.jpg

Analyze Original Data

What kind of data we extracted by LEFT JOIN?
We will check some of those

What we are doing here is similar to『part 2』
・・ i mean, we do the same thing

① 1st, decide T1.start_date
② find T2.start_date that come to the left of T1.start_date
  ⇒ T1.start_date > T2.start_date
③ check T1.start_date and T2.end_date then
 confirming 『T1 & T2』are overlaped or not
  ⇒ T1.start_date <= T2.end_date
④ do these for all T1.start_date

we can say
the rows where T2 data is joined are ・・・
there is another 'start date'
 that can cover wider range than T1.start_date

in other words ・・・
'T1.start_date' that is joined T2 data is unnecessary
the row with blank on the T2 side
 contains 'start date' we want to retrive

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[1] focus on T1.start_date(1997-01-04)

re_6.jpg

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2] focus on T1.start_date(1997-01-09)

re_7.jpg

Creating Original Data ② - 1

SQL
SELECT T3.*,T4.*
FROM Timesheets AS T3
LEFT JOIN Timesheets AS T4
  ON T3.end_date < T4.end_date

LEFT JOIN Timesheet(T4) table to Timesheet(T3) table
 to create original data
this condition is ・・・
 ⇒ T3.end_date < T4.end_date
 ⇒ take T4.end_date that is after T3.end_date
   from Timesheet(T4), then join

The image below details end date(1997-01-03)
 ⇒ there is one row of 1997-01-03
 ⇒ there are nine rows of 'end date'
    that fit condition on T4 side
 ⇒ 9 rows of data are joined(1 x 9)

Other end dates are the same
T4.end_date after T3.end_date is joined

the same as before
creating all combinations here but
combinations on the same day are not necessary

(e.g)
1997-01-03(T3.end_date) 1997-01-03(T4.end_date)
1997-01-04(T3.end_date) 1997-01-04(T4.end_date)

to exclude those combinations
comparison operator has not 『=』
 ⇒ T3.end_date < T4.end_date

re_10.jpg

Creating Original Data ② - 2

SQL
SELECT T3.*,T4.*
FROM Timesheets AS T3
LEFT JOIN Timesheets AS T4
  ON T3.end_date < T4.end_date AND T3.end_date >= T4.start_date

Add a second condition to create original data
 ⇒ T3.end_date >= T4.start_date

The two conditions are written using AND operator
so if both conditions are not satisfied
data will not be joined from T4 to T3
 ⇒ Since using LEFT JOIN
   the data remains on the T3 side
   but a space is joined on the T4 side

Please check the image below

▼ Output(right side of image)
re_11.jpg

Analyze Original Data

What kind of data we extracted by LEFT JOIN?
We will check some of those

What we are doing here is similar to『part 2』
the same as before

① 1st, decide T3.end_date
② find T4.end_date that come to the right of T3.end_date
  ⇒ T3.end_date < T4.end_date
③ check T3.end_date and T4.start_date then
 confirming 『T1 & T2』are overlaped or not
  ⇒ T3.end_date >= T4.start_date
④ do these for all T3.end_date

we can say
the rows where T4 data is joined are ・・・
there is another 'end date'
 that can cover wider range than T3.end_date

in other words ・・・
'T3.end_date' that is joined T4 data is unnecessary
the row with blank on the T4 side
 contains 'end date' we want to retrive

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[1] focus on T3.end_date(1997-01-04)

re_12.jpg

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2] focus on T3.end_date(1997-01-14)

re_13.jpg

Creating Original Data ③

SQL
SELECT X.start_date,Y.end_date
FROM
( -- Start Date
  SELECT T1.start_date ,COUNT(T2.start_date)
  FROM Timesheets AS T1
  LEFT JOIN  Timesheets AS T2
    ON T1.start_date > T2.start_date 
       AND T1.start_date <= T2.end_date
  GROUP BY T1.start_date
  HAVING COUNT(T2.start_date) = 0
) AS X(start_date)

INNER JOIN
( -- ENd Date
  SELECT T3.end_date ,COUNT(T4.start_date)
  FROM Timesheets AS T3
  LEFT JOIN Timesheets AS T4
    ON T3.end_date < T4.end_date
      AND T3.end_date >= T4.start_date
  GROUP BY T3.end_date
  HAVING COUNT(T4.start_date) = 0
) AS Y(end_date) ON X.start_date <= Y.end_date

ORDER BY X.start_date,Y.end_date

▼ Output

X.start_date Y.end_date
1997-01-01 1997-01-05
1997-01-01 1997-01-09
1997-01-01 1997-01-15
1997-01-01 1997-01-17
1997-01-06 1997-01-09
1997-01-06 1997-01-15
1997-01-06 1997-01-17
1997-01-12 1997-01-15
1997-01-12 1997-01-17
1997-01-17 1997-01-17

This SQL joins the output of the end date
 to the output of the start date

Using INNER JOIN here but
 LEFT JOIN is also the same

~~~~~~~~~~~~~~~~~~~~~~~~~
check the start date in detail
In the subquery of 'Start Date', doing [1][2][3] below

[1] 『GROUP BY T1.start_date』makes 'start date' unique
[2] Count T2.start_date that is joined to 'start date'
[3] pick rows that number of count is 0
  ⇒ HAVING COUNT(T2.start_date) = 0

the start dates below are the output
 ⇒ T1.start_date
 ⇒ 1997-01-01,1997-01-06,1997-01-12,1997-01-17

▼ Contents of 'Start Date'
re_16.jpg

~~~~~~~~~~~~~~~~~~~~~~~~~
check the end date in detail
In the subquery of 'End Date', doing [1][2][3] below

[1] 『GROUP BY T3.end_date』makes 'end date' unique
[2] Count T4.start_date that is joined to 'end date'
[3] pick rows that number of count is 0
  ⇒ HAVING COUNT(T4.start_date) = 0

the end dates below are the output
 ⇒ T3.end_date
 ⇒ 1997-01-05,1997-01-09,1997-01-15,1997-01-17

▼ Contents of 'End Date'
re_17.jpg

~~~~~~~~~~~~~~~~~~~~~~~~~
extracted 『start & end date』are joined and
its condition is 『X.start_date <= Y.end_date』
 ⇒ 'end dates' after start date are joined
 ⇒ comparison operator has『=』
   Create all combination including the same "start and end date"
   1997-01-17(T1.start_date) 1997-01-17(T3.end_date)
 ⇒ please check right side of image below

▼ Output(left side of image)
re_18.jpg

Answer SQL

SQL
SELECT X.start_date, MIN(Y.end_date) AS end_date
FROM
( -- Start Date
	SELECT T1.start_date --,COUNT(T2.start_date)
	FROM Timesheets AS T1
	LEFT JOIN  Timesheets AS T2
		ON T1.start_date > T2.start_date 
		AND T1.start_date <= T2.end_date
	GROUP BY T1.start_date
	HAVING COUNT(T2.start_date) = 0
) AS X(start_date)

INNER JOIN
( -- End Date
	SELECT T3.end_date --,COUNT(T4.start_date)
	FROM Timesheets AS T3
	LEFT JOIN Timesheets AS T4
		ON T3.end_date >= T4.start_date
		AND T3.end_date < T4.end_date
	GROUP BY T3.end_date
	HAVING COUNT(T4.start_date) = 0
) AS Y(end_date) ON X.start_date <= Y.end_date

GROUP BY X.start_date

ORDER BY X.start_date

▼ Output(answer)

X.start_date end_date
1997-01-01 1997-01-05
1997-01-06 1997-01-09
1997-01-12 1997-01-15
1997-01-17 1997-01-17

there are 'start date' and 'end date' we want to retrive
in the data that joined『start and end date』before

picking dates from the top in chronological order
we can retrive『contiguous or overlapping time』
 ⇒ the same as 『part 2』
   check 『part 2』to see in detail

re_19.jpg

in the article 『part 2』・・
rows with a return valuee is 0 are taken
using HAVING and CASE statements
from the data that is aggregated with "GROUP BY"

in the article 『part 3』・・
rows that are also 0 are taken usin JOIN

personally, 『part 3』is easier to read
but both are interesting solutions

References

Joe Celko's SQL Puzzles and Answers

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0