Edited at

17章 Spaghetti Query

More than 3 years have passed since last update.


Questions from your boss


  • How many products

  • How many developers fixed bugs

  • Average number of bugs fixed per developer

  • How many of our fixed bugs were reported by customers

Your hope

hoping to do the least amount of duplicate work and therefore produce the results faster

So

you make one complex query


report.sql

SELECT COUNT(bp.product_id) AS how_many_products,

COUNT(dev.account_id) AS how_many_developers,
COUNT(b.bug_id)/COUNT(dev.account_id) AS avg_bugs_per_developer,
COUNT(cust.account_id) AS how_many_customers
FROM Bugs b JOIN BugsProducts bp ON (b.bug_id = bp.bug_id) JOIN Accounts dev ON (b.assigned_to = dev.account_id)
JOIN Accounts cust ON (b.reported_by = cust.account_id) WHERE cust.email NOT LIKE '%@example.com'
GROUP BY bp.product_id;

:sob: The result seems wrong


17.1 Objective: Decrease SQL Queries


  • How can I do this with a single query?


    • one SQL query is difficult, complex, and expensive

    • two SQL queries must be twice as bad



  • Some programming framework(MyBatis)

  • I can not solve these data without sql


    • I can't write shell script! perl? what is it?

    • Excel is good but i need a good format

    • what? Database is on remote server???




17.2 Antipattern: Solve a Complex Problem in One Step


17.2.1 Out of your expectation

Let's see an example

count the number of bugs fixed, and the number of bugs open, for a given product


cartesian.sql

SELECT p.product_id,

COUNT(f.bug_id) AS count_fixed,
COUNT(o.bug_id) AS count_open
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id;

:scream: I don't think anyone will write such a query if he really know about SQL

:broken_heart: Tell me why? Give me a reason!!

You happen to know that in reality there are 12 fixed bugs and 7 open bugs for the given product

product_id
count_fixed
count_open

1
84
84

We got a Cartesian product


17.2.2 More serious


  • simply hard to write

  • hard to modify

  • hard to debug

  • big runtime costs


17.3 How to Recognize the Antipattern


  • Why are my sums and counts impossibly large?

  • I’ve been working on this monster SQL query all day!

  • take too long to figure out how to recode the SQL query

  • Try putting another DISTINCT into the query


17.4 Legitimate Uses of the Antipattern


  • business intelligence

  • reporting tools

:exclamation: Pay attention to report’s complexity and the hours it takes to produce it

:astonished: But can i tell my boss that you need to organize the report by yourself.


17.5 Solution: Divide and Conquer


The Law of Parsimony

When you have two competing theories that make exactly the same predictions, the simpler one is the better.



17.5.1 One Step at a Time

split up a Spaghetti Query into several simpler queries


split-query.sql

SELECT p.product_id, COUNT(f.bug_id) AS count_fixed

FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') WHERE p.product_id = 1
GROUP BY p.product_id;

SELECT p.product_id, COUNT(o.bug_id) AS count_open
FROM BugsProducts p
LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id;


Merits


  • Avoid unwanted Cartesian product

  • Easier to maintain when add another requirement

  • Easier for SQL engine to do optimization

  • Easier to share with others(Code review, explain .. etc.)


17.5.2 Look for the UNION Label


union.sql

(SELECT p.product_id, f.status, COUNT(f.bug_id) AS bug_count

FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') WHERE p.product_id = 1
GROUP BY p.product_id, f.status)
UNION ALL
(SELECT p.product_id, o.status, COUNT(o.bug_id) AS bug_count
FROM BugsProducts p
LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id, o.status)
ORDER BY bug_count;


17.5.2 Solving Your Boss’s Problem


  • How many products


count-products.sql

SELECT COUNT(*) AS how_many_products

FROM Products;


  • How many developers fixed bugs


count-developers.sql

SELECT COUNT(DISTINCT assigned_to) AS how_many_developers FROM Bugs

WHERE status = 'FIXED';


  • Average number of bugs fixed per developer


bugs-per-developer.sql

SELECT AVG(bugs_per_developer) AS average_bugs_per_developer

FROM (SELECT dev.account_id, COUNT(*) AS bugs_per_developer
FROM Bugs b JOIN Accounts dev
ON (b.assigned_to = dev.account_id)
WHERE b.status = 'FIXED'
GROUP BY dev.account_id) t;


  • How many of our fixed bugs were reported by customers


bugs-by-customers.sql

SELECT COUNT(*) AS how_many_customer_bugs

FROM Bugs b JOIN Accounts cust ON (b.reported_by = cust.account_id) WHERE b.status = 'FIXED' AND cust.email NOT LIKE '%@example.com';


17.5.6 Writing SQL Automatically—with SQL

Generate SQL by code


generate-update.sql

SELECT CONCAT('UPDATE Inventory '

' SET last_used = ''', MAX(u.usage_date), '''',
' WHERE inventory_id = ', u.inventory_id, ';
'
) AS update_statement
FROM ComputerUsage u
GROUP BY u.inventory_id;


But I have a confuse

How to balance the query times and query time?(May be I am too young too simple)


An Example of myself


query.sql

SELECT t1.reg_date,t1.COUNT, t2.COUNT,t3.COUNT,t4.COUNT,t5.COUNT,t6.COUNT,t7.COUNT,t8.COUNT,t9.COUNT,t10.COUNT, t11.COUNT,t12.COUNT,t13.COUNT,t14.COUNT   

FROM
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 1 DAY) group by game_reg_date) AS t1,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 2 DAY) group by game_reg_date) AS t2,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 3 DAY) group by game_reg_date) AS t3,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 4 DAY) group by game_reg_date) AS t4,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 5 DAY) group by game_reg_date) AS t5,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 6 DAY) group by game_reg_date) AS t6,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 7 DAY) group by game_reg_date) AS t7,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 8 DAY) group by game_reg_date) AS t8,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 9 DAY) group by game_reg_date) AS t9,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 2 DAY) group by game_reg_date) AS t10,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 3 DAY) group by game_reg_date) AS t11,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 4 DAY) group by game_reg_date) AS t12,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 5 DAY) group by game_reg_date) AS t13,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 6 DAY) group by game_reg_date) AS t14
where t1.reg_date = t2.reg_date AND t1.reg_date=t3.reg_date AND t4.reg_date=t1.reg_date AND t1.reg_date=t5.reg_date AND t1.reg_date = t6.reg_date AND t1.reg_date = t7.reg_date AND t1.reg_date = t8.reg_date
AND t1.reg_date = t9.reg_date AND t1.reg_date=t10.reg_date AND t11.reg_date=t1.reg_date AND t1.reg_date=t12.reg_date AND t1.reg_date = t13.reg_date AND t1.reg_date = t14.reg_date;