Help us understand the problem. What is going on with this article?

17章 Spaghetti Query

More than 5 years have passed since last update.

Questions from your boss

  • How many products
  • How many developers fixed bugs
  • Average number of bugs fixed per developer
  • How many of our fixed bugs were reported by customers

Your hope
hoping to do the least amount of duplicate work and therefore produce the results faster
So
you make one complex query

report.sql
SELECT COUNT(bp.product_id) AS how_many_products,
  COUNT(dev.account_id) AS how_many_developers,
  COUNT(b.bug_id)/COUNT(dev.account_id) AS avg_bugs_per_developer,
  COUNT(cust.account_id) AS how_many_customers
FROM Bugs b JOIN BugsProducts bp ON (b.bug_id = bp.bug_id) JOIN Accounts dev ON (b.assigned_to = dev.account_id)
JOIN Accounts cust ON (b.reported_by = cust.account_id) WHERE cust.email NOT LIKE '%@example.com'
GROUP BY bp.product_id;

:sob: The result seems wrong

17.1 Objective: Decrease SQL Queries

  • How can I do this with a single query?
    • one SQL query is difficult, complex, and expensive
    • two SQL queries must be twice as bad
  • Some programming framework(MyBatis)
  • I can not solve these data without sql
    • I can't write shell script! perl? what is it?
    • Excel is good but i need a good format
    • what? Database is on remote server???

17.2 Antipattern: Solve a Complex Problem in One Step

17.2.1 Out of your expectation

Let's see an example
count the number of bugs fixed, and the number of bugs open, for a given product

cartesian.sql
SELECT p.product_id,
         COUNT(f.bug_id) AS count_fixed,
         COUNT(o.bug_id) AS count_open
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id;

:scream: I don't think anyone will write such a query if he really know about SQL

:broken_heart: Tell me why? Give me a reason!!

You happen to know that in reality there are 12 fixed bugs and 7 open bugs for the given product

product_id count_fixed count_open
1 84 84

We got a Cartesian product

b51fad2d29628a1a53e0049f05320761.png

17.2.2 More serious

  • simply hard to write
  • hard to modify
  • hard to debug
  • big runtime costs

17.3 How to Recognize the Antipattern

  • Why are my sums and counts impossibly large?
  • I’ve been working on this monster SQL query all day!
  • take too long to figure out how to recode the SQL query
  • Try putting another DISTINCT into the query

17.4 Legitimate Uses of the Antipattern

  • business intelligence
  • reporting tools

:exclamation: Pay attention to report’s complexity and the hours it takes to produce it

:astonished: But can i tell my boss that you need to organize the report by yourself.

17.5 Solution: Divide and Conquer

The Law of Parsimony

When you have two competing theories that make exactly the same predictions, the simpler one is the better.

17.5.1 One Step at a Time

split up a Spaghetti Query into several simpler queries

split-query.sql
SELECT p.product_id, COUNT(f.bug_id) AS count_fixed
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') WHERE p.product_id = 1
GROUP BY p.product_id;

SELECT p.product_id, COUNT(o.bug_id) AS count_open
FROM BugsProducts p
LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id;

Merits

  • Avoid unwanted Cartesian product
  • Easier to maintain when add another requirement
  • Easier for SQL engine to do optimization
  • Easier to share with others(Code review, explain .. etc.)

17.5.2 Look for the UNION Label

union.sql
(SELECT p.product_id, f.status, COUNT(f.bug_id) AS bug_count
FROM BugsProducts p
LEFT OUTER JOIN Bugs f ON (p.bug_id = f.bug_id AND f.status = 'FIXED') WHERE p.product_id = 1
GROUP BY p.product_id, f.status)
UNION ALL
(SELECT p.product_id, o.status, COUNT(o.bug_id) AS bug_count
FROM BugsProducts p
LEFT OUTER JOIN Bugs o ON (p.bug_id = o.bug_id AND o.status = 'OPEN') WHERE p.product_id = 1
GROUP BY p.product_id, o.status)
ORDER BY bug_count;

17.5.2 Solving Your Boss’s Problem

  • How many products
count-products.sql
SELECT COUNT(*) AS how_many_products
FROM Products;
  • How many developers fixed bugs
count-developers.sql
SELECT COUNT(DISTINCT assigned_to) AS how_many_developers FROM Bugs
WHERE status = 'FIXED';
  • Average number of bugs fixed per developer
bugs-per-developer.sql
SELECT AVG(bugs_per_developer) AS average_bugs_per_developer
    FROM (SELECT dev.account_id, COUNT(*) AS bugs_per_developer
        FROM Bugs b JOIN Accounts dev
           ON (b.assigned_to = dev.account_id)
        WHERE b.status = 'FIXED' 
        GROUP BY dev.account_id) t;
  • How many of our fixed bugs were reported by customers
bugs-by-customers.sql
SELECT COUNT(*) AS how_many_customer_bugs
FROM Bugs b JOIN Accounts cust ON (b.reported_by = cust.account_id) WHERE b.status = 'FIXED' AND cust.email NOT LIKE '%@example.com';

17.5.6 Writing SQL Automatically—with SQL

Generate SQL by code

generate-update.sql
SELECT CONCAT('UPDATE Inventory '
              ' SET last_used = ''', MAX(u.usage_date), '''',
              ' WHERE inventory_id = ', u.inventory_id, ';
              ') AS update_statement
    FROM ComputerUsage u
    GROUP BY u.inventory_id;

But I have a confuse

How to balance the query times and query time?(May be I am too young too simple)

An Example of myself

query.sql
SELECT t1.reg_date,t1.COUNT, t2.COUNT,t3.COUNT,t4.COUNT,t5.COUNT,t6.COUNT,t7.COUNT,t8.COUNT,t9.COUNT,t10.COUNT, t11.COUNT,t12.COUNT,t13.COUNT,t14.COUNT   
FROM 
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 1 DAY) group by game_reg_date) AS t1,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 2 DAY) group by game_reg_date) AS t2,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 3 DAY) group by game_reg_date) AS t3,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 4 DAY) group by game_reg_date) AS t4,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 5 DAY) group by game_reg_date) AS t5, 
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 6 DAY) group by game_reg_date) AS t6,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 7 DAY) group by game_reg_date) AS t7,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 8 DAY) group by game_reg_date) AS t8,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 9 DAY) group by game_reg_date) AS t9,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 2 DAY) group by game_reg_date) AS t10,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 3 DAY) group by game_reg_date) AS t11,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 4 DAY) group by game_reg_date) AS t12,
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 5 DAY) group by game_reg_date) AS t13, 
(SELECT COUNT(DISTINCT user_id) AS COUNT,game_reg_date AS reg_date FROM login_log where game_id = xx AND login_date BETWEEN DATE_ADD(game_reg_date,INTERVAL 1 DAY) AND DATE_ADD(game_reg_date,INTERVAL 6 DAY) group by game_reg_date) AS t14 
where t1.reg_date = t2.reg_date AND t1.reg_date=t3.reg_date AND t4.reg_date=t1.reg_date AND t1.reg_date=t5.reg_date AND t1.reg_date = t6.reg_date AND t1.reg_date = t7.reg_date AND t1.reg_date = t8.reg_date 
AND t1.reg_date = t9.reg_date AND t1.reg_date=t10.reg_date AND t11.reg_date=t1.reg_date AND t1.reg_date=t12.reg_date AND t1.reg_date = t13.reg_date AND t1.reg_date = t14.reg_date; 
Why not register and get more from Qiita?
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
Comments
No comments
Sign up for free and join this conversation.
If you already have a Qiita account
Why do not you register as a user and use Qiita more conveniently?
You need to log in to use this function. Qiita can be used more conveniently after logging in.
You seem to be reading articles frequently this month. Qiita can be used more conveniently after logging in.
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
ユーザーは見つかりませんでした