The Montoya Herald, a weblog about Blueprint, jQuery, design, music and life, publishing on the web since September 2005. Written by Christian Montoya: developer, designer and entrepreneur.

The Montoya Herald — ChristianMontoya.com

Search

Supported By

Nested queries, SQL_CALC_FOUND_ROWS (MySQL)

Posted on October 17.

If nothing in the title of this post makes sense to you, you might want to avoid this post entirely. This is one of those technical ones.

I wrote my Haiku application for Facebook from scratch. I don't know why I still insist on doing things from scratch, but I think it has something to do with the fact that I am used to having complete control over what I write. One of the issues with this app was that I needed quite a bit of data for each page and I wanted to do it in one query. Here's the schema I was working with (bold fields are the primary keys):

haikus: haikuID, authorID (user), title, line1-3, timestamp
favorites: userID, haikuID, timestamp
sends: fromID, toID, haikuID, timestamp

There's no user information table because all of that is retrieved from Facebook through the API (this is one of the nice things about Facebook apps; you can develop a user application without having to deal with all that user stuff).

In my app users can write haikus, so each haiku is unique. They can add and remove any haiku as a favorite (including their own), so a user might have 0 favorites or 100 of them. They can also send haikus to other users, so there is a fromID and toID for each "send" stored in the sends table. Wherever a user is on any page, all I'm using is a single userID session variable and some occasional GET variables, nothing more.

Focusing on a single page (which really is the meat of the application, since most other pages follow the same model), the index shows users all the haikus that have been written. For each haiku, I display: the title & 3 lines, who wrote it, and whether or not it's one of your favorites. So I need all the haikus that might be your favorites, or might not. I also need to know the total number of results from the query so that I can do pagination, which again, was done from scratch. This is the query that does it all:

1
2
3
4
5
6
SELECT SQL_CALC_FOUND_ROWS h.haikuID AS haikuID, ..., f.userID AS userID
FROM `haikus` AS `h` LEFT JOIN 
  (SELECT * FROM `favorites` WHERE `userID` = '$user') AS `f` 
ON `h`.`haikuID` = `f`.`haikuID`  
ORDER BY `h`.`TIMESTAMP` DESC 
LIMIT $cur, 10

Going from inside out, I am retrieving all of a user's favorites by their user ID with a nested query. I then do a left join of haikus onto this intermediate result using the haikuID field. At this point if I have 3 haikus and one is a favorite of yours, my data might look like this:

haikuID 1 | title 1 | ... |
haikuID 2 | title 2 | ... | userID 49
haikuID 3 | title 3 | ... |

See, if the userID is in the row, that means that row (haiku) joined with a row in the favorites table… it's one of the user's favorites. Otherwise, that field is blank. The rest is just ordering the results and limiting the set to 10, but there's also that one awesome feature: SQL_CALC_FOUND_ROWS. This function tells MySQL to figure out the total number of results I would have retrieved if there were no LIMIT clause in the query. Is it an intensive function? Certainly! It is, however, lighter than doing two separate queries (one to count and another to retrieve the LIMIT set), so in the interest of getting all this data direct from MySQL, this is the right way to do it.

A more complicated query would be for the "haiku sent to me" page, where I need to join all three tables to produce the result (and do the pagination too). Here's the query:

1
2
3
4
5
6
7
8
9
10
SELECT SQL_CALC_FOUND_ROWS s.haikuID AS haikuID, ..., f.userID AS userID
FROM 
  (SELECT * FROM `sends`, `haikus` 
    WHERE `sends`.`toID` = '$user' AND `sends`.`haikuID` = `haikus`.`haikuID` 
  ) AS s 
  LEFT JOIN 
  (SELECT * FROM `favorites` WHERE `userID` = '$user') AS `f` 
ON `s`.`haikuID` = `f`.`haikuID` 
ORDER BY `s`.`TIMESTAMP` DESC 
LIMIT $cur, 10

Looking at this now, this query could be further optimized. I'm doing a cross-join on sends and haikus to get all the haiku sent to the userID, but cross-joins are heavy. This nested query is a "Theta" style join, which is now discouraged in favor of the more standard "ANSI" style joins (the kinds that use JOIN … ON/USING). From Jelly and Custard:

"Theta style joins in general (and in my experience) are notably resource intenstive (sic) compared to ANSI equivalents."

What I should do is:

1
2
3
4
5
6
7
8
9
10
(
 SELECT s.haikuID AS haikuID, ... s.fromID AS fromID FROM 
 (SELECT * FROM `sends`
  WHERE `sends`.`toID` = '$user') AS `s` 
 INNER JOIN `haikus`
 ON `s`.`haikuID` = `haikus`.`haikuID`
) AS `h` 
LEFT JOIN 
(SELECT * FROM `favorites` WHERE `userID` = '$user') AS `f` 
ON `h`.`haikuID` = `f`.`haikuID`

I'm not promising that's correct… I'll have to test it out later. Update: this works perfectly, awesome, gotta love that triple nesting action.

Why do I care so much about optimizing these queries? Because MySQL is the first place I want to have all my performance issues taken care of with any database-driven app before I move on to other optimization techniques. Speaking of such, I'm not purporting myself to be an expert here; I'm just blogging about this because I think it's cool that I can do these things in my queries. I am interested in any other optimizations people might have, so if you happen to be a SQL expert, let me know.

Get a Trackback link

No Comments Yet

Use the comment form to leave a comment here.

Leave a comment

Use Markdown or basic HTML. For posting code, use Postable. Please keep comments respectful and on topic.