I’m currently programming a who’s online for leads at work, and was having lots of difficulty in getting the results I needed to properly display the most recent activity.
We have three tables that need to be looked up due to the way the schema was designed. The primary table is a user_log table that stores all actions that a user performed along with a timestamp. However, this log table does not include any user information aside from a user_id. So, to get the information about a user, we need to join that user_id to the user_id in a users table.
But, here’s where it gets really messed up – the users table doesn’t contain the information we need. Instead, it has a contact_id field in the users table, which must be then joined to a contacts table.
So it goes like this:
.. WHERE user_log.user_id = users.user_id AND users.contact_id = contacts.contact_id
Yeah, three tables just to get contact information.
I was finding that the standard SQL Distinct/Group By keywords were proving ineffective in getting the results I needed, which would be to just give me one user_id and the most recent created timestamp in the user_log table. Instead, I was getting multiple rows for the same user_id.
Fortunately, I found out about analytic functions in Oracle, which helped me isolate out unique user_ids and get the latest timestamp for that user by using ROW_NUMBER( ) where the sequence_number = 1 (giving me the topmost result for each user_id).
Unfortunately, I also learned that mySQL does not have analytic functions built in, but it can be emulated in native SQL (Part 1) (Part 2).
The best explanation I can give about the common aggregation functions that most people tend to use like SUM(), AVG(), etc vs analytic functions is:
Assume you have a set of rows.
The aggregation function will take that set of rows and output only one row with the results, while an analytic function will output that set of rows with the results appended.
The analytic functions do not have to work this way, but it’s useful for performing ranking analysis or sorting rows into individual sets of results.