Yesterday, my friend Liu Bing asked me a question about how to select top 3 values for each person in a database table using SQL. I think this question is interesting and it deserves some thoughts. Here I record down how to solve this issue.
Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person. Also assume there is no duplicate entry in the table, i.e each name/value pair will be different.
How do we implement the above in SQL? Let's take a look the table structure:
CREATE TABLE tbl ( name VARCHAR(20) NOT NULL, value int(11) DEFAULT NULL ); INSERT INTO tbl VALUES ('Pei Kai',5); INSERT INTO tbl VALUES ('Pei Kai',3); INSERT INTO tbl VALUES ('Pi Ke Kai',2); INSERT INTO tbl VALUES ('Pei Kai',10); INSERT INTO tbl VALUES ('Pei Kai',6); INSERT INTO tbl VALUES ('Pi Ke',4); INSERT INTO tbl VALUES ('Pei Kai',20); INSERT INTO tbl VALUES ('Pei Kai',1); INSERT INTO tbl VALUES ('Pi Ke',25); INSERT INTO tbl VALUES ('Pi Ke',3); INSERT INTO tbl VALUES ('Liu Bing',4); INSERT INTO tbl VALUES ('Liu Bing',7);
The data will be like :
mysql> select * from tbl;
+-----------+-------+
| name | value |
+-----------+-------+
| Pei Kai | 5 |
| Pei Kai | 3 |
| Pi Ke Kai | 2 |
| Pei Kai | 10 |
| Pei Kai | 6 |
| Pi Ke | 4 |
| Pei Kai | 20 |
| Pei Kai | 1 |
| Pi Ke | 25 |
| Pi Ke | 3 |
| Liu Bing | 4 |
| Liu Bing | 7 |
+-----------+-------+
Actually the most difficult part here is to get top 3 values for each person instead of getting top 3 values for all people. We need to find out the 3rd largest value for each person and then select the values which is larger than or equal to that value for each person. Since we need to find the values for each person, so we have to have a loop here to loop through each person. In pure SQL, we can rely on correlated query to simulate the for loop.
Below is the complete query which is to get the top 3 values for each person.
SELECT t.name,t.value FROM ( ( SELECT t1.name,t1.value FROM tbl t1 INNER JOIN ( #Below is a correlated query where the outer query simulates a for loop SELECT t2.name,t2.value FROM tbl t2 WHERE 2=( # Here 2 can be replaced by (N-1) to select top N values for each name SELECT COUNT(DISTINCT(t3.value)) FROM tbl t3 WHERE t3.value>t2.value AND t2.name=t3.name ) ) t4 ON t1.name=t4.name WHERE t1.value>=t4.value ) UNION ALL #Below is to select all name/value pairs where each name has less #than 3 values ( SELECT t5.name,t5.value FROM tbl t5 WHERE t5.name IN (SELECT t6.name FROM tbl t6 GROUP BY t6.name HAVING COUNT(1)<3) ) ) t ORDER BY t.name ASC,t.value DESC ;
The test result will be :
+-----------+-------+
| name | value |
+-----------+-------+
| Liu Bing | 7 |
| Liu Bing | 4 |
| Pei Kai | 20 |
| Pei Kai | 10 |
| Pei Kai | 6 |
| Pi Ke | 25 |
| Pi Ke | 4 |
| Pi Ke | 3 |
| Pi Ke Kai | 2 |
+-----------+-------+
The above query can work perfectly if there is no duplicate entry in the table. However, if there are duplicates, there may be issues. It may return some extra values for some people. Since the correlated query above cannot handle the duplicates very well.
Now lets insert two more values:
INSERT INTO tbl VALUES ('Pi Ke',7); INSERT INTO tbl VALUES ('Pi Ke',7);
Then let's run the above query again, the result now becomes:
+-----------+-------+
| name | value |
+-----------+-------+
| Liu Bing | 7 |
| Liu Bing | 4 |
| Pei Kai | 20 |
| Pei Kai | 10 |
| Pei Kai | 6 |
| Pi Ke | 25 |
| Pi Ke | 7 |
| Pi Ke | 7 |
| Pi Ke | 4 |
| Pi Ke Kai | 2 |
+-----------+-------+
Here for Pi Ke, there are 4 records returned.
If you have good solution for the above duplicate issue. Please let us know.