Select top 3 values from each group in a table with SQL

  Pi Ke        2013-05-23 03:21:25       19,850        0    

Yesterday, my friend Liu Bing asked me a question about how to select top 3 values for each person in a database table using SQL. I think this question is interesting and it deserves some thoughts. Here I record down how to solve this issue.

Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person. Also assume there is no duplicate entry in the table, i.e each name/value pair will be different.

How do we implement the above in SQL? Let's take a look the table structure:

CREATE TABLE tbl (
	name VARCHAR(20) NOT NULL,
	value int(11) DEFAULT NULL
);

INSERT INTO tbl VALUES ('Pei Kai',5);
INSERT INTO tbl VALUES ('Pei Kai',3);
INSERT INTO tbl VALUES ('Pi Ke Kai',2);
INSERT INTO tbl VALUES ('Pei Kai',10);
INSERT INTO tbl VALUES ('Pei Kai',6);
INSERT INTO tbl VALUES ('Pi Ke',4);
INSERT INTO tbl VALUES ('Pei Kai',20);
INSERT INTO tbl VALUES ('Pei Kai',1);
INSERT INTO tbl VALUES ('Pi Ke',25);
INSERT INTO tbl VALUES ('Pi Ke',3);
INSERT INTO tbl VALUES ('Liu Bing',4);
INSERT INTO tbl VALUES ('Liu Bing',7);

The data will be like :

mysql> select * from tbl;
+-----------+-------+
| name      | value |
+-----------+-------+
| Pei Kai   |     5 |
| Pei Kai   |     3 |
| Pi Ke Kai |     2 |
| Pei Kai   |    10 |
| Pei Kai   |     6 |
| Pi Ke     |     4 |
| Pei Kai   |    20 |
| Pei Kai   |     1 |
| Pi Ke     |    25 |
| Pi Ke     |     3 |
| Liu Bing  |     4 |
| Liu Bing  |     7 |
+-----------+-------+

Actually the most difficult part here is to get top 3 values for each person instead of getting top 3 values for all people. We need to find out the 3rd largest value for each person and then select the values which is larger than or equal to that value for each person. Since we need to find the values for each person, so we have to have a loop here to loop through each person. In pure SQL, we can rely on correlated query to simulate the for loop.

Below is the complete query which is to get the top 3 values for each person.

SELECT t.name,t.value
FROM (
	(
		SELECT t1.name,t1.value FROM tbl t1 
		INNER JOIN
		(
			#Below is a correlated query where the outer query simulates a for loop
			SELECT t2.name,t2.value 
			FROM tbl t2
			WHERE 2=(  # Here 2 can be replaced by (N-1) to select top N values for each name
				SELECT COUNT(DISTINCT(t3.value))
				FROM tbl t3 
				WHERE t3.value>t2.value AND t2.name=t3.name
			)
		) t4
		ON t1.name=t4.name 
		WHERE t1.value>=t4.value
	)
	UNION ALL
	#Below is to select all name/value pairs where each name has less 
	#than 3 values
	(
	SELECT t5.name,t5.value FROM tbl t5 
	WHERE t5.name IN (SELECT t6.name 
	FROM tbl t6
	GROUP BY t6.name HAVING COUNT(1)<3)
	)
) t
ORDER BY t.name ASC,t.value DESC
;

The test result will be :

+-----------+-------+
| name      | value |
+-----------+-------+
| Liu Bing  |     7 |
| Liu Bing  |     4 |
| Pei Kai   |    20 |
| Pei Kai   |    10 |
| Pei Kai   |     6 |
| Pi Ke     |    25 |
| Pi Ke     |     4 |
| Pi Ke     |     3 |
| Pi Ke Kai |     2 |
+-----------+-------+

The above query can work perfectly if there is no duplicate entry in the table. However, if there are duplicates, there may be issues. It may return some extra values for some people. Since the correlated query above cannot handle the duplicates very well.

Now lets insert two more values:

INSERT INTO tbl VALUES ('Pi Ke',7);
INSERT INTO tbl VALUES ('Pi Ke',7);

Then let's run the above query again, the result now becomes:

+-----------+-------+
| name      | value |
+-----------+-------+
| Liu Bing  |     7 |
| Liu Bing  |     4 |
| Pei Kai   |    20 |
| Pei Kai   |    10 |
| Pei Kai   |     6 |
| Pi Ke     |    25 |
| Pi Ke     |     7 |
| Pi Ke     |     7 |
| Pi Ke     |     4 |
| Pi Ke Kai |     2 |
+-----------+-------+

Here for Pi Ke, there are 4 records returned.

If you have good solution for the above duplicate issue. Please let us know.

SQL  CORRELATED QUERY  TOP 3 

       

  RELATED


  0 COMMENT


No comment for this article.



  RANDOM FUN

Security question in action