In the previous article, I showed how to group rows into an array or a set in Hive. This blog post demonstrates how to achieve a similar result in MySQL.
Table of Contents
As in the aforementioned article, we have the following table:
C1 C2
AAA AA1
AAA AA2
AAA AA2
BBB BB1
BBB BB2
CCC CC1
and we would like to get this result:
AAA AA1, AA2, AA2
BBB BB1, BB2
CCC CC1
In MySQL, instead of collect_list
or collect_set
functions, we are going to use the GROUP_CONCAT
function. We have to group the rows by the C1 column and concatenate the C2 column:
SELECT C1, GROUP_CONCAT(C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;
If we wanted to remove the duplicate values, we would have to add the DISTINCT
operation to make the query look like this:
SELECT C1, GROUP_CONCAT(DISTINCT C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;
We can use ORDER BY!
In MySQL, sorting the C2 values is trivial. All we have to do is adding the ORDER BY
clause inside the GROUP_CONCAT
function:
SELECT C1, GROUP_CONCAT(C2 ORDER BY C2 ASC SEPARATOR ', ')
FROM some_table
GROUP BY C1;
Beware the memory limit!
Unfortunately, the GROUP_CONCAT
function will not return a string longer than the default value of the group_concat_max_len
parameter (1024 bytes). We can change that by setting the parameter before we execute a select statement that contains the GROUP_CONCAT
function:
SET group_concat_max_len = 10240;
If we are not sure what is the necessary length that we need, we have to calculate the sum of C2 lengths and the length of the required number of separators:
SET group_concat_max_len = CAST(
(SELECT SUM(LENGTH(C2)) + (COUNT(*) - 1) * LENGTH(', ')
FROM some_table
GROUP BY C1)
AS UNSIGNED
);