In the previous article, I showed how to group rows into an array or a set in Hive. This blog post demonstrates how to achieve a similar result in MySQL.
As in the aforementioned article, we have the following table:
C1 C2 AAA AA1 AAA AA2 AAA AA2 BBB BB1 BBB BB2 CCC CC1
and we would like to get this result:
AAA AA1, AA2, AA2 BBB BB1, BB2 CCC CC1
In MySQL, instead of
collect_set functions, we are going to use the
GROUP_CONCAT function. We have to group the rows by the C1 column and concatenate the C2 column:
SELECT C1, GROUP_CONCAT(C2 SEPARATOR ', ') FROM some_table GROUP BY C1;
If we wanted to remove the duplicate values, we would have to add the
DISTINCT operation to make the query look like this:
SELECT C1, GROUP_CONCAT(DISTINCT C2 SEPARATOR ', ') FROM some_table GROUP BY C1;
We can use ORDER BY!
In MySQL, sorting the C2 values is trivial. All we have to do is adding the
ORDER BY clause inside the
SELECT C1, GROUP_CONCAT(C2 ORDER BY C2 ASC SEPARATOR ', ') FROM some_table GROUP BY C1;
Beware the memory limit!
GROUP_CONCAT function will not return a string longer than the default value of the
group_concat_max_len parameter (1024 bytes). We can change that by setting the parameter before we execute a select statement that contains the
SET group_concat_max_len = 10240;
If we are not sure what is the necessary length that we need, we have to calculate the sum of C2 lengths and the length of the required number of separators:
SET group_concat_max_len = CAST( (SELECT SUM(LENGTH(C2)) + (COUNT(*) - 1) * LENGTH(', ') FROM some_table GROUP BY C1) AS UNSIGNED );