By default, Hive does not support passing the columns to GROUP BY or ORDER BY using their positions instead of names. Therefore, when I try to run this query:
Table of Contents
SELECT column_A, count(*) FROM table_name GROUP BY 1
I will see an error message telling me that the column_A
is not in the GROUP BY expression.
To get the behavior I want, I have to enable the position alias usage before running my query. I can do that by setting the hive.groupby.orderby.position.alias
property to true
:
SET hive.groupby.orderby.position.alias=true;
SELECT column_A, count(*) FROM table_name GROUP BY 1
Now, Hive is going to correctly recoginize that I want to group by the first column, lookup its name and use that to execute the query.
Get Weekly AI Implementation Insights
Join engineering leaders who receive my analysis of common AI production failures and how to prevent them. No fluff, just actionable techniques.