By default, Hive does not support passing the columns to GROUP BY or ORDER BY using their positions instead of names. Therefore, when I try to run this query:
SELECT column_A, count(*) FROM table_name GROUP BY 1
I will see an error message telling me that the column_A
is not in the GROUP BY expression.
To get the behavior I want, I have to enable the position alias usage before running my query. I can do that by setting the hive.groupby.orderby.position.alias
property to true
:
SET hive.groupby.orderby.position.alias=true;
SELECT column_A, count(*) FROM table_name GROUP BY 1
Now, Hive is going to correctly recoginize that I want to group by the first column, lookup its name and use that to execute the query.
Want to build AI systems that actually work?
Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.