By default, Hive does not support passing the columns to GROUP BY or ORDER BY using their positions instead of names. Therefore, when I try to run this query:
SELECT column_A, count(*) FROM table_name GROUP BY 1
I will see an error message telling me that the
column_A is not in the GROUP BY expression.
To get the behavior I want, I have to enable the position alias usage before running my query. I can do that by setting the
hive.groupby.orderby.position.alias property to
SET hive.groupby.orderby.position.alias=true; SELECT column_A, count(*) FROM table_name GROUP BY 1
Now, Hive is going to correctly recoginize that I want to group by the first column, lookup its name and use that to execute the query.