By default, Hive does not support passing the columns to GROUP BY or ORDER BY using their positions instead of names. Therefore, when I try to run this query:

SELECT column_A, count(*) FROM table_name GROUP BY 1

I will see an error message telling me that the column_A is not in the GROUP BY expression.

To get the behavior I want, I have to enable the position alias usage before running my query. I can do that by setting the hive.groupby.orderby.position.alias property to true:

SET hive.groupby.orderby.position.alias=true;

SELECT column_A, count(*) FROM table_name GROUP BY 1

Now, Hive is going to correctly recoginize that I want to group by the first column, lookup its name and use that to execute the query.

Want to build AI systems that actually work?

Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.

Want to build AI systems that actually work?

Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.

Older post

How to check whether a regular expression matches a string in Hive

What is the equivalent of Athena/Presto regexp_like in Hive

Newer post

How to define an AWS Athena view using Airflow

How to use the AWSAthenaOperator

Are you looking for an experienced AI consultant? Do you need assistance with your RAG or Agentic Workflow?
Book a Quick Consultation, send me a message on LinkedIn. Book a Quick Consultation or send me a message on LinkedIn

>