Imagine that I have a SQL query that calculates the number of rows grouped by some column:

SELECT column_A, count(*) as cnt
FROM table_name
GROUP BY column_A

I want to get the top 10 rows in every group. To do this, I have to use the query above as a subquery and use the ROW_NUMBER() function and a window function. The window function will group the rows by the column_A and order them by the number of rows:

SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
    SELECT column_A, count(*) as cnt
    FROM table_name
    GROUP BY column_A
) t

To get only the top 10 rows, I am going to put all of that in another subquery and use WHERE to select only top values:

SELECT t2.column_A as column_A, t2.cnt as cnt FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
        SELECT column_A, count(*) as cnt
        FROM table_name
        GROUP BY column_A
        ) t
    ) t2
WHERE t2.rank <= 10;

Want to build AI systems that actually work?

Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.

Want to build AI systems that actually work?

Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.

Older post

How to configure both core and spot instances in EMR using Terraform

Use EMR instance group to add spot instances to an EMR cluster

Newer post

Run a command on a remote server using SSH in Airflow

How to use the SSHHook in a PythonOperator to connect to a remote server from Airflow using SSH and execute a command.

Are you looking for an experienced AI consultant? Do you need assistance with your RAG or Agentic Workflow?
Book a Quick Consultation, send me a message on LinkedIn. Book a Quick Consultation or send me a message on LinkedIn

>