Imagine that I have a SQL query that calculates the number of rows grouped by some column:
SELECT column_A, count(*) as cnt
FROM table_name
GROUP BY column_A
I want to get the top 10 rows in every group. To do this, I have to use the query above as a subquery and use the ROW_NUMBER()
function and a window function. The window function will group the rows by the column_A and order them by the number of rows:
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
SELECT column_A, count(*) as cnt
FROM table_name
GROUP BY column_A
) t
To get only the top 10 rows, I am going to put all of that in another subquery and use WHERE
to select only top values:
SELECT t2.column_A as column_A, t2.cnt as cnt FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
SELECT column_A, count(*) as cnt
FROM table_name
GROUP BY column_A
) t
) t2
WHERE t2.rank <= 10;
Want to build AI systems that actually work?
Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.