---
title: "Use the ROW_NUMBER() function to get top rows by partition in Hive"
description: "How to calculate row number by partition in Hive and use it to filter rows"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/use-row-number-to-get-top-rows-by-partition-in-hive
---

Imagine that I have a SQL query that calculates the number of rows grouped by some column:

```sql
SELECT column_A, count(*) as cnt
FROM table_name
GROUP BY column_A
```

I want to get the top 10 rows in every group. To do this, I have to use the query above as a subquery and use the `ROW_NUMBER()` function and a window function. The window function will group the rows by the column_A and order them by the number of rows:

```sql
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
    SELECT column_A, count(*) as cnt
    FROM table_name
    GROUP BY column_A
) t
```

To get only the top 10 rows, I am going to put all of that in another subquery and use `WHERE` to select only top values:

```sql
SELECT t2.column_A as column_A, t2.cnt as cnt FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
        SELECT column_A, count(*) as cnt
        FROM table_name
        GROUP BY column_A
        ) t
    ) t2
WHERE t2.rank <= 10;
```

