This article shows you how to use the window function and random sorting to select a random sample of rows grouped by a column.
First, we will use the window function to group the rows by a given column and order them randomly. Let’s assume that I have an Athena table called the_table
with a column called column_A
. In this case, the window function looks like this:
ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY RANDOM()) AS rn
I will put that window function in a subquery:
WITH random_order AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY RANDOM()) AS rn
FROM the_table
)
If I want to get 1000 random samples in every group, I have to select the rows with the rn
parameter equal or less than 1000:
WITH random_order AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY RANDOM()) AS rn
FROM the_table
)
SELECT *
FROM random_order
WHERE rn <= 1000
Want to build AI systems that actually work?
Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.