This article shows you how to use the window function and random sorting to select a random sample of rows grouped by a column.
Table of Contents
First, we will use the window function to group the rows by a given column and order them randomly. Let’s assume that I have an Athena table called the_table
with a column called column_A
. In this case, the window function looks like this:
ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY RANDOM()) AS rn
I will put that window function in a subquery:
WITH random_order AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY RANDOM()) AS rn
FROM the_table
)
If I want to get 1000 random samples in every group, I have to select the rows with the rn
parameter equal or less than 1000:
WITH random_order AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY RANDOM()) AS rn
FROM the_table
)
SELECT *
FROM random_order
WHERE rn <= 1000
Get Weekly AI Implementation Insights
Join engineering leaders who receive my analysis of common AI production failures and how to prevent them. No fluff, just actionable techniques.