Pandas offers several ways to select rows from a DataFrame. Let’s take a look at four of the most popular ones.
We will start with a DataFrame containing five rows:
| | col_A | col_B |
---|---|---|
0 | 1 | A |
1 | 2 | B |
2 | 3 | C |
3 | 4 | D |
4 | 5 | E |
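The article does not show how this DataFrame is created. Here is a minimal sketch of one way to build it, assuming the variable name data used in the snippets that follow and the default integer index:

```python
import pandas as pd

# Build the five-row example DataFrame.
# No index is passed, so pandas assigns the default labels 0..4.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

print(data)
```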
The loc indexer
First, we will use loc. Although it is often called a function, loc is an indexer used with square brackets, and it selects rows by their index labels. For example, if we write data.loc[[0,1,4]], we will get the first, the second, and the last row of our DataFrame.
| | col_A | col_B |
---|---|---|
0 | 1 | A |
1 | 2 | B |
4 | 5 | E |
Of course, with the default numeric index, labels and positions coincide, so it’s difficult to spot the benefit of using loc. Because of that, we will set the col_B column as the index and use its values to select the rows:
data.set_index('col_B').loc[['A', 'B', 'E']]
col_B | col_A |
---|---|
A | 1 |
B | 2 |
E | 5 |
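The label-based selection above can be sketched as a self-contained script. The DataFrame construction and the names by_label and selected are assumptions added for illustration. The sketch also shows that set_index returns a new DataFrame and leaves data unchanged.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# set_index returns a new DataFrame indexed by col_B; data itself is not modified.
by_label = data.set_index("col_B")

# loc looks rows up by index label, in the order the labels are listed.
selected = by_label.loc[["A", "B", "E"]]

print(selected)
print("col_B" in data.columns)  # the original still has col_B as a regular column
```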
The iloc indexer
While loc selects rows by label, iloc selects them by their integer position in the DataFrame, regardless of the index labels. With the default numeric index, both give the same result. Let’s retrieve the last two rows:
data.iloc[[3,4]]
| | col_A | col_B |
---|---|---|
3 | 4 | D |
4 | 5 | E |
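To make the difference between position and label concrete, here is a small sketch. The DataFrame construction and the variable names are assumptions added for illustration. It retrieves the last two rows in three ways: by explicit positions, by a negative slice, and by position on a DataFrame whose index holds letters instead of numbers.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# Explicit positions, as in the article.
last_two = data.iloc[[3, 4]]

# Negative positions count from the end, like Python lists.
same_rows = data.iloc[-2:]

# iloc ignores the index labels: positions 3 and 4 still mean the 4th and 5th rows.
relabeled = data.set_index("col_B").iloc[[3, 4]]

print(last_two)
print(same_rows)
print(relabeled)
```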
Using a boolean mask
In Pandas, we can pass a boolean array (often called a mask) to the DataFrame selector to retrieve the rows where the array is True.
We are going to need an array of bool values. The array must have the same length as our DataFrame.
binary = [True, False, True, True, False]
data[binary]
| | col_A | col_B |
---|---|---|
0 | 1 | A |
2 | 3 | C |
3 | 4 | D |
The most popular data selection method generates the boolean array from the values in the DataFrame. A comparison on a column produces one True or False per row. For example, we can retrieve the rows in which col_A has values smaller than 3:
data[data['col_A'] < 3]
| | col_A | col_B |
---|---|---|
0 | 1 | A |
1 | 2 | B |
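It can help to look at the intermediate mask before using it. In this sketch the DataFrame construction and the names mask and small are assumptions added for illustration. The comparison yields a boolean Series aligned with the rows, and passing it to the selector keeps the rows marked True.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# The comparison is evaluated row by row and returns a boolean Series.
mask = data["col_A"] < 3
print(mask.tolist())

# Passing the mask to the selector keeps only the rows where it is True.
small = data[mask]
print(small)
```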
Slicing a DataFrame
Finally, we can slice a DataFrame with the same start:stop:step syntax as Python lists. The slice selects rows by position, and the stop position is excluded.
data[2:3]
| | col_A | col_B |
---|---|---|
2 | 3 | C |
data[:2]
| | col_A | col_B |
---|---|---|
0 | 1 | A |
1 | 2 | B |
data[1:]
| | col_A | col_B |
---|---|---|
1 | 2 | B |
2 | 3 | C |
3 | 4 | D |
4 | 5 | E |
data[::2]
| | col_A | col_B |
---|---|---|
0 | 1 | A |
2 | 3 | C |
4 | 5 | E |
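The parallel with list slicing can be checked directly. The following sketch, in which the DataFrame construction and the variable names are assumptions added for illustration, applies the same four slices to the DataFrame and to a plain Python list of the col_B values, then prints both results side by side.

```python
import pandas as pd

# Assumed reconstruction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# A plain Python list holding the same values as col_B.
letters = data["col_B"].tolist()

# The four slices used in the article: [2:3], [:2], [1:], [::2].
slices = [slice(2, 3), slice(None, 2), slice(1, None), slice(None, None, 2)]

results = []
for s in slices:
    from_frame = data[s]["col_B"].tolist()  # rows selected by the DataFrame slice
    from_list = letters[s]                  # items selected by the list slice
    results.append((from_frame, from_list))
    print(from_frame, from_list)
```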