In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.
Input
Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.
We have created a data frame indexed by the faculty name and the group id.
import pandas as pd
data = pd.DataFrame([
['Mathematics', 'A', 34, True],
['Mathematics', 'B', 27, False],
['Mathematics', 'C', 29, True],
['Chemistry', 'A', 22, True],
['Chemistry', 'B', 18, False],
['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])
data = data.set_index(['faculty', 'group_id'])
print(data)
# Output:
# number_of_students exam_scheduled
#faculty group_id
#Mathematics A 34 True
# B 27 False
# C 29 True
#Chemistry A 22 True
# B 18 False
# C 25 True
Turn an index into column
When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.
data.unstack()
# Output:
# number_of_students exam_scheduled
#group_id A B C A B C
#faculty
#Chemistry 22 18 25 True False True
#Mathematics 34 27 29 True False True
Now, when we want to get the number of students in group A, we have to write:
data.unstack()['number_of_students']['A']
# Output:
#faculty
#Chemistry 22
#Mathematics 34
#Name: A, dtype: int64
As you see, we get the number of students in every faculty that has a group identified by “A.”
Turn a data frame into a nested lookup table
The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.
data.stack()
# Output:
#aculty group_id
#Mathematics A number_of_students 34
# exam_scheduled True
# B number_of_students 27
# exam_scheduled False
# C number_of_students 29
# exam_scheduled True
#Chemistry A number_of_students 22
# exam_scheduled True
# B number_of_students 18
# exam_scheduled False
# C number_of_students 25
# exam_scheduled True
#dtype: object
Because of that, we can get the values from stacked data frame like this:
data.stack()['Mathematics']['A']['number_of_students']
# Output:
#34