In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.
Table of Contents
Input
Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.
We have created a data frame indexed by the faculty name and the group id.
import pandas as pd
data = pd.DataFrame([
['Mathematics', 'A', 34, True],
['Mathematics', 'B', 27, False],
['Mathematics', 'C', 29, True],
['Chemistry', 'A', 22, True],
['Chemistry', 'B', 18, False],
['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])
data = data.set_index(['faculty', 'group_id'])
print(data)
# Output:
# number_of_students exam_scheduled
#faculty group_id
#Mathematics A 34 True
# B 27 False
# C 29 True
#Chemistry A 22 True
# B 18 False
# C 25 True
Want to build AI systems that actually work?
Download my expert-crafted GenAI Transformation Guide for Data Teams and discover how to properly measure AI performance, set up guardrails, and continuously improve your AI solutions like the pros.
Turn an index into column
When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.
data.unstack()
# Output:
# number_of_students exam_scheduled
#group_id A B C A B C
#faculty
#Chemistry 22 18 25 True False True
#Mathematics 34 27 29 True False True
Now, when we want to get the number of students in group A, we have to write:
data.unstack()['number_of_students']['A']
# Output:
#faculty
#Chemistry 22
#Mathematics 34
#Name: A, dtype: int64
As you see, we get the number of students in every faculty that has a group identified by “A.”
Turn a data frame into a nested lookup table
The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.
data.stack()
# Output:
#aculty group_id
#Mathematics A number_of_students 34
# exam_scheduled True
# B number_of_students 27
# exam_scheduled False
# C number_of_students 29
# exam_scheduled True
#Chemistry A number_of_students 22
# exam_scheduled True
# B number_of_students 18
# exam_scheduled False
# C number_of_students 25
# exam_scheduled True
#dtype: object
Because of that, we can get the values from stacked data frame like this:
data.stack()['Mathematics']['A']['number_of_students']
# Output:
#34