Pandas stack and unstack explained

In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.

Input

Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.

We have created a data frame indexed by the faculty name and the group id.

import pandas as pd

data = pd.DataFrame([
    ['Mathematics', 'A', 34, True],
    ['Mathematics', 'B', 27, False],
    ['Mathematics', 'C', 29, True],
    ['Chemistry', 'A', 22, True],
    ['Chemistry', 'B', 18, False],
    ['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])

data = data.set_index(['faculty', 'group_id'])
print(data)
# Output:

#                      number_of_students  exam_scheduled
#faculty     group_id
#Mathematics A                         34            True
#            B                         27           False
#            C                         29            True
#Chemistry   A                         22            True
#            B                         18           False
#            C                         25            True

Turn an index into column

When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.

data.unstack()

# Output:

#            number_of_students         exam_scheduled
#group_id                     A   B   C              A      B     C
#faculty
#Chemistry                   22  18  25           True  False  True
#Mathematics                 34  27  29           True  False  True

Now, when we want to get the number of students in group A, we have to write:

data.unstack()['number_of_students']['A']

# Output:

#faculty
#Chemistry      22
#Mathematics    34
#Name: A, dtype: int64

As you see, we get the number of students in every faculty that has a group identified by “A.”

Turn a data frame into a nested lookup table

The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.

data.stack()

# Output:

#aculty      group_id
#Mathematics  A         number_of_students       34
#                       exam_scheduled         True
#             B         number_of_students       27
#                       exam_scheduled        False
#             C         number_of_students       29
#                       exam_scheduled         True
#Chemistry    A         number_of_students       22
#                       exam_scheduled         True
#             B         number_of_students       18
#                       exam_scheduled        False
#             C         number_of_students       25
#                       exam_scheduled         True
#dtype: object

Because of that, we can get the values from stacked data frame like this:

data.stack()['Mathematics']['A']['number_of_students']

# Output:

#34
Older post

How to split a data frame into time-series for LSTM deep neural network

How to prepare data for LSTM model

Newer post

Loading tensorflow models from Amazon S3 with Tensorflow Serving

How to save the model in a file, upload it to S3, and serve it using the Docker image of Tensorflow Serving

Are you looking for an experienced AI consultant? Do you need assistance with your RAG or Agentic Workflow?
Schedule a call, send me a message on LinkedIn, or use the chat button in the right-bottom corner. Schedule a call or send me a message on LinkedIn

>