Pandas stack and unstack explained

In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.


Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.

We have created a data frame indexed by the faculty name and the group id.

import pandas as pd

data = pd.DataFrame([
    ['Mathematics', 'A', 34, True],
    ['Mathematics', 'B', 27, False],
    ['Mathematics', 'C', 29, True],
    ['Chemistry', 'A', 22, True],
    ['Chemistry', 'B', 18, False],
    ['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])

data = data.set_index(['faculty', 'group_id'])
# Output:

#                      number_of_students  exam_scheduled
#faculty     group_id
#Mathematics A                         34            True
#            B                         27           False
#            C                         29            True
#Chemistry   A                         22            True
#            B                         18           False
#            C                         25            True

Turn an index into column

When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.


# Output:

#            number_of_students         exam_scheduled
#group_id                     A   B   C              A      B     C
#Chemistry                   22  18  25           True  False  True
#Mathematics                 34  27  29           True  False  True

Now, when we want to get the number of students in group A, we have to write:


# Output:

#Chemistry      22
#Mathematics    34
#Name: A, dtype: int64

As you see, we get the number of students in every faculty that has a group identified by “A.”

Turn a data frame into a nested lookup table

The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.


# Output:

#aculty      group_id
#Mathematics  A         number_of_students       34
#                       exam_scheduled         True
#             B         number_of_students       27
#                       exam_scheduled        False
#             C         number_of_students       29
#                       exam_scheduled         True
#Chemistry    A         number_of_students       22
#                       exam_scheduled         True
#             B         number_of_students       18
#                       exam_scheduled        False
#             C         number_of_students       25
#                       exam_scheduled         True
#dtype: object

Because of that, we can get the values from stacked data frame like this:


# Output:

Older post

How to split a data frame into time-series for LSTM deep neural network

How to prepare data for LSTM model

Newer post

Loading tensorflow models from Amazon S3 with Tensorflow Serving

How to save the model in a file, upload it to S3, and serve it using the Docker image of Tensorflow Serving