---
title: "How to sort a Pandas DataFrame by month name"
description: "How to use an ordered categorical variable to sort a Pandas Dataframe by months while displaying their names"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/how-to-sort-pandas-dataframe-by-month-name
---

The alphabetical order makes no sense when we sort a Pandas DataFrame by month names. How can we sort the values by months while keeping the proper order and still having month names in the column?

The simplest option may be to add a column with the month position in the calendar and sort by it. However, to me, it looks like an ugly hack. There must be a better way.

First, let's create a Pandas DataFrame and mix the order of months:

```python
import pandas as pd
df = pd.DataFrame({
    'date': ['2022-08-01', '2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01'],
    'value': [123, 456, 789, 345, 678]
})

df['month'] = pd.to_datetime(df['date']).apply(lambda x: x.month_name())
df = df.sort_values('value')
```

Now, the `df` variable contains out-of-order variables:

```
| date       |   value | month     |
|:-----------|--------:|:----------|
| 2022-08-01 |     123 | August    |
| 2022-11-01 |     345 | November  |
| 2022-09-01 |     456 | September |
| 2022-12-01 |     678 | December  |
| 2022-10-01 |     789 | October   |
```

Let's fix it.

## The wrong way - putting months in the alphabetical order

To verify that we have a problem sorting the months, let's try using the `sort_values` method on the `month` column. It will sort them alphabetically by names. That's probably the worst way to display time-based data.

```python
df.sort_values('month')
```

The result is a disaster:

```
| date       |   value | month     |
|:-----------|--------:|:----------|
| 2022-08-01 |     123 | August    |
| 2022-12-01 |     678 | December  |
| 2022-11-01 |     345 | November  |
| 2022-10-01 |     789 | October   |
| 2022-09-01 |     456 | September |
```

## Sorting months by names in Pandas

To sort the months correctly, we must define the order first. We could create a list of months in the correct order by hand, but using the `date_range` function seems easier:

```python
dates_in_order = pd.date_range(start='2022-01-01', end='2022-12-01', freq='MS')
```

As a result, we have a `DataFrameIndex` with the first days of every month in 2022:

```
DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01',
               '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01',
               '2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01'],
              dtype='datetime64[ns]', freq='MS')
```

Now, we can extract the month name from each date:

```python
months_in_order = dates_in_order.map(lambda x: x.month_name()).to_list()
```

We have created a list of months:

```
['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']
```

Now, let's use the list of month names to define the `month` column as an ordered categorical variable:

```python
df.month = pd.Categorical(
    df.month,
    categories=months_in_order,
    ordered=True
)
```

Our DataFrame is still not sorted correctly, but this time when we use the `sort_values` function, we will get the expected result:

```python
df.sort_values('month')
```

```
| date       |   value | month     |
|:-----------|--------:|:----------|
| 2022-08-01 |     123 | August    |
| 2022-09-01 |     456 | September |
| 2022-10-01 |     789 | October   |
| 2022-11-01 |     345 | November  |
| 2022-12-01 |     678 | December  |
```