---
title: "How to Speed Up Pandas? Performance Optimization Techniques for Data Analysis"
description: "Performance optimization techniques for pandas DataFrames"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/how-to-speed-up-pandas
---

Update from 2025: Just use [Polars](https://pola.rs/) instead of Pandas.

Pandas runs its operations on a single CPU core, so there is a tremendous opportunity to speed it up, even if you keep running the code on a single machine. This blog post lists three libraries you may want to try when you need your Pandas code to run faster.

## Modin

[Modin](https://github.com/modin-project/modin) speeds up Pandas operations by running them on all available CPU cores. Modin re-implements (almost) all of the Pandas functions to vectorize them and distribute them across the CPUs. Because of that, the API does not change, and all we need to do is replace the import:

```python
import modin.pandas as pd
```

Of course, some Pandas functions are not implemented yet, but the authors promise around 90% API coverage.

## Swifter

[Swifter](https://github.com/jmcarpenter2/swifter) improves only one Pandas function, `apply`, but it makes a huge difference when you rely on that function. Instead of using a loop to iterate over the content of the DataFrame, it supports three methods of parallelization: it can run the code on a Dask cluster, use Modin to vectorize operations, or run a custom vectorization.

The setup is quite simple:

```python
import pandas as pd
# or, to combine Swifter with Modin:
# import modin.pandas as pd

import swifter  # importing it registers the `.swifter` accessor
```
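After the import, the only change in the application code is to call `apply` through the `.swifter` accessor. Here is a minimal sketch of that. The `double` function and the `value` column are made up for illustration, and the fallback to plain Pandas when Swifter is not installed exists only to keep the example runnable anywhere:

```python
import pandas as pd

try:
    import swifter  # noqa: F401  (registers the `.swifter` accessor)
    HAS_SWIFTER = True
except ImportError:
    # Assumption for this sketch: fall back to plain Pandas without Swifter.
    HAS_SWIFTER = False


def double(x):
    """Hypothetical function we want to apply to every value."""
    return x * 2


df = pd.DataFrame({"value": [1, 2, 3]})

# The regular Pandas call.
plain = df["value"].apply(double)

# The Swifter call: same function, same result, different execution strategy.
if HAS_SWIFTER:
    fast = df["value"].swifter.apply(double)
else:
    fast = plain

print(fast.tolist())
```

Both calls should return the same values, so switching back to plain `apply` is a one-line change if Swifter does not help with your workload.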

## Dask

Finally, we can run a separate cluster to execute the code. With [Dask](https://docs.dask.org/en/latest/), the setup is no longer trivial because it requires setting up the cluster and making a few modifications to the application code. However, it may be worth the effort because we can always scale up the cluster to get better results.

Of course, **if you try to speed up processing a small amount of data (small = fits in memory on a laptop), Dask will not help you**. The overhead of parallelizing the tasks will most likely lead to a longer processing time than running the same code with plain Pandas on a laptop.