---
title: "How to get names of columns with missing values in PySpark"
description: "How to get the names of missing properties for every row in a PySpark Dataframe"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/names-of-columns-with-missing-values-in-pyspark
---

When validating data in PySpark, we often need the names of the columns that contain null values. In this article, I show how to get those names for every row of a DataFrame.

First, I assume that we have a DataFrame `df` and an array `all_columns`, which contains the names of the columns we want to validate.

We have to create a column containing an array of strings that denote the names of the columns with null values. To do that, we use the `when` function to check whether each value is null and return the column name as a literal. The `*` unpacks the list produced by the list comprehension into the arguments of Spark's `array` function. Note that `when` without an `otherwise` clause returns null for non-null columns, so the resulting array also contains null entries:

```python
from pyspark.sql.functions import array, col, lit, when

# For each column, emit its name if the value is null (and null otherwise)
missing_column_names = array(*[
    when(col(c).isNull(), lit(c)) for c in all_columns
])
```

After that, we assign the values to a new column in the DataFrame:

```python
df = df.withColumn("missing_columns", missing_column_names)
```
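The per-row logic is easy to model in plain Python. The sketch below uses an ordinary list of dicts as a hypothetical stand-in for DataFrame rows (the `all_columns` and `rows` values are made-up sample data) and shows what the `missing_columns` array holds once the null entries produced by `when` are dropped:

```python
# Plain-Python model of the per-row logic (illustration only;
# `all_columns` and `rows` are hypothetical sample data).
all_columns = ["name", "age", "city"]

rows = [
    {"name": "Alice", "age": None, "city": "Berlin"},
    {"name": None, "age": 30, "city": None},
]

missing_per_row = []
for row in rows:
    # Mirrors when(col(c).isNull(), lit(c)): the column name if null, else None
    with_nulls = [c if row[c] is None else None for c in all_columns]
    # Drop the None entries left behind for non-missing columns
    missing_per_row.append([c for c in with_nulls if c is not None])

print(missing_per_row)  # [['age'], ['name', 'city']]
```

In Spark itself, you can drop those null entries with the SQL higher-order function `filter`, e.g. `df.withColumn("missing_columns", expr("filter(missing_columns, x -> x is not null)"))` (available since Spark 2.4).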

