In this article, I will show you how to combine two Spark DataFrames that have no common columns.
For example, suppose we have the following two DataFrames:
// assumes an active SparkSession available as `spark` (e.g. in spark-shell)
import spark.implicits._

// Two example DataFrames that share no columns
val df1 = Seq(
  ("001", "002", "003"),
  ("004", "005", "006")
).toDF("A", "B", "C")

val df2 = Seq(
  ("011", "022", "033"),
  ("044", "055", "066")
).toDF("D", "E", "F")
The output I want to get looks like this:
+----+----+----+----+----+----+
| A| B| C| D| E| F|
+----+----+----+----+----+----+
| 001| 002| 003|null|null|null|
| 004| 005| 006|null|null|null|
|null|null|null| 011| 022| 033|
|null|null|null| 044| 055| 066|
+----+----+----+----+----+----+
This can be easily achieved with a full outer join whose join condition is always false:

import org.apache.spark.sql.functions.lit

// A condition that never matches, so no rows get paired up
df1.join(df2, lit(false), "full")
It works because a full outer join keeps every row from both DataFrames, and using lit(false) as the join condition guarantees that no row from df1 ever matches a row from df2, so each row is padded with nulls in the other DataFrame's columns.
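For reference, here is a minimal end-to-end sketch of the same technique as a standalone application; the object name, app name, and local master setting are my own scaffolding, not part of the original snippets:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object NoCommonColumnsJoin {
  def main(args: Array[String]): Unit = {
    // Local SparkSession, just for demonstration
    val spark = SparkSession.builder()
      .appName("no-common-columns-join")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(("001", "002", "003"), ("004", "005", "006")).toDF("A", "B", "C")
    val df2 = Seq(("011", "022", "033"), ("044", "055", "066")).toDF("D", "E", "F")

    // Full outer join with an always-false condition: nothing matches,
    // so all rows from both sides survive, null-padded on the other side
    df1.join(df2, lit(false), "full").show()

    spark.stop()
  }
}

As a side note, if you are on Spark 3.1 or later, df1.unionByName(df2, allowMissingColumns = true) produces the same null-padded result without a join.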