We can easily visualize the dispersion and skewness of data with a box and whiskers plot.
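import matplotlib.pyplot as plt
import seaborn as sns

# Load the Titanic dataset and drop every row with a missing value in any column
data = sns.load_dataset('titanic')
data = data.dropna()

# Draw the box plot of the passengers' ages
# (matplotlib 3.9+ deprecates `labels` in favor of `tick_labels`)
plt.boxplot(data['age'], labels=['age'])
plt.title("Titanic passengers' age - box and whiskers")
plt.show()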
The plot consists of 3 elements:
- The line inside the rectangle indicates the median of the data.
- The rectangle shows the interquartile range (IQR). Its lower edge is placed at the 25th percentile (1st quartile, Q1). The upper edge is at the 75th percentile (3rd quartile, Q3).
- The T-shaped lines are the whiskers. By default, the lower whisker extends to the smallest value that is not below Q1 - 1.5 * IQR, and the upper whisker extends to the largest value that is not above Q3 + 1.5 * IQR.
In this plot, the whiskers end at the minimum and the maximum values because every age falls within those limits.
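The quartiles, the IQR, and the whisker limits described above can be computed by hand. The following is a minimal sketch that uses only the standard library; the helper name `whisker_bounds` and the `sample_ages` list are made up for illustration and are not the Titanic data. It assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as the percentile calculation behind the plot, so the numbers may differ slightly from what a plotting library reports.

```python
from statistics import quantiles


def whisker_bounds(values, whis=1.5):
    """Return the box and whisker positions for a list of numbers.

    Hypothetical helper for illustration; it is not part of matplotlib.
    """
    # Quartiles with linear interpolation between data points
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # The whiskers may not reach further than these limits
    lower_limit = q1 - whis * iqr
    upper_limit = q3 + whis * iqr

    # Each whisker ends at the most extreme data point inside the limits
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
    }


# Made-up sample, not the Titanic ages
sample_ages = [2, 18, 22, 25, 30, 36, 40, 48, 80]
stats = whisker_bounds(sample_ages)
print(stats)
```

In this sample, the upper limit is 67, so the upper whisker stops at 48 and the value 80 lies beyond it. When all values fit inside the limits, as in the Titanic plot, the whiskers simply end at the minimum and the maximum.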
Outliers
If we shorten the whiskers to 1 * IQR, another element of the plot appears: circles that mark the outliers, which are the values lying beyond the whiskers.
import matplotlib.pyplot as plt

# whis=1 limits the whiskers to 1 * IQR beyond the quartiles
plt.boxplot(data['age'], whis=1, labels=['age'])
plt.title("Titanic passengers' age - box and whiskers")
plt.show()
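To see which values would be drawn as circles, we can apply the same rule by hand. This is a rough sketch using only the standard library; `find_outliers` and `sample_ages` are hypothetical names and data chosen for illustration, and the quartile interpolation is assumed to match the one used by the plot.

```python
from statistics import quantiles


def find_outliers(values, whis=1.0):
    """Return the values lying beyond whis * IQR from the quartiles.

    Hypothetical helper for illustration; it is not part of matplotlib.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - whis * iqr
    upper_limit = q3 + whis * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]


# Made-up sample, not the Titanic ages
sample_ages = [2, 18, 22, 25, 30, 36, 40, 48, 80]

outliers_default = find_outliers(sample_ages, whis=1.5)
outliers_tight = find_outliers(sample_ages, whis=1.0)
print(outliers_default, outliers_tight)
```

With the default multiplier of 1.5, only the value 80 is an outlier in this sample. Shortening the whiskers to 1 * IQR also turns the value 2 into an outlier.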
We can also place the whiskers at given percentiles. The plot then displays values lower than the n-th percentile or larger than the k-th percentile as outliers (here n = 5 and k = 95).
import matplotlib.pyplot as plt

# whis=[5, 95] places the whiskers at the 5th and the 95th percentile
plt.boxplot(data['age'], whis=[5, 95], labels=['age'])
plt.title("Titanic passengers' age - box and whiskers")
plt.show()
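The percentile variant can be checked by hand in a similar way. The sketch below is only an illustration under a few assumptions: `percentile_outliers` is a hypothetical helper, the sample is the made-up sequence 1 to 101 rather than the Titanic ages, and `low` and `high` must be whole numbers between 1 and 99.

```python
from statistics import quantiles


def percentile_outliers(values, low=5, high=95):
    """Return the whisker limits at the given percentiles and the outliers.

    Hypothetical helper for illustration; it is not part of matplotlib.
    """
    # quantiles(n=100) returns the 1st to the 99th percentile,
    # so the p-th percentile sits at index p - 1
    cuts = quantiles(values, n=100, method="inclusive")
    lower_limit = cuts[low - 1]
    upper_limit = cuts[high - 1]
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers


# Made-up sample: the integers from 1 to 101
sample = list(range(1, 102))
lower_limit, upper_limit, outliers = percentile_outliers(sample, 5, 95)
print(lower_limit, upper_limit, outliers)
```

For this sample, the limits land on 6 and 96, so the five smallest and the five largest values fall outside the whiskers.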