---
title: "XGBoost hyperparameter tuning in Python using grid search"
description: "Using GridSearchCV from Scikit-Learn to tune XGBoost classifier"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/xgboost-hyperparameter-tuning-in-python-using-grid-search
---

Fortunately, XGBoost implements the scikit-learn API, so tuning its hyperparameters is very easy.

I assume that you have already preprocessed the dataset and split it into training and test sets, so I will focus only on the tuning part.
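If you need a starting point, here is a minimal sketch of such a split, using a synthetic dataset from scikit-learn as a stand-in for your own preprocessed data (the dataset and split ratio are illustrative assumptions, not part of the original setup):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical binary classification data; replace with your own dataset.
X, Y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out 20% of the rows for the final evaluation.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```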

First, we have to import the XGBoost classifier and `GridSearchCV` from scikit-learn.

```python
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
```

After that, we have to specify the constant parameters of the classifier. We need the objective. In this case, I use the "binary:logistic" objective because I train a classifier that handles only two classes. Additionally, I specify the number of threads to speed up the training and the seed for the random number generator, so I get the same results in every run.

```python
estimator = XGBClassifier(
    objective='binary:logistic',
    nthread=4,  # newer XGBoost versions prefer n_jobs
    seed=42     # newer XGBoost versions prefer random_state
)
```

In the next step, I specify the tunable hyperparameters and the ranges of values to try.

```python
parameters = {
    'max_depth': range(2, 10, 1),
    'n_estimators': range(60, 220, 40),
    'learning_rate': [0.1, 0.01, 0.05]
}
```
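This grid defines 8 values of `max_depth`, 4 of `n_estimators`, and 3 of `learning_rate`, so the search will evaluate 8 × 4 × 3 = 96 candidate combinations, which we can verify with a quick sketch:

```python
from itertools import product

parameters = {
    'max_depth': range(2, 10, 1),        # 8 values
    'n_estimators': range(60, 220, 40),  # 4 values
    'learning_rate': [0.1, 0.01, 0.05]   # 3 values
}

# Cartesian product of all parameter values = number of candidates.
n_candidates = len(list(product(*parameters.values())))
print(n_candidates)  # 96
```

With 10-fold cross-validation, that means 96 × 10 = 960 model fits, which matches the log output below.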

In the last setup step, I configure the GridSearchCV object. I choose the best hyperparameters using the <a href="https://www.mikulskibartosz.name/how-to-interpret-roc-curve-and-auc-metrics/">ROC AUC metric</a> to compare the results of 10-fold cross-validation.

```python
grid_search = GridSearchCV(
    estimator=estimator,
    param_grid=parameters,
    scoring='roc_auc',
    n_jobs=10,
    cv=10,
    verbose=True
)
```

Now, we can run the grid search on the training data.

```python
grid_search.fit(X, Y)
```

Here are the results:

```
Fitting 10 folds for each of 96 candidates, totalling 960 fits
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.0s
[Parallel(n_jobs=10)]: Done 180 tasks      | elapsed:   40.1s
[Parallel(n_jobs=10)]: Done 430 tasks      | elapsed:  1.7min
[Parallel(n_jobs=10)]: Done 780 tasks      | elapsed:  3.1min
[Parallel(n_jobs=10)]: Done 960 out of 960 | elapsed:  4.0min finished
```

The `best_estimator_` field contains the best model found by the grid search, refit on the whole dataset with the winning hyperparameters (the default `refit=True` behavior).

```python
grid_search.best_estimator_
```