---
title: "Fine-tuning Mistral-7B LLM using QLoRA in Axolotl"
description: "A comprehensive tutorial on fine-tuning Mistral-7B using QLoRA and Axolotl, covering data preparation, model configuration, and text classification optimization."
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/fine-tuning-llm-using-qlora-in-axolotl
---

Using the Axolotl library to fine-tune an LLM with Quantized Low-Rank Adaptation (QLoRA) requires providing the training data in a specific format and filling out a massive configuration file. This article shows how to do it. If you're looking for a lighter-weight solution, you might also consider [fine-tuning smaller language models](https://mikulskibartosz.name/fine-tune-small-language-model), which are faster and cheaper to train.

## Log in to Hugging Face

All the code below assumes that you have logged in to Hugging Face and have a valid access token with read permissions for the `mistralai/Mistral-7B-v0.1` model and write access to your private repository, where we will store the fine-tuned model.

## The LLM model we fine-tune, and the task

I chose the Mistral-7B model for text classification. The model has to classify news into four categories: World, Sports, Business, and Sci/Tech (as in the `fancyzhx/ag_news` dataset). When we give the task to the base Mistral model without fine-tuning, it fails miserably.

```python
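from transformers import pipeline

# Load the base model (not fine-tuned) on the first GPU.
pipe = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    device=0
)

prompt = """Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech

Bangladesh paralysed by strikes..."""
# max_length counts the prompt tokens plus the newly generated ones.
output = pipe(prompt, max_length=50, num_return_sequences=1)
print(output[0]["generated_text"])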
```

Without fine-tuning, the model tends to repeat the given text. Occasionally, it includes a category name, but not consistently.
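
To put a number on "not consistently", one option is to check each completion for category names. The helper below is only a sketch: the function names are hypothetical, and it expects the newly generated part of the text (for example `output[0]["generated_text"][len(prompt):]`), because the prompt itself already lists all four categories.

```python
import re

CATEGORIES = ["World", "Sports", "Business", "Sci_Tech"]


def categories_mentioned(completion):
    """Return the category names that appear as whole words in the completion."""
    return [c for c in CATEGORIES if re.search(rf"\b{re.escape(c)}\b", completion)]


def is_clean_answer(completion):
    """True only when the completion is exactly one category name."""
    return completion.strip() in CATEGORIES
```

A completion counts as a usable answer only when `is_clean_answer` returns `True`; anything else needs extra parsing or is simply wrong.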

## Preparing the training data as JSONL file

Axolotl supports several dataset formats. One of them is JSONL: a file in which each line is a JSON object. The objects must follow a predefined schema, too. For example, Axolotl's `alpaca` format looks like this:

```json
{"instruction": "...", "input": "...", "output": "..."}
```

We will use the `alpaca` format for our task.
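
As a minimal sketch of what a single training record looks like on disk, the snippet below serializes one hand-written example into a JSONL line and parses it back using only the standard library. The headline is made up for illustration.

```python
import json

# A hypothetical record; the headline is made up for illustration.
record = {
    "instruction": "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech",
    "input": "Local team wins the championship after extra time",
    "output": "Sports",
}

# JSONL means one JSON object per line, without pretty-printing.
line = json.dumps(record, ensure_ascii=False)

# Parsing the line gives back the same three keys.
parsed = json.loads(line)
print(sorted(parsed))
```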

Because I don't have a dataset I can use for training and show publicly in this article, let's use the `fancyzhx/ag_news` dataset as our base and convert it to the required format.

```python
from datasets import load_dataset

dataset_dict = load_dataset("fancyzhx/ag_news")
train_dataset = dataset_dict["train"]
```

The dataset contains two columns: `text` and `label`. We need a mapping function for each row so we can:

* add the `instruction` key with the same value everywhere
* change the `text` key to `input`
* add the `output` key with the string value of the label (we must map the integer labels to strings)

```python
def map_dataset(ds):
    instruction_prompt = "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech"
    # The same instruction is repeated for every row.
    new_column = [instruction_prompt] * len(ds)

    ds = ds.rename_column("text", "input")
    ds = ds.add_column("instruction", new_column)

    def map_category(example):
        # ag_news stores labels as integers; the model should output strings.
        category_map = {
            0: "World",
            1: "Sports",
            2: "Business",
            3: "Sci_Tech"
        }
        example['output'] = category_map.get(example['label'], "Unknown")
        return example

    ds = ds.map(map_category)
    ds = ds.remove_columns("label")
    return ds
```

We apply the function to the training dataset and store the result as a JSONL file.

```python
train_dataset = map_dataset(train_dataset)

# After map_dataset, the rows already contain exactly the three alpaca keys
# (instruction, input, output), so we can write them out directly.
# lines=True produces one JSON object per line (JSONL).
train_dataset.to_json("data.jsonl", orient="records", lines=True)
```

After running the code above, we have a `data.jsonl` file in the current directory.
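
Before starting a long training run, it may be worth checking that the file really has the expected shape. The function below is a sketch of such a check; its name, the `max_rows` parameter, and the problem messages are my own choices, not part of Axolotl.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}
ALLOWED_LABELS = {"World", "Sports", "Business", "Sci_Tech"}


def check_alpaca_jsonl(path, max_rows=1000):
    """Return (line_number, problem) tuples found in the first max_rows lines."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if line_number > max_rows:
                break
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_number, "not valid JSON"))
                continue
            if not isinstance(row, dict):
                problems.append((line_number, "not a JSON object"))
                continue
            missing = REQUIRED_KEYS - set(row)
            if missing:
                problems.append((line_number, f"missing keys: {sorted(missing)}"))
            elif row["output"] not in ALLOWED_LABELS:
                problems.append((line_number, f"unexpected label: {row['output']!r}"))
    return problems
```

Calling `check_alpaca_jsonl("data.jsonl")` should return an empty list. An `unexpected label` entry would point to a row that fell back to the `"Unknown"` default in `map_category`.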

## Preparing the configuration file

We need to create a configuration file telling Axolotl how to fine-tune the model. I stored the file's content in the `config.yaml` file in the current directory.

```yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
 - path: data.jsonl
   ds_type: json
   type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./outputs/qlora-out

model_config:
  output_router_logits: true

adapter: qlora
lora_model_dir:

sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 5

hub_model_id: mikulskibartosz/mistral_axolotl
hub_strategy: checkpoint

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
weight_decay: 0.0
fsdp:
special_tokens:
  bos_token: '<s>'
  eos_token: '</s>'
  unk_token: '<unk>'
```

### Explanation of the configuration file

To get the information about all parameters, you should check [the Axolotl documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config.html).

Below is the explanation of the parameters I had to modify for my fine-tuning task.

* `base_model` - the model we want to fine-tune
* `model_type` - the Hugging Face model class used to load the model. In our case, `MistralForCausalLM`, a causal language model (see the [`AutoModelForCausalLM` documentation](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM)).
* `tokenizer_type` - the corresponding tokenizer class, also from Hugging Face: [`LlamaTokenizer`](https://huggingface.co/docs/transformers/v4.46.0/en/model_doc/llama2#transformers.LlamaTokenizer).
* `load_in_4bit` - quantize the model down to 4 bits
* `datasets` - the path to the training data (`path`), the file type (`ds_type`), and the format of the data inside the file (`type`). It can be a list of datasets.
* `adapter` - the type of the adapter we want to use. Here, we use QLoRA. If blank, we would fine-tune the whole model.
* All parameters that start with `lora_` are related to the LoRA technique.
* `loss_watchdog_threshold` and `loss_watchdog_patience` - a safeguard that aborts the training when the loss stays above `loss_watchdog_threshold` for `loss_watchdog_patience` consecutive steps. I had to increase the patience parameter.
* `hub_model_id` - the ID of the Hugging Face repository where we will store the fine-tuned model.
* `hub_strategy` - when set to `checkpoint`, each checkpoint will be pushed to the Hub (we will also get the final model at the end of the training).
* `special_tokens` - those are the tokens used by the model. Check the model's documentation for the values you need to set.
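
Some of the numeric settings are easier to reason about once combined. The back-of-the-envelope sketch below derives the effective batch size, a rough number of optimizer steps per epoch, and the LoRA scaling factor (standard LoRA scales the adapter update by `lora_alpha / lora_r`). The single GPU and the 120,000-row `ag_news` training split are assumptions, and the exact validation split Axolotl produces may differ by a row, so treat the step count as an estimate.

```python
import math

# Values copied from the configuration above.
micro_batch_size = 2
gradient_accumulation_steps = 4
val_set_size = 0.02
lora_r = 32
lora_alpha = 16

# Assumptions, not part of the config: a single GPU and the
# 120,000-row training split of ag_news.
num_gpus = 1
total_rows = 120_000

# Examples that contribute to one optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Rows left for training after the validation split (approximate).
train_rows = total_rows - round(total_rows * val_set_size)
steps_per_epoch = math.ceil(train_rows / effective_batch_size)

# Factor applied to the low-rank update in standard LoRA.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 8
print(steps_per_epoch)       # 14700
print(lora_scaling)          # 0.5
```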

## Fine-tuning the model

Fine-tuning consists of two steps: preparing the data and running the fine-tuning.

```bash
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess config.yaml
accelerate launch -m axolotl.cli.train config.yaml
```

When the task is finished, the final version of the adapter will be pushed to the Hub.

We can merge the adapter with the base model. If you want to do it, check the [merging documentation of Axolotl](https://axolotl-ai-cloud.github.io/axolotl/#merge-lora-to-base). Remember to upload the merged model to the Hub!

## Using the fine-tuned model

We could have merged the adapter into the model, but we didn't. Therefore, when we want to use the fine-tuned version, we must load the base model, the adapter, and the tokenizer.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "mistralai/Mistral-7B-v0.1"
adapter_model_name = "mikulskibartosz/mistral_axolotl"

# Load the base model first, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

model = model.to("cuda")
model.eval()  # inference mode: disables dropout
```
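
The loading code above does not pass a dtype, so the weights are loaded in whatever precision `from_pretrained` defaults to in your version of `transformers`. As a rough sketch of why that matters, the snippet below estimates the memory needed for the weights alone (no activations, no KV cache), assuming about 7.24 billion parameters for Mistral-7B. The function name is hypothetical.

```python
def weights_size_gb(num_parameters, bytes_per_parameter):
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: roughly 7.24 billion parameters for Mistral-7B.
NUM_PARAMETERS = 7.24e9

for name, size in [("float32", 4), ("bfloat16", 2), ("4-bit", 0.5)]:
    print(f"{name}: ~{weights_size_gb(NUM_PARAMETERS, size):.0f} GB")
```

If the full-precision footprint does not fit on your GPU, `from_pretrained` accepts a `torch_dtype` argument (for example `torch.bfloat16`) in the `transformers` versions I am aware of; check the documentation of the version you use.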

Now, we build a text-generation pipeline with the fine-tuned model. When we run the pipeline, the model properly returns the news category with no additional text.

```python
from transformers import pipeline

text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="cuda"
)

prompt = "New iPad released..."
output = text_generator(prompt, max_length=10, num_return_sequences=1)
```
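
During training, the `alpaca` dataset type wraps every example in a prompt template, so it may help to send the same template at inference time. The sketch below assumes the standard Alpaca template (with an input field) and the instruction from `map_dataset`; verify the template against the Axolotl version you use. The helper names are hypothetical.

```python
# Assumption: the standard Alpaca prompt template with an input field.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

INSTRUCTION = "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech"


def build_prompt(text):
    """Wrap a news snippet in the same template used for training."""
    return ALPACA_TEMPLATE.format(instruction=INSTRUCTION, input=text)


def extract_answer(generated_text, prompt):
    """Drop the echoed prompt and keep the first line of the completion."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    stripped = generated_text.strip()
    return stripped.splitlines()[0].strip() if stripped else ""
```

With the pipeline above, this could be used as `prompt = build_prompt("New iPad released...")` followed by `extract_answer(text_generator(prompt, max_new_tokens=5)[0]["generated_text"], prompt)`. Note the switch to `max_new_tokens`: the templated prompt alone is longer than `max_length=10` tokens.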

