Fine-tuning Mistral-7B LLM using QLoRA in Axolotl

Using the Axolotl library to fine-tune an LLM with Quantized Low-Rank Adaptation requires providing the training data in a specific format and filling out a massive configuration file. This article shows how to do it.

Table of Contents

  1. Login to Hugging Face
  2. The LLM we fine-tune, and the task
  3. Preparing the training data as a JSONL file
  4. Preparing the configuration file
    1. Explanation of the configuration file
  5. Fine-tuning the model
  6. Using the fine-tuned model

Login to Hugging Face

All the code below assumes that you have logged in to Hugging Face and have a valid access token with read permissions for the mistralai/Mistral-7B-v0.1 model and write access to your private repository, where we will store the fine-tuned model.
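
If you are not logged in yet, the huggingface_hub library can store the access token for you. Below is a minimal sketch; run it once before the rest of the code (you can also use the huggingface-cli login command instead):

from huggingface_hub import login

# Prompts for the access token and stores it locally.
# You can also pass it explicitly: login(token="hf_...")  (placeholder value)
login()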

The LLM we fine-tune, and the task

I chose the Mistral-7B model for text classification. The model classifies news into four categories: World, Sports, Business, and Sci/Tech (as in the fancyzhx/ag_news dataset). When we give this task to the untuned Mistral model, it fails miserably.

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    device=0
)

prompt = """Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech

Bangladesh paralysed by strikes..."""
output = pipe(prompt, max_length=50, num_return_sequences=1)

Without fine-tuning, the model tends to repeat the given text. Occasionally, it includes a category name, but it’s not consistent.

Preparing the training data as a JSONL file

Axolotl supports several dataset formats. One of them is a JSONL file, in which each line is a JSON object with a predefined structure. For example, Axolotl’s alpaca format looks like this:

{"instruction": "...", "input": "...", "output": "..."}

We will use the alpaca format for our task.

Because I don’t have a dataset I can use for training and show publicly in this article, let’s use the fancyzhx/ag_news dataset as our base and convert it to the required format.

from datasets import load_dataset, DatasetDict


dataset_dict = load_dataset("fancyzhx/ag_news")
train_dataset = dataset_dict["train"]

The dataset contains two columns: text and label. We need a mapping function that, for each row, will:

  • add the instruction key with the same value everywhere
  • change the text key to input
  • add the output key with the string value of the label (we must map the integer labels to strings)

def map_dataset(ds):
    instruction_prompt = "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech"
    new_column = [instruction_prompt] * len(ds)

    # Rename the text column and attach the constant instruction column
    ds = ds.rename_column("text", "input")
    ds = ds.add_column("instruction", new_column)

    def map_category(example):
        # Map the integer labels to the category names used in the prompt
        category_map = {
            0: "World",
            1: "Sports",
            2: "Business",
            3: "Sci_Tech"
        }
        example['output'] = category_map.get(example['label'], "Unknown")
        return example

    ds = ds.map(map_category)
    ds = ds.remove_columns("label")
    return ds

We apply the function to the training dataset and store the result as a JSONL file.

import json


train_dataset = map_dataset(train_dataset)

def format_example(example):
    return {
        "instruction": example['instruction'],
        "input": example['input'],
        "output": example['output']
    }

formatted_dataset = train_dataset.map(format_example)
formatted_dataset.to_json("data.jsonl", orient="records", lines=True)

After running the code above, we have a data.jsonl file in the current directory.
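
As a quick sanity check, we can read the first line back and confirm it contains the three keys the alpaca format expects. This is a minimal sketch and not part of the original training pipeline:

import json


with open("data.jsonl") as f:
    first_example = json.loads(f.readline())

# We expect the keys: instruction, input, output
print(sorted(first_example.keys()))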

Preparing the configuration file

We need to create a configuration file telling Axolotl how to fine-tune the model. I stored the file’s content in the config.yml file in the current directory.

base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: data.jsonl
    ds_type: json
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./outputs/qlora-out

model_config:
  output_router_logits: true

adapter: qlora
lora_model_dir:

sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 5

hub_model_id: mikulskibartosz/mistral_axolotl
hub_strategy: checkpoint

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
weight_decay: 0.0
fsdp:
special_tokens:
  bos_token: '<s>'
  eos_token: '</s>'
  unk_token: '<unk>'

Explanation of the configuration file

To get the information about all parameters, you should check the Axolotl documentation.

Below is the explanation of the parameters I had to modify for my fine-tuning task.

  • base_model - the model we want to fine-tune
  • model_type - the model type we want to fine-tune. In our case, a causal language model. The type of the model defined here corresponds to the Hugging Face model class: https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM
  • tokenizer_type - the corresponding tokenizer type (also a Hugging Face class: https://huggingface.co/docs/transformers/v4.46.0/en/model_doc/llama2#transformers.LlamaTokenizer)
  • load_in_4bit - quantize the model down to 4 bits
  • datasets - the path to the training data (path), the file type (ds_type), and the format of the data inside the file (type). It can be a list of datasets. (A sketch of the prompt template the alpaca type produces is shown after this list.)
  • adapter - the type of the adapter we want to use. Here, we use QLoRA. If blank, we would fine-tune the whole model.
  • All parameters that start with lora_ are related to the LoRA technique.
  • loss_watchdog_threshold and loss_watchdog_patience stop the training run when the loss stays above the threshold for the given number of steps, which usually means the training has diverged. I had to increase the patience parameter.
  • hub_model_id - the ID of the Hugging Face repository where we will store the fine-tuned model.
  • hub_strategy - when set to checkpoint, each checkpoint will be pushed to the Hub (we will also get the final model at the end of the training).
  • special_tokens - those are the tokens used by the model. Check the model’s documentation for the values you need to set.
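
For context, the alpaca dataset type wraps every training example in the standard Alpaca prompt template before tokenization. The sketch below (the ALPACA_TEMPLATE name is only for illustration) approximates that template; check the Axolotl source for the exact wording used by your version:

# Approximate prompt layout produced by the alpaca format; the {output} part
# is what the model learns to generate.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""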

Fine-tuning the model

Fine-tuning consists of two steps: preprocessing the data and running the training.

CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess config.yml
accelerate launch -m axolotl.cli.train config.yml

When the task is finished, the final version of the adapter will be pushed to the Hub.

We can merge the adapter with the base model. If you want to do it, check the merging documentation of Axolotl. Remember to upload the merged model to the Hub!
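
At the time of writing, merging boils down to a single command similar to the one below; treat it as a sketch and check Axolotl's merging documentation for the current syntax (the lora_model_dir value must point to the adapter produced by the training run):

python -m axolotl.cli.merge_lora config.yml --lora_model_dir="./outputs/qlora-out"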

Using the fine-tuned model

We could have merged the adapter into the model, but we didn’t. Therefore, when we want to use the fine-tuned version, we must load the base model, the adapter, and the tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig


base_model_name = "mistralai/Mistral-7B-v0.1"
adapter_model_name = "mikulskibartosz/mistral_axolotl"

model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

model = model.to("cuda")
model.eval()

Now, we build a text-generation pipeline with the fine-tuned model. When we run the pipeline, the model properly returns the news category with no additional text.

from transformers import pipeline


text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="cuda"
)

prompt = "New iPad released..."
output = text_generator(prompt, max_length=10, num_return_sequences=1)


