Fine-tuning Mistral-7B LLM using QLoRA in Axolotl

Using the Axolotl library to fine-tune an LLM with Quantized Low-Rank Adaptation requires providing the training data in a specific format and filling out a massive configuration file. This article shows how to do it.

Table of Contents

  1. Login to Hugging Face
  2. The LLM we fine-tune, and the task
  3. Preparing the training data as a JSONL file
  4. Preparing the configuration file
    1. Explanation of the configuration file
  5. Fine-tuning the model
  6. Using the fine-tuned model

Login to Hugging Face

All the code below assumes that you have logged in to Hugging Face and have a valid access token with read permissions for the mistralai/Mistral-7B-v0.1 model and write access to your private repository, where we will store the fine-tuned model.
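
If you are not logged in yet, the huggingface_hub library can store the access token for you. Below is a minimal sketch; run it once before the rest of the code (you can also use the huggingface-cli login command instead):

from huggingface_hub import login

# Prompts for the access token and stores it locally.
# You can also pass it explicitly: login(token="hf_...")  (placeholder value)
login()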

The LLM we fine-tune, and the task

I chose the Mistral-7B model for text classification. The model classifies news into four categories: World, Sports, Business, and Sci/Tech (as in the fancyzhx/ag_news dataset). When we give this task to the untuned Mistral model, it fails miserably.

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    device=0
)

prompt = """Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech

Bangladesh paralysed by strikes..."""
output = pipe(prompt, max_length=50, num_return_sequences=1)

Without fine-tuning, the model tends to repeat the given text. Occasionally, it includes a category name, but it’s not consistent.

Preparing the training data as a JSONL file

Axolotl supports several dataset formats. One of them is a JSONL file, in which each line is a JSON object with a predefined structure. For example, Axolotl’s alpaca format looks like this:

{"instruction": "...", "input": "...", "output": "..."}

We will use the alpaca format for our task.

Because I don’t have a dataset I can use for training and show publicly in this article, let’s use the fancyzhx/ag_news dataset as our base and convert it to the required format.

from datasets import load_dataset, DatasetDict


dataset_dict = load_dataset("fancyzhx/ag_news")
train_dataset = dataset_dict["train"]

The dataset contains two columns: text and label. We need a mapping function that, for each row, will:

  • add the instruction key with the same value everywhere
  • change the text key to input
  • add the output key with the string value of the label (we must map the integer labels to strings)

def map_dataset(ds):
    instruction_prompt = "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech"
    new_column = [instruction_prompt] * len(ds)

    # Rename the text column and attach the constant instruction column
    ds = ds.rename_column("text", "input")
    ds = ds.add_column("instruction", new_column)

    def map_category(example):
        # Map the integer labels to the category names used in the prompt
        category_map = {
            0: "World",
            1: "Sports",
            2: "Business",
            3: "Sci_Tech"
        }
        example['output'] = category_map.get(example['label'], "Unknown")
        return example

    ds = ds.map(map_category)
    ds = ds.remove_columns("label")
    return ds

We apply the function to the training dataset and store the result as a JSONL file.

import json


train_dataset = map_dataset(train_dataset)

def format_example(example):
    return {
        "instruction": example['instruction'],
        "input": example['input'],
        "output": example['output']
    }

formatted_dataset = train_dataset.map(format_example)
formatted_dataset.to_json("data.jsonl", orient="records", lines=True)

After running the code above, we have a data.jsonl file in the current directory.
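
As a quick sanity check, we can read the first line back and confirm it contains the three keys the alpaca format expects. This is a minimal sketch and not part of the original training pipeline:

import json


with open("data.jsonl") as f:
    first_example = json.loads(f.readline())

# We expect the keys: instruction, input, output
print(sorted(first_example.keys()))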

Preparing the configuration file

We need to create a configuration file telling Axolotl how to fine-tune the model. I stored the file’s content in the config.yml file in the current directory.

base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: data.jsonl
    ds_type: json
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./outputs/qlora-out

model_config:
  output_router_logits: true

adapter: qlora
lora_model_dir:

sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 5

hub_model_id: mikulskibartosz/mistral_axolotl
hub_strategy: checkpoint

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
weight_decay: 0.0
fsdp:
special_tokens:
  bos_token: '<s>'
  eos_token: '</s>'
  unk_token: '<unk>'

Explanation of the configuration file

To get the information about all parameters, you should check the Axolotl documentation.

Below is the explanation of the parameters I had to modify for my fine-tuning task.

  • base_model - the model we want to fine-tune
  • model_type - the model type we want to fine-tune. In our case, a causal language model. The type of the model defined here corresponds to the Hugging Face model class: https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM
  • tokenizer_type - the corresponding tokenizer type (also a Hugging Face class: https://huggingface.co/docs/transformers/v4.46.0/en/model_doc/llama2#transformers.LlamaTokenizer)
  • load_in_4bit - quantize the model down to 4 bits
  • datasets - the path to the training data (path), the file type (ds_type), and the format of the data inside the file (type). It can be a list of datasets. (A sketch of the prompt template the alpaca type produces is shown after this list.)
  • adapter - the type of the adapter we want to use. Here, we use QLoRA. If blank, we would fine-tune the whole model.
  • All parameters that start with lora_ are related to the LoRA technique.
  • loss_watchdog_threshold and loss_watchdog_patience stop the training run when the loss stays above the threshold for the given number of steps, which usually means the training has diverged. I had to increase the patience parameter.
  • hub_model_id - the ID of the Hugging Face repository where we will store the fine-tuned model.
  • hub_strategy - when set to checkpoint, each checkpoint will be pushed to the Hub (we will also get the final model at the end of the training).
  • special_tokens - those are the tokens used by the model. Check the model’s documentation for the values you need to set.
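
For context, the alpaca dataset type wraps every training example in the standard Alpaca prompt template before tokenization. The sketch below (the ALPACA_TEMPLATE name is only for illustration) approximates that template; check the Axolotl source for the exact wording used by your version:

# Approximate prompt layout produced by the alpaca format; the {output} part
# is what the model learns to generate.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""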

Fine-tuning the model

Fine-tuning consists of two steps: preprocessing the data and running the training.

CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess config.yml
accelerate launch -m axolotl.cli.train config.yml

When the task is finished, the final version of the adapter will be pushed to the Hub.

We can merge the adapter with the base model. If you want to do it, check the merging documentation of Axolotl. Remember to upload the merged model to the Hub!
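
At the time of writing, merging boils down to a single command similar to the one below; treat it as a sketch and check Axolotl's merging documentation for the current syntax (the lora_model_dir value must point to the adapter produced by the training run):

python -m axolotl.cli.merge_lora config.yml --lora_model_dir="./outputs/qlora-out"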

Using the fine-tuned model

We could have merged the adapter into the model, but we didn’t. Therefore, when we want to use the fine-tuned version, we must load the base model, the adapter, and the tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig


base_model_name = "mistralai/Mistral-7B-v0.1"
adapter_model_name = "mikulskibartosz/mistral_axolotl"

model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

model = model.to("cuda")
model.eval()

Now, we build a text-generation pipeline with the fine-tuned model. When we run the pipeline, the model properly returns the news category with no additional text.

from transformers import pipeline


text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="cuda"
)

prompt = "New iPad released..."
output = text_generator(prompt, max_length=10, num_return_sequences=1)


