Using the Axolotl library to fine-tune an LLM with Quantized Low-Rank Adaptation (QLoRA) requires providing the training data in a specific format and filling out a massive configuration file. This article shows how to do it.
Table of Contents
- Login to Hugging Face
- The LLM model we fine-tune, and the task
- Preparing the training data as JSONL file
- Preparing the configuration file
- Fine-tuning the model
- Using the fine-tuned model
Login to Hugging Face
All the code below assumes that you have logged in to Hugging Face and have a valid access token with read permissions for the mistralai/Mistral-7B-v0.1 model and write access to your private repository, where we will store the fine-tuned model.
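If you are not logged in yet, one minimal way to do it (assuming the huggingface_hub package is installed and you have already created an access token with the required permissions) is:
from huggingface_hub import login
# Prompts for your Hugging Face access token;
# alternatively, run `huggingface-cli login` in a terminal
login()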
The LLM model we fine-tune, and the task
I chose to use the Mistral-7B model for text classification. The model classifies news into four categories: World, Sports, Business, and Sci/Tech (as in the fancyzhx/ag_news dataset). When we pass the task to the non-fine-tuned Mistral model, it fails miserably.
from transformers import pipeline

# Load the base (not fine-tuned) Mistral-7B model as a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",
    device=0
)

prompt = """Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech
Bangladesh paralysed by strikes..."""

output = pipe(prompt, max_length=50, num_return_sequences=1)
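The pipeline returns a list of dictionaries with a generated_text key, so we can print the model's response:
print(output[0]["generated_text"])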
Without fine-tuning, the model tends to repeat the given text. Occasionally, it includes a category name, but the behavior isn't consistent.
Preparing the training data as JSONL file
Axolotl supports several dataset formats. One of them is JSONL files, where each line is a JSON object. The JSON object has a predefined format, too. For example, Axolotl's alpaca format looks like this:
{"instruction": "...", "input": "...", "output": "..."}
We will use the alpaca format for our task.
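For our classification task, a single line of the JSONL file could look like this (an illustrative record written by hand for this article):
{"instruction": "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech", "input": "Bangladesh paralysed by strikes...", "output": "World"}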
Because I don't have a dataset that I can both use for training and show publicly in this article, let's use the fancyzhx/ag_news dataset as our base and convert it to the required format.
from datasets import load_dataset

# Load the AG News dataset and keep the training split
dataset_dict = load_dataset("fancyzhx/ag_news")
train_dataset = dataset_dict["train"]
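We can print the training split to confirm its structure before transforming it:
# Shows the column names ("text" and "label") and the number of rows
print(train_dataset)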
The dataset contains two columns: text and label. We need a mapping function for each row so we can:
- add the instruction key with the same value everywhere
- rename the text key to input
- add the output key with the string value of the label (we must map the integer labels to strings)
def map_dataset(ds):
    # The instruction is the same for every row
    instruction_prompt = "Classify the text as news about one of the four categories: World, Sports, Business, Sci_Tech"
    new_column = [instruction_prompt] * len(ds)
    ds = ds.rename_column("text", "input")
    ds = ds.add_column("instruction", new_column)

    def map_category(example):
        # Map the integer labels to category names
        category_map = {
            0: "World",
            1: "Sports",
            2: "Business",
            3: "Sci_Tech"
        }
        example['output'] = category_map.get(example['label'], "Unknown")
        return example

    ds = ds.map(map_category)
    ds = ds.remove_columns("label")
    return ds
We apply the function to the training dataset and store the result as a JSONL file.
train_dataset = map_dataset(train_dataset)

def format_example(example):
    # Keep only the three keys required by the alpaca format
    return {
        "instruction": example['instruction'],
        "input": example['input'],
        "output": example['output']
    }

formatted_dataset = train_dataset.map(format_example)
formatted_dataset.to_json("data.jsonl", orient="records", lines=True)
After running the code above, we have a data.jsonl file in the current directory.
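As an optional sanity check, we can print the first line of the file to confirm it matches the alpaca format:
# Read and print the first JSONL record
with open("data.jsonl") as f:
    print(f.readline())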
Preparing the configuration file
We need to create a configuration file telling Axolotl how to fine-tune the model. I stored the file's content in the config.yml file in the current directory.
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: data.jsonl
    ds_type: json
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./outputs/qlora-out
model_config:
  output_router_logits: true
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 5
hub_model_id: mikulskibartosz/mistral_axolotl
hub_strategy: checkpoint
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
weight_decay: 0.0
fsdp:
special_tokens:
  bos_token: '<s>'
  eos_token: '</s>'
  unk_token: '<unk>'
Explanation of the configuration file
For information about all the parameters, check the Axolotl documentation.
Below, I explain the parameters I had to modify for my fine-tuning task.
- base_model - the model we want to fine-tune
- model_type - the model type we want to fine-tune. In our case, a causal language model. The type defined here corresponds to the Hugging Face model class: https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM
- tokenizer_type - the corresponding tokenizer type (also a Hugging Face class: https://huggingface.co/docs/transformers/v4.46.0/en/model_doc/llama2#transformers.LlamaTokenizer)
- load_in_4bit - quantize the model down to 4 bits
- datasets - the path to the training data (path), the file type (ds_type), and the format of the data inside the file (type). It can be a list of datasets.
- adapter - the type of the adapter we want to use. Here, we use QLoRA. If left blank, we would fine-tune the whole model.
- All parameters that start with lora_ are related to the LoRA technique.
- loss_watchdog_threshold and loss_watchdog_patience stop the training if the loss stays above loss_watchdog_threshold for the number of steps given by loss_watchdog_patience (a sign that training has broken down). I had to increase the patience parameter.
- hub_model_id - the ID of the Hugging Face repository where we will store the fine-tuned model
- hub_strategy - when set to checkpoint, each checkpoint will be pushed to the Hub (we will also get the final model at the end of the training)
- special_tokens - the tokens used by the model. Check the model's documentation for the values you need to set.
Fine-tuning the model
Fine-tuning consists of two steps: preprocessing the data and running the training.
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess config.yml
accelerate launch -m axolotl.cli.train config.yml
When the training is finished, the final version of the adapter will be pushed to the Hub.
We can also merge the adapter with the base model. If you want to do that, check Axolotl's merging documentation. Remember to upload the merged model to the Hub!
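As a rough sketch (check the documentation for the exact flags supported by your Axolotl version), the merge can be run with Axolotl's merge_lora entry point, pointing it at the adapter directory from our config:
python3 -m axolotl.cli.merge_lora config.yml --lora_model_dir="./outputs/qlora-out"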
Using the fine-tuned model
We could have merged the adapter into the model, but we didn’t. Therefore, when we want to use the fine-tuned version, we must load the base model, the adapter, and the tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "mistralai/Mistral-7B-v0.1"
adapter_model_name = "mikulskibartosz/mistral_axolotl"

# Load the base model and apply the fine-tuned QLoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)

# The tokenizer was not changed during fine-tuning, so we load the base model's tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

model = model.to("cuda")
model.eval()
Now, we build a text-generation pipeline with the fine-tuned model. When we run the pipeline, the model properly returns the news category with no additional text.
from transformers import pipeline
text_generator = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
device="cuda"
)
prompt = "New iPad released..."
output = text_generator(prompt, max_length=10, num_return_sequences=1)
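By default, a text-generation pipeline returns the prompt together with the continuation. If we want to see only the generated category, we can pass the pipeline's return_full_text parameter:
# Return only the newly generated tokens, without repeating the prompt
output = text_generator(prompt, max_length=10, num_return_sequences=1, return_full_text=False)
print(output[0]["generated_text"])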
Do you need help building AI-powered applications for your business?
You can hire me!