Fine-Tuning TinyLlama on WhatsApp Chats: Build Your Own Personal AI Chatbot!
Introduction
Ever wondered what it would be like to have an AI that talks just like you and your friends? What if you could train an AI chatbot on your WhatsApp conversations and make it understand your slang, emotions, and inside jokes? Well, now you can!
In this guide, we'll fine-tune TinyLlama (the 1.1B Chat model) on WhatsApp chat data to create a personalized AI assistant that mirrors real-life conversations. We'll use QLoRA (Quantized Low-Rank Adaptation) to keep fine-tuning memory-efficient, even on consumer GPUs!
Why Fine-Tune TinyLlama?
- Lightweight yet powerful: only 1.1B parameters, so it runs efficiently.
- Built for conversational AI: optimized for chat-based interactions.
- Memory-efficient fine-tuning: QLoRA makes training practical on low-resource GPUs.
- Customizable: fine-tune it on your own chat data so the AI sounds like you.
First, we will evaluate the output of the TinyLlama 1.1B Chat model when loaded without quantization and without any fine-tuning.
Next, we will assess its performance when loaded with quantization and without any fine-tuning.
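Here is a minimal sketch of what those two baseline checks could look like (it assumes a CUDA GPU is available and uses the TinyLlama chat prompt format that also appears later in this guide):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Baseline 1: full-precision (fp16) model, no fine-tuning
fp16_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="cuda"
)

# Baseline 2: 4-bit quantized model, no fine-tuning
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="cuda"
)

prompt = "<|user|>\nHey Radhika, kesi ho?</s>\n<|assistant|>"  # example prompt
for label, m in [("fp16", fp16_model), ("4-bit", quantized_model)]:
    inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
    outputs = m.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    print(f"[{label}] {tokenizer.decode(outputs[0], skip_special_tokens=True)}")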
Step 1: Download and Prepare Your WhatsApp Chat Data
First, you need to extract chat data from WhatsApp and format it properly.
Download the Dataset from Kaggle
You can use an existing WhatsApp-style conversation dataset from Kaggle:
- Dataset: "Conversation Dataset" on Kaggle (siddikisahil47/conversation)
Downloading via Kaggle Hub
Install the kagglehub package:
pip install kagglehub
Download the dataset:
# Download latest version
import kagglehub
path = kagglehub.dataset_download("siddikisahil47/conversation")
Now your dataset is ready for preprocessing and fine-tuning!
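Note that kagglehub.dataset_download returns the path of a local folder, not a single file. Before preprocessing, it is worth checking which file inside that folder actually contains the chats (the exact file name depends on the dataset):

import os

# List the files inside the downloaded dataset folder
print(os.listdir(path))

# Then point the preprocessing script at the chat/text file inside it, e.g.:
# input_file = os.path.join(path, "<your_chat_file>")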
Export Chat from WhatsApp (If Using Personal Data)
- Open WhatsApp and go to any chat.
- Tap on More Options (the three-dot menu), then select Export Chat.
- Select Without Media and save the text file.
- Transfer the file to your system (see the note on timestamps below).
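One caveat if you use your own export: WhatsApp usually prefixes each line with a date and time (the exact format depends on your locale and app version), so lines will not start directly with the sender's name. Here is a rough sketch of how such a prefix could be stripped before preprocessing; the example line and regex are illustrative only:

import re

# Hypothetical example of an exported line (format varies by locale)
line = "12/01/24, 10:15 pm - Rohan: Hey Radhika! Kaisi ho?"

# Drop a leading "date, time - " prefix if present, keeping "Name: Message"
line = re.sub(r"^\d{1,2}/\d{1,2}/\d{2,4},\s*\d{1,2}:\d{2}\s*(?:am|pm|AM|PM)?\s*-\s*", "", line)
print(line)  # -> "Rohan: Hey Radhika! Kaisi ho?"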
Preprocessing WhatsApp Chat Data
The raw chat file will look something like this:
Rohan: Hey Radhika! Kaisi ho?
Radhika: Hey Rohan, main bilkul thik hun. Tu bata, kaisa hai?
Rohan: I'm good too, yaar. Tumne suna ki next week school wali trip hai?
We need to convert this into JSONL format (ideal for fine-tuning).
Python Script to Format Data
import json
import re

# Input & output file paths
# NOTE: kagglehub returns a folder; point input_file at the chat text file
# inside that folder (or at your own exported WhatsApp .txt file).
input_file = path
output_file = "conversations.jsonl"  # Processed JSONL file

# Read the raw chat file
with open(input_file, "r", encoding="utf-8") as f:
    lines = f.readlines()

data = []
current_convo = []

# Regex to match "Name: Message"
message_pattern = re.compile(r"^(.*?):\s(.*)")

for line in lines:
    line = line.strip()
    if not line:
        continue
    match = message_pattern.match(line)
    if match:
        sender, message = match.groups()
        if current_convo:
            # Pair the previous message (instruction) with the current one (response)
            data.append({
                "instruction": current_convo[0],
                "response": message
            })
        current_convo = [message]  # The current message becomes the next instruction

# Save as JSONL
with open(output_file, "w", encoding="utf-8") as f:
    for item in data:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")

print(f"Processed {len(data)} conversations and saved to {output_file}.")
Step 2: Fine-Tune TinyLlama Using QLoRA
Now that we have our dataset ready, let's fine-tune TinyLlama/TinyLlama-1.1B-Chat-v1.0 using QLoRA.
Install Dependencies
pip install transformers datasets accelerate bitsandbytes peft trl
Fine-Tuning Script
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# 4-bit quantization keeps the 1.1B model small enough for consumer GPUs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",  # Reduce memory usage
    bnb_4bit_use_double_quant=True
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map={"": torch.cuda.current_device()},
    quantization_config=quantization_config,
)
dataset = load_dataset("json", data_files={"train": "conversations.jsonl"})
dataset['train'][0]
{'instruction': 'Hey Radhika! Kaisi ho?',
'response': 'Hey Rohan, main bilkul thik hun. Tu bata, kaisa hai?'}
Check Max Length
# Find the longest instruction/response in characters
# (a rough proxy for choosing the tokenizer's max_length below)
length = 0
for i in range(len(dataset['train'])):
    instruction_length = len(dataset['train'][i]['instruction'])
    response_length = len(dataset['train'][i]['response'])
    if instruction_length > length:
        length = instruction_length
    if response_length > length:
        length = response_length
print(f"max length : {length}")
max length : 454
Tokenize the Dataset
def tokenize_function(examples):
    # Combine each instruction and its response into a single training text
    return tokenizer(
        [instr + "\n\n" + resp for instr, resp in zip(examples["instruction"], examples["response"])],
        padding="max_length",
        truncation=True,
        max_length=480  # Adjust based on your needs
    )

# Apply tokenization
tokenized_datasets = dataset.map(tokenize_function, batched=True)
Map: 100%|██████████| 33287/33287 [00:10<00:00, 3305.48 examples/s]
lora_config = LoraConfig(
    r=8,  # Low-rank adaptation dimension
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
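It is worth confirming how small the trainable footprint actually is; PEFT provides a helper that prints the number of trainable parameters versus the total:

# Show how many parameters LoRA actually trains vs. the frozen base model
model.print_trainable_parameters()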
# === Step 4: Define Training Arguments ===
training_args = TrainingArguments(
    output_dir="./tinyllama-finetuned-whatsapp",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # Effective batch size of 8
    num_train_epochs=2,
    learning_rate=2e-4,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    logging_steps=10,
    fp16=True,  # Use mixed precision for efficiency
    report_to="none",
)
split_data = tokenized_datasets["train"].train_test_split(test_size=0.1)
# === Step 5: Initialize Trainer and Train ===
trainer = SFTTrainer(
    model=model,
    train_dataset=split_data["train"],
    eval_dataset=split_data["test"],  # Provide a validation dataset
    args=training_args,
)
trainer.train()
# === Step 6: Save Fine-Tuned Model ===
model.save_pretrained("./tinyllama-finetuned-whatsapp")
tokenizer.save_pretrained("./tinyllama-finetuned-whatsapp")
print("Fine-tuning complete! Model saved successfully.")
Step 3: Merge Adapter Weights into the Base Model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel
# Define model paths
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "tinyllama-finetuned-whatsapp\\checkpoint-11232"
merged_model_path = "tinyllama-finetuned-whatsapp-merged-model"
# Load base tokenizer
base_tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
# Check if fine-tuned model has a modified tokenizer
try:
    adapter_tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)
    print("Loaded tokenizer from adapter.")
except Exception:
    adapter_tokenizer = base_tokenizer
    print("No separate tokenizer found in adapter path, using base tokenizer.")

# Merge tokenizers if new tokens were added
num_added_tokens = base_tokenizer.add_special_tokens(
    {"additional_special_tokens": adapter_tokenizer.additional_special_tokens}
)
if num_added_tokens > 0:
    print(f"Added {num_added_tokens} new tokens from adapter tokenizer.")
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16, device_map="cuda")
# Load adapter
model = PeftModel.from_pretrained(base_model, adapter_path)
# Merge adapter into base model
model = model.merge_and_unload() # Merges LoRA adapter weights into the base model
# Save the merged model and tokenizer
model.save_pretrained(merged_model_path)
base_tokenizer.save_pretrained(merged_model_path)
print(f"Merged model and tokenizer saved to {merged_model_path}")
Step 4: Test Your Fine-Tuned Model
from transformers import GenerationConfig
from time import perf_counter
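The generation code below reuses model and tokenizer. If you are testing in a fresh session, you can reload both from the merged model saved in Step 3; a minimal sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_model_path = "tinyllama-finetuned-whatsapp-merged-model"
tokenizer = AutoTokenizer.from_pretrained(merged_model_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_model_path, torch_dtype=torch.float16, device_map="cuda"
)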
def formatted_prompt(question) -> str:
    # TinyLlama chat prompt format
    return f"<|user|>\n{question}</s>\n<|assistant|>"

def generate_response(user_input):
    prompt = formatted_prompt(user_input)
    generation_config = GenerationConfig(
        penalty_alpha=0.6,
        do_sample=True,
        top_k=5,
        temperature=0.5,
        repetition_penalty=1.2,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id
    )
    start_time = perf_counter()
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, generation_config=generation_config)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    output_time = perf_counter() - start_time
    print(f"Time taken for inference: {round(output_time, 2)} seconds")
generate_response(user_input='Hey Radhika, kesi ho?')
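As an aside, instead of hard-coding the prompt template in formatted_prompt, you can let the tokenizer build it from its own chat template (the TinyLlama chat checkpoint ships one); a small, roughly equivalent sketch:

# Build the prompt from the tokenizer's built-in chat template
messages = [{"role": "user", "content": "Hey Radhika, kesi ho?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)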
Conclusion
Congratulations! You have successfully fine-tuned TinyLlama on WhatsApp chats and built a personalized chatbot that talks just like you! The results can be improved further by training for more epochs; this guide used only 2.
This method can be used for:
- Personal AI Assistants
- Customer Support Bots
- AI-driven Social Media Interactions