Fine-tuning Llama 3 for Text Classification with Limited Resources



This content originally appeared on DEV Community and was authored by Jun Yamog

I recently needed to classify sentences for a particular use case at work. Remembering Jeremy Howard’s Lesson 4: Getting started with NLP for absolute beginners, I first adapted his notebook to fine-tune DeBERTa.

It worked, but not to my satisfaction, so I was curious what would happen if I used an LLM like Llama 3. The problem? Limited GPU resources. I only had access to an NVIDIA Tesla T4 instance.

Research led me to QLoRA. The tutorial Fine tuning LLama 3 LLM for Text Classification of Stock Sentiment using QLoRA was particularly useful. To better understand it, I adapted Lesson 4 into the QLoRA tutorial notebook.

QLoRA uses two main techniques, sketched in code after the list:

  1. Quantization: Loads the model’s weights at reduced precision (4-bit in QLoRA), shrinking its memory footprint.
  2. LoRA (Low-Rank Adaptation): Freezes the base model and trains only small low-rank adapter layers, instead of fine-tuning all of the weights.
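
Here is a minimal sketch of how the two pieces fit together, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the model name and LoRA hyperparameters are illustrative, not the exact values from my notebook:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B"

# 1. Quantization: load the base weights in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # the T4 has no bfloat16 support
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,  # binary sentence classification
    quantization_config=bnb_config,
)
model.config.pad_token_id = tokenizer.pad_token_id
model = prepare_model_for_kbit_training(model)

# 2. LoRA: freeze the quantized model and attach small trainable
#    low-rank adapters to the attention projections.
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction is trainable
```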

This allowed me to fine-tune Llama 3 8B on a T4 with 16GB of VRAM, using only about 12GB of it.
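
Training itself is then a standard Trainer loop. A sketch, assuming the model from the setup above and tokenized train/eval datasets named train_ds and eval_ds (hypothetical names); the batch size, accumulation steps, and other hyperparameters are guesses sized for a 16GB T4:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-seq-cls",
    per_device_train_batch_size=2,   # small batches to stay within 16GB
    gradient_accumulation_steps=8,   # effective batch size of 16
    num_train_epochs=2,
    learning_rate=2e-4,
    fp16=True,                       # half-precision math on the T4
    logging_steps=20,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,  # hypothetical tokenized datasets
    eval_dataset=eval_ds,
)
trainer.train()
```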

The results were surprisingly good, with an overall accuracy of 92% (balanced accuracy 0.82, reflecting the imbalanced test set):

```
Confusion Matrix:
[[83  4]
 [ 4  9]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.95      0.95      0.95        87
         1.0       0.69      0.69      0.69        13

    accuracy                           0.92       100
   macro avg       0.82      0.82      0.82       100
weighted avg       0.92      0.92      0.92       100

Balanced Accuracy Score: 0.8231653404067196
Accuracy Score: 0.92
```
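
The report above has the shape of scikit-learn’s standard metrics output. A sketch of how such a report can be produced, assuming arrays y_true and y_pred (hypothetical names) holding the test labels and the model’s predicted labels:

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    confusion_matrix,
)

# y_true / y_pred (hypothetical names): the held-out test labels
# and the model's predicted labels.
print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))
print("Classification Report:")
print(classification_report(y_true, y_pred))
print("Balanced Accuracy Score:", balanced_accuracy_score(y_true, y_pred))
print("Accuracy Score:", accuracy_score(y_true, y_pred))
```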

Here’s the IPython notebook detailing the process.

This approach shows it’s possible to work with large language models on limited hardware. Working with constraints often leads to creative problem-solving and learning opportunities. In this case, the limitations pushed me to explore and implement more efficient fine-tuning techniques.

