
Mistral Instruct 7B Finetuning on MedMCQA Dataset

Finetuning Mistral Instruct 7B on Google Colab using QLoRA

MistralAI’s Mistral Instruct 7B is one of the most popular open-source Large Language Models (LLMs). It has achieved SOTA performance on many benchmarks compared to other 7B models. In this post, I’ll walk through the steps required to build an LLM that can solve medical entrance exam questions. We’ll finetune Mistral Instruct 7B on the MedMCQA dataset and compare the finetuned model against the original baseline.

Image generated using DALL-E

MedMCQA is a large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. It has more than 194k high-quality and diverse AIIMS & NEET PG entrance exam MCQs covering around 2.4k healthcare topics and 21 medical subjects. More information about the dataset is available in its GitHub repository: github.com/medmcqa/medmcqa.
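
As a quick illustration (my addition, not code from the original post), the dataset can be pulled directly from the HuggingFace Hub; each record carries the question, the four options, and the index of the correct option.

# Minimal sketch: load MedMCQA from the HuggingFace Hub and inspect one record.
from datasets import load_dataset

dataset = load_dataset("medmcqa")  # splits: train / validation / test
sample = dataset["train"][0]

# Each record has the question text, four options (opa-opd), the index of the
# correct option (cop, 0-3), an explanation (exp) and subject/topic metadata.
print(sample["question"])
print([sample[key] for key in ("opa", "opb", "opc", "opd")])
print("Correct option index:", sample["cop"])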

Due to GPU and memory constraints on Google Colab, we’ll use a GPTQ (post-training quantized) version of Mistral Instruct 7B; TheBloke has published GPTQ-quantized Mistral Instruct 7B checkpoints on the HuggingFace Hub. We will then use the parameter-efficient LoRA technique to finetune the model, which keeps memory consumption in check.
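
The setup roughly looks like the sketch below: load the GPTQ checkpoint with transformers (auto-gptq and optimum handle the quantized weights) and wrap it with a LoRA adapter via peft. The model id, target modules, and LoRA hyperparameters here are illustrative assumptions, not the exact values used in the article.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

model = prepare_model_for_kbit_training(model)  # make the quantized model ready for adapter training
lora_config = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative: attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapter is trainable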

If you are unfamiliar with Mistral 7B or with terms like GPTQ or LoRA, I suggest you go through the following article —

You can refer to this article for an in-depth understanding of the concepts behind LLMs. It contains a curated list of important papers and quality articles on LLMs published online.

Let’s first install the following libraries.

!pip install -q accelerate peft bitsandbytes
!pip install -q git+https://github.com/huggingface/transformers
!pip install -q trl py7zr auto-gptq optimum
from google.colab import drive  # assumed completion of the truncated import; typically used to mount Google Drive
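
With the libraries installed, the next step is to turn each MedMCQA record into an instruction prompt that Mistral Instruct understands. The [INST] ... [/INST] template below is a hedged sketch of one reasonable formatting choice, with the correct option text as the target completion; it is not necessarily the exact template used in the rest of the article.

# Hypothetical prompt formatter: map a MedMCQA record to Mistral's chat format.
def format_example(example):
    options = [example["opa"], example["opb"], example["opc"], example["opd"]]
    labels = ["A", "B", "C", "D"]
    question_block = example["question"] + "\n" + "\n".join(
        f"{label}. {option}" for label, option in zip(labels, options)
    )
    answer = f"{labels[example['cop']]}. {options[example['cop']]}"
    return {"text": f"<s>[INST] Answer the following multiple-choice medical question.\n{question_block} [/INST] {answer}</s>"}

# Usage on the training split loaded earlier:
# train_data = dataset["train"].map(format_example)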

