It's 2023 and the ChatGPT craze has begun. We have all seen it blow up over the last couple of months. I figured it's time to tinker with it and see what we can do on a Windows 11 machine with an NVIDIA 4090 card.
I originally had an AMD RX 6800 XT card, but as everyone knows, you really want an NVIDIA card since most of the libraries are built for CUDA. I was able to get my hands on a 4090 and decided to write this tutorial to help others get started. The other sticky wicket is that most of the tutorials out there are for Linux, and I am running on Windows.
Don't get me wrong, I love Linux, but for work I primarily use Windows. I went through a lot of older tutorials and found that most of them are outdated and don't go into much detail about running on Windows.
We will be leveraging Docker to run the GPT-2 model, which I have found to be the easiest way to get started, and the Hugging Face Transformers library to actually load and run it.
You will need Docker Desktop with GPU support enabled (the --gpus flag). At the time of writing this, I'm using Docker Desktop 4.19.0 (106363).
Create a new directory called gpt2 and create a new file called app.py with the following contents:
from transformers import pipeline, set_seed

# device=0 runs the pipeline on the first CUDA GPU
generator = pipeline('text-generation', model='gpt2', device=0)
set_seed(42)  # fix the random seed so the sampled outputs are reproducible

# Generate 5 continuations, each capped at 30 tokens including the prompt
results = generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
print("Output:", results)
Create a new file called Dockerfile
and add the following contents:
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
# torch ships with the base image; we only need the Hugging Face libraries
RUN pip install datasets transformers==4.28.0
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
Open up a terminal and navigate to the gpt2
directory. Then run the following command to build the docker image:
docker build -t gpt2-hugging-face .
Now that we have the docker image built, we can run it with the following command:
docker run --gpus all -it gpt2-hugging-face
Output:
PS> docker run --gpus all -it gpt2-hugging-face
Downloading (…)lve/main/config.json: 100%|███████████████████| 665/665 [00:00<00:00, 7.42MB/s]
Downloading pytorch_model.bin: 100%|███████████████████| 548M/548M [00:06<00:00, 84.8MB/s]
Downloading (…)neration_config.json: 100%|███████████████████| 124/124 [00:00<00:00, 1.17MB/s]
Downloading (…)olve/main/vocab.json: 100%|███████████████████| 1.04M/1.04M [00:00<00:00, 39.8MB/s]
Downloading (…)olve/main/merges.txt: 100%|███████████████████| 456k/456k [00:00<00:00, 54.7MB/s]
Downloading (…)/main/tokenizer.json: 100%|███████████████████| 1.36M/1.36M [00:00<00:00, 60.7MB/s]
/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Output: [{'generated_text': 'Hello, I\'m a language model, so you can\'t define something in any other language. Let me introduce another topic:\n\nThe name "'}, {'generated_text': "Hello, I'm a language model, you know.\n\nThat's right… I have a lot of friends who don't know what I do"}, {'generated_text': "Hello, I'm a language model, not a formal one. I'm more interested in languages than formal models and I'm going to use the formal"}, {'generated_text': "Hello, I'm a language model, which means that if you're a language designer, you need some understanding of the language model so you can build"}, {'generated_text': "Hello, I'm a language model, and now it's time to figure out where I want to focus my efforts.\n\nLet's imagine that"}]
Here is the same output formatted as JSON for readability:
[
{ "generated_text": "Hello, I'm a language model, so you can't define something in any other language. Let me introduce another topic:\n\nThe name \"" },
{ "generated_text": "Hello, I'm a language model, you know.\n\nThat's right… I have a lot of friends who don't know what I do" },
{ "generated_text": "Hello, I'm a language model, not a formal one. I'm more interested in languages than formal models and I'm going to use the formal" },
{ "generated_text": "Hello, I'm a language model, which means that if you're a language designer, you need some understanding of the language model so you can build" },
{ "generated_text": "Hello, I'm a language model, and now it's time to figure out where I want to focus my efforts.\n\nLet's imagine that" }
]
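If you'd rather have app.py print JSON like this directly instead of the Python repr, one option is to swap the final print for json.dumps (the list of generated sequences is plain dicts and strings, so it serializes cleanly):
import json

# Serialize the generated sequences as pretty-printed JSON
print(json.dumps(results, indent=2))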
This is all well and good, but each time we run the docker image it has to download the model. We can bake the model into the image by adding a download step to the Dockerfile:
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
# torch ships with the base image; we only need the Hugging Face libraries
RUN pip install datasets transformers==4.28.0
WORKDIR /app
COPY app.py .
# Download the model at build time so it is stored in the image
RUN python app.py --download
CMD ["python", "app.py"]
Then we need to update the app.py file to support that --download flag:
import argparse
from transformers import GPT2LMHeadModel, GPT2Tokenizer

parser = argparse.ArgumentParser()
parser.add_argument('--download', action='store_true', help='Flag to download the model')
args = parser.parse_args()

if args.download:
    # Build-time path: fetch GPT-2 from the Hub and save it next to app.py
    model_name = 'gpt2'
    model = GPT2LMHeadModel.from_pretrained(model_name)
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model.save_pretrained(model_name)
    tokenizer.save_pretrained(model_name)
    print(f"Model '{model_name}' downloaded successfully.")
    exit(0)

# Run-time path: the model files are already in the image, so no download is needed
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2', device=0)
set_seed(42)
results = generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
print("Output:", results)
Now the model is downloaded at build time and stored in the docker image, so each run starts generating right away instead of re-downloading it. 🤗 In the next post, we will look at fine-tuning the model.
The full source code is available on GitHub.