It's 2023 and the ChatGPT craze has begun. We have all seen it blow up over the last couple of months. I figured it's time to tinker with it and see what we can do on a Windows 11 machine with an NVIDIA 4090 card.
I originally had an AMD RX 6800 XT card, but as everyone knows, you really want an NVIDIA card since most of the libraries are built for CUDA. I was able to get my hands on a 4090 and decided to write this tutorial to help others get started. The other sticky wicket is that most of the tutorials out there are for Linux, and I am running on Windows.
Don't get me wrong, I love Linux, but for work I primarily use Windows. I went through a lot of older tutorials and found that most of them are outdated and don't go into much detail about running on Windows.
We will be leveraging Docker to run the GPT-2 model, which I have found to be the easiest way to get started, and the Hugging Face Transformers library to actually load and run it.
You will need Docker Desktop with GPU support enabled (the --gpus flag). At the time of writing this, I'm using Docker Desktop 4.19.0 (106363).
Create a new directory called gpt2 and create a new file called app.py with the following contents:
from transformers import pipeline, set_seed

# device=0 runs the pipeline on the first CUDA GPU
generator = pipeline('text-generation', model='gpt2', device=0)
set_seed(42)  # fix the random seed so the sampled outputs are reproducible

# Generate 5 continuations, each capped at 30 tokens including the prompt
results = generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
print("Output:", results)
Create a new file called Dockerfile
and add the following contents:
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
# torch ships with the base image; we only need the Hugging Face libraries
RUN pip install datasets transformers==4.28.0
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
Open up a terminal and navigate to the gpt2
directory. Then run the following command to build the docker image:
docker build -t gpt2-hugging-face .
Now that we have the docker image built, we can run it with the following command:
docker run --gpus all -it gpt2-hugging-face
Output:
PS> docker run --gpus all -it gpt2-hugging-face
Downloading (…)lve/main/config.json: 100%|███████████████████| 665/665 [00:00<00:00, 7.42MB/s]
Downloading pytorch_model.bin: 100%|███████████████████| 548M/548M [00:06<00:00, 84.8MB/s]
Downloading (…)neration_config.json: 100%|███████████████████| 124/124 [00:00<00:00, 1.17MB/s]
Downloading (…)olve/main/vocab.json: 100%|███████████████████| 1.04M/1.04M [00:00<00:00, 39.8MB/s]
Downloading (…)olve/main/merges.txt: 100%|███████████████████| 456k/456k [00:00<00:00, 54.7MB/s]
Downloading (…)/main/tokenizer.json: 100%|███████████████████| 1.36M/1.36M [00:00<00:00, 60.7MB/s]
/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Output: [{'generated_text': 'Hello, I\'m a language model, so you can\'t define something in any other language. Let me introduce another topic:\n\nThe name "'}, {'generated_text': "Hello, I'm a language model, you know.\n\nThat's right… I have a lot of friends who don't know what I do"}, {'generated_text': "Hello, I'm a language model, not a formal one. I'm more interested in languages than formal models and I'm going to use the formal"}, {'generated_text': "Hello, I'm a language model, which means that if you're a language designer, you need some understanding of the language model so you can build"}, {'generated_text': "Hello, I'm a language model, and now it's time to figure out where I want to focus my efforts.\n\nLet's imagine that"}]
Here is the same output formatted as JSON for readability:
[
{ "generated_text": "Hello, I'm a language model, so you can't define something in any other language. Let me introduce another topic:\n\nThe name \"" },
{ "generated_text": "Hello, I'm a language model, you know.\n\nThat's right… I have a lot of friends who don't know what I do" },
{ "generated_text": "Hello, I'm a language model, not a formal one. I'm more interested in languages than formal models and I'm going to use the formal" },
{ "generated_text": "Hello, I'm a language model, which means that if you're a language designer, you need some understanding of the language model so you can build" },
{ "generated_text": "Hello, I'm a language model, and now it's time to figure out where I want to focus my efforts.\n\nLet's imagine that" }
]
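If you'd rather have app.py print JSON like this directly instead of the Python repr, one option is to swap the final print for json.dumps (the list of generated sequences is plain dicts and strings, so it serializes cleanly):
import json

# Serialize the generated sequences as pretty-printed JSON
print(json.dumps(results, indent=2))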
This is all well and good, but each time we run the docker image it has to download the model. We can bake the model into the image by adding a download step to the Dockerfile:
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
# torch ships with the base image; we only need the Hugging Face libraries
RUN pip install datasets transformers==4.28.0
WORKDIR /app
COPY app.py .
# Download the model at build time so it is stored in the image
RUN python app.py --download
CMD ["python", "app.py"]
Then we need to update the app.py file to support that --download flag:
import argparse
from transformers import GPT2LMHeadModel, GPT2Tokenizer

parser = argparse.ArgumentParser()
parser.add_argument('--download', action='store_true', help='Flag to download the model')
args = parser.parse_args()

if args.download:
    # Build-time path: fetch GPT-2 from the Hub and save it next to app.py
    model_name = 'gpt2'
    model = GPT2LMHeadModel.from_pretrained(model_name)
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model.save_pretrained(model_name)
    tokenizer.save_pretrained(model_name)
    print(f"Model '{model_name}' downloaded successfully.")
    exit(0)

# Run-time path: the model files are already in the image, so no download is needed
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2', device=0)
set_seed(42)
results = generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
print("Output:", results)
Now the model is downloaded at build time and stored in the docker image, so each run starts generating right away instead of re-downloading it. 🤗 In the next post, we will look at fine-tuning the model.
The full source code is available on GitHub.