Are you as excited about the ChatGPT API as I am? I hope so. In this article, I will show you how to use the ChatGPT API, do proper prompt engineering, and make the conversation interactive. At the end of the article, I will also show you how to limit the cost of API calls and how to use the API parameters to get better results.
Basic ChatGPT API Usage
Before we start, install the openai dependency and import it. You will also need to specify the API key:
import openai
openai.api_key = "..."
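If you don’t have the library yet, install it from PyPI. Note that this article uses the pre-1.0 interface of the openai package (openai.ChatCompletion), so if you follow along in a fresh environment, you may need to pin an older version:

pip install "openai<1.0"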
OpenAI has created a new API method that works slightly differently than the other methods available earlier. In the case of ChatGPT, we don’t send a text prompt to the model. Instead, we send a list of messages. Each message has a role and content. The role can be either user, assistant, or system. The content is the message itself. The API uses the entire chat history to generate the next message every time. The API returns only the next message, so we must keep the history of messages ourselves if we want to implement a longer interaction.
Example:
query = [
    {"role": "system", "content": "You are a MySQL database. Return responses in the same format as MySQL."},
    {"role": "user", "content": "insert into users(name, email) values ('John', 'john@galt.example');"},
    {"role": "user", "content": "select count(*) from users"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query
)
In the response, we get a completion object whose text representation looks like this:
<OpenAIObject chat.completion id=chatcmpl-ABC at 0x10995cbd0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "+----------+\n| count(*) |\n+----------+\n| 1 |\n+----------+",
        "role": "assistant"
      }
    }
  ],
  "created": 1677684068,
  "id": "chatcmpl-ABC",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 54,
    "total_tokens": 74
  }
}
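The completion contains only the assistant’s reply, not the history. To continue the conversation, we have to append the returned message (and the next user message) to our list before the next call. A minimal sketch (the follow-up query is my own illustration):

result = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query
)
reply = result["choices"][0]["message"]
# keep the history ourselves: store the model's answer...
query.append({"role": reply["role"], "content": reply["content"]})
# ...and the next user message before sending the next request
query.append({"role": "user", "content": "select * from users"})
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query
)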
How to write messages for ChatGPT API
The input for ChatCompletion differs from all of the other OpenAI methods. First of all, we specify a list of messages. Each message is a chat interaction: your message or the model’s response. We can distinguish between them using the role argument. You send messages denoted with the role user. Messages denoted with assistant are responses from the model.

Of course, we don’t have to provide actual responses. It’s ok to write a message with the role assistant and pass it as an example response. In fact, you can use such messages for in-context learning, as shown below.
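For example, a hypothetical few-shot setup (my own illustration) could pass a hand-written assistant message as a pattern for the model to imitate:

query = [
    {"role": "system", "content": "You translate English to French."},
    {"role": "user", "content": "Good morning"},
    # a response we wrote ourselves, used as an in-context example
    {"role": "assistant", "content": "Bonjour"},
    {"role": "user", "content": "Good night"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query
)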
Additionally, we can use the system role to specify the context. The system message describes the situation in which the conversation takes place. It can describe the task, the data, or any other relevant information.

We can send system messages at any time. It’s useful to send more than one when you want to change the context in the middle of the conversation. For example, we start by asking ChatGPT to act as a MySQL database, but later we switch to a role in which the chatbot explains given SQL commands in German:
query = [
    {"role": "system", "content": "You are a MySQL database. Return responses in the same format as MySQL."},
    {"role": "user", "content": "insert into users(name, email) values ('John', 'john@galt.com');"},
    {"role": "system", "content": "You are an AI assistant. Explain what the given query does. Return the response in German."},
    {"role": "user", "content": "select count(*) from users"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query
)
We can see that the model adheres to the most recent system message:
"message": {
"content": "Diese Abfrage gibt die Anzahl der Zeilen in der Tabelle \"users\" zur\u00fcck.",
"role": "assistant"
}
How to make ChatGPT API interactive
To make an interactive conversation like in the ChatGPT web interface, we need to store the history of messages written by the user and generated by ChatGPT. Additionally, we need functions that pass the model’s response to the user and get the user’s message. In the example below, I use the input() function to get the user’s message and the print() function to show the model’s response.
def talk_with(persona, tell_user, ask_user):
    message_history = []
    while True:
        user_input = ask_user()
        if user_input == "":
            return message_history
        message_history.append({"role": "user", "content": user_input})
        query = [{"role": "system", "content": persona}]
        query.extend(message_history)
        result = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=query
        )
        gpt_message = result["choices"][0]["message"]
        message_history.append({"role": gpt_message["role"], "content": gpt_message["content"]})
        tell_user("GPT: " + gpt_message["content"])
To use the function, we can call it like this:
talk_with(
    persona="""You are a helpful cooking expert. You answer questions by providing a short explanation and a list of easy to follow steps. You list ingredients, tools, and instructions.""",
    tell_user=print,
    ask_user=input
)
How to limit the cost of API calls
Usually, when you use the OpenAI API, the number of tokens in the response is limited to 16, and you have to modify the max_tokens parameter to get longer responses. That is not the case with the ChatGPT API. This API has no default limit on the number of tokens. Right now, the tokens are limited only by the model itself. The model can handle 4096 tokens, so, by default, the maximal output length is 4096 minus the number of tokens in the prompt.
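The usage field of the response (shown in the first example) reports how many tokens each call consumed, so you can track spending per request. A minimal sketch:

result = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query
)
# prompt_tokens, completion_tokens, and total_tokens are reported per call
print(result["usage"]["total_tokens"])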
To control the cost of API calls, you can explicitly set the max_tokens parameter. Remember that the model’s response doesn’t depend on the parameter. It will not try to be more succinct when you set max_tokens to a low value. Instead, the OpenAI backend will cut the response in the middle of a sentence when the model runs out of tokens.
query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to find and hire great programmers?"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    max_tokens=30
)
The response gets truncated after 30 tokens:
"message": {
"content": "Oh, it's simple. Just post a job listing with a generic title, low pay, and no specific technical requirements. Then sit back and",
"role": "assistant"
}
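You can also detect truncation in code: when the response hits the max_tokens limit, the finish_reason field is "length" instead of "stop". A minimal sketch:

result = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    max_tokens=30
)
choice = result["choices"][0]
# "length" means the model ran out of tokens before finishing the sentence
if choice["finish_reason"] == "length":
    print("Truncated response:", choice["message"]["content"])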
ChatGPT parameters explained
In this article, I will focus on the most useful ChatGPT parameters. If you are interested in the full list, subscribe to the newsletter and get notified when I write an article about it.
Change the number of ChatGPT responses
We can generate more than one response for a given prompt. It’s useful when you want to explore alternatives or when you want to generate messages for an A/B test in a single API call.

When we set the n parameter to the number of responses we want, we will get a corresponding number of elements in the choices list in the response:
query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to make a nation thrive?"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    n=2
)
The response contains two elements:
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Oh sure, let's all hold hands and sing kumbaya. Because that's definitely how a nation thrives. Just kidding. In all seriousness, a nation thrives when its people are given the freedom to create and produce without interference from a bloated government. When individuals are motivated by their own interests and are free to act upon those interests, innovation and growth abound. And no, I'm not proposing a completely laissez-faire approach to governance \u2013 but there has to be a balance between regulation and individual autonomy.",
"role": "assistant"
}
},
{
"finish_reason": "stop",
"index": 1,
"message": {
"content": "Well, step one would be to get the government to stop interfering with everything. Maybe let people actually pursue their own happiness without fear of the government constantly sticking its nose in. Just a thought.",
"role": "assistant"
}
}
]
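When n is greater than one, remember to read all the elements of choices, not just the first one. A minimal sketch:

result = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    n=2
)
# every generated alternative has its own index and message
for choice in result["choices"]:
    print(choice["index"], choice["message"]["content"])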
Making the responses more or less predictable
You may get a slightly different response whenever you send the same message to ChatGPT. However, you can make the answers deterministic by setting the temperature parameter to 0.0 or (if you are more adventurous but still want predictability) to a low value between 0.0 and 0.5.
query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to build an AI chatbot?"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    temperature=0.0
)
If I send the same request again, it will return the same response:
"message": {
"content": "Oh, it's easy. Just sprinkle some fairy dust on your computer and voila! You have a fully functioning AI chatbot. Or, you know, you could actually do some research, learn programming languages like Python, and spend countless hours coding and testing until you have a functional chatbot. Your choice.",
"role": "assistant"
}
Alternatively, we can set the temperature parameter to a value between 1.0 and 2.0 to make the responses more random. The model will still follow the instructions and try to answer your question, but it will be more creative, and multiple people asking about the same topic won’t get the same answer:
query = [
    {"role": "system", "content": "You are John Galt from the book Atlas Shrugged. You answer questions honestly, but do it in a sarcastic way like Chandler from Friends."},
    {"role": "user", "content": "How to make a nation thrive?"}
]
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    temperature=1.2
)
This is the same request as in the example where I showed you how to get multiple responses at once, but with a higher temperature. We get a completely different response, though, of course, in the same style:
"message": {
"content": "Oh, just a casual question about how to create a utopia. No biggie. Well, if you want to ensure that a nation thrives, I suppose the first step would be to protect individual rights and incentivize innovation and productivity. But, you know, that's just me.",
"role": "assistant"
}
How to request or ban certain words
When we want the answer to always include a certain word (or we want to ban certain words from the answer), we can do it in the prompt by describing what we want or don’t want to see. However, there is a better way that doesn’t require us to spend money on extra tokens.
We can use the logit_bias parameter to affect the probability of producing a particular token while generating the response. The parameter accepts a dictionary with token ids as keys and logit biases as values. A logit bias is a number between -100 and 100. The higher the number, the more likely the token will be produced. The lower the number, the less likely the token will be produced.

The keys of the dictionary are token ids, not words. You can get those ids using a tokenizer library or the OpenAI web tokenizer. The web tokenizer is useful for ad-hoc queries and testing or when you want to specify a constant value for desired/banned tokens in your code.
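If you prefer to look token ids up in code, a minimal sketch using OpenAI’s tiktoken library (my suggestion; the article itself uses the web tokenizer) could look like this. Be aware that tiktoken reports the ids of the tokenizer the model actually uses, which may differ from the numbers the web tokenizer shows:

import tiktoken

# pick the tokenizer that matches the model
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(encoding.encode("cat"))   # id(s) of "cat" at the start of a text
print(encoding.encode(" cat"))  # id(s) of " cat" in the middle of a sentence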
Let’s say I want to generate a response in a conversation. The model is supposed to ask for cat food, but we won’t mention it in the prompt:
query = [
    {"role": "system", "content": "You pretend to be a client who wants to buy food for a pet. You want two cans of food. Say what pet you have."},
    {"role": "user", "content": "Hi, how can I help you?"}
]
Instead, we will open the web tokenizer and get token ids for the words we want. If I type the word cat in the tokenizer, I get the token id 9246. However, it’s not enough. OpenAI uses different token ids for the same word when the word occurs in the middle of a sentence. In this case, they prepend a space to the word, so I get two values when I generate tokens for the string cat cat: one for “cat”: 9246, and one for “ cat”: 3797.

We want to increase the likelihood of seeing “ cat” in the generated response, so we create a bias dictionary with the token id and the value 20 as the bias parameter.
logit_bias = {
    3797: 20
}
Now, we can send the request to the API:
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    logit_bias=logit_bias
)
"message": {
"content": "Hello, I would like to purchase two cans of pet food please. Do you have any options for a cat?",
"role": "assistant"
}
The bias values modify probabilities, but they won’t make an improbable thing happen. If you try to force ChatGPT to use a word that makes no sense in the context, it won’t do it.

Also, ridiculously high values (but within the supported range) tend to break the probabilities, and the API call fails. When I use the same input with a bias of 100:
logit_bias = {
    3797: 100
}
It crashes the API call:
APIError: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at...
If you want to ban a word from the response, set the bias parameter to a negative value (between -100 and -1). Of course, if the word is the only thing that makes sense in the context, the model may still use it (or the call will crash).
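For example, reusing the query and the token id from above, a ban could look like this (a sketch; the model will most likely pick a different pet):

logit_bias = {
    3797: -100  # make the " cat" token as unlikely as possible
}
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=query,
    logit_bias=logit_bias
)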
Do you need help building your own AI-powered tools for your business?
You can hire me!