If you send {"role": "user", "content": "What is the most beautiful country?"} as the messages parameter, it's not only the text What is the most beautiful country? that is sent to the OpenAI API endpoint; the whole structure "role": "user", "content": "What is the most beautiful country?" appears to be counted as well.
I was able to confirm this using tiktoken.
If you run get_tokens_long_example.py you'll get the following output:
14
get_tokens_long_example.py
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("'role':'user','content':'What is the most beautiful country?'", "cl100k_base"))
If you run get_tokens_short_example.py you'll get the following output:
8
get_tokens_short_example.py
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("'role':'user','content':'.'", "cl100k_base"))
You said that the OpenAI API reports 15 tokens used in the first example and 9 tokens used in the second. You probably noticed that I got 14 and 8 tokens using tiktoken (i.e., 1 token fewer in both examples). This seems to be a known tiktoken problem that should have been solved by now.
Anyway, I didn't dig deep enough to figure out why I still get 1 token fewer, but I was able to show that it's not only What is the most beautiful country? that is sent to the OpenAI API endpoint.
For more information about tiktoken, see this answer.