On Tuesday, OpenAI announced a sizable update to its large language model API offerings (including GPT-4 and gpt-3.5-turbo), adding a new function calling capability, significant cost reductions, and a 16,000-token context window option for the gpt-3.5-turbo model.
In large language models (LLMs), the “context window” is like a short-term memory that stores the contents of the prompt input or, in the case of a chatbot, the entire contents of the ongoing conversation. Increasing context size has become a technological race, with Anthropic recently announcing a 100,000-token (roughly 75,000-word) context window option for its Claude language model. In addition, OpenAI has developed a 32,000-token version of GPT-4, but it is not yet publicly available.
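To make the idea concrete, here is a minimal sketch of how a developer might measure what a prompt consumes, using OpenAI's open-source tiktoken tokenizer (our choice of example text and library here is illustrative, not part of the announcement):

```python
# Count how many context-window tokens a prompt consumes.
# Illustrative sketch using OpenAI's open-source tiktoken library.
import tiktoken

# Fetch the tokenizer that gpt-3.5-turbo uses under the hood.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Summarize the following meeting notes in three bullet points: ..."
tokens = enc.encode(prompt)

# Every one of these tokens counts against the model's context window,
# and so does the model's reply.
print(f"{len(tokens)} tokens")
```

Everything in the window counts against the limit: system instructions, the conversation so far, and the tokens the model generates in response.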
Along those lines, OpenAI just introduced a new 16,000-token context window version of gpt-3.5-turbo called, unsurprisingly, “gpt-3.5-turbo-16k,” which allows a prompt to be up to 16,000 tokens in length. With four times the context length of the standard 4,000-token version, gpt-3.5-turbo-16k can process around 20 pages of text in a single request. That's a considerable boost for developers who need the model to process and generate responses for larger chunks of text.
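In practice, using the larger window is just a matter of naming the new model. Here's a rough sketch using the openai Python package's chat completions interface as it existed at the time of the announcement (the file name and prompts are stand-ins for your own content):

```python
# Send a long document to gpt-3.5-turbo-16k in a single request.
# Sketch based on the openai Python package of the v0.27 era;
# later versions of the library use a different client interface.
import openai

openai.api_key = "sk-..."  # your API key

# Hypothetical input: up to roughly 20 pages of text.
long_document = open("report.txt").read()

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",  # 16,000-token context window
    messages=[
        {"role": "system", "content": "You summarize documents concisely."},
        {"role": "user", "content": f"Summarize this document:\n\n{long_document}"},
    ],
)

print(response["choices"][0]["message"]["content"])
```

The only change from a standard gpt-3.5-turbo call is the model name, which makes it easy to switch when a prompt outgrows the 4,000-token window.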