1. The problem
Ollama has no easy way to set the context window for local LLMs other than recreating the entire model with a new Modelfile. So we end up needing a separate model for every context size. So silly.
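For context, the current workaround looks roughly like the sketch below: write a Modelfile that pins `num_ctx`, then register it as a whole new model with `ollama create`. The base model, the new model name and the 8192 value are only example values:

```python
# Rough sketch of today's workaround: bake num_ctx into a brand-new model
# via a Modelfile and register it with `ollama create`.
# "llama3.1", "llama3.1-8k" and 8192 are example values.
import os
import subprocess
import tempfile

modelfile = "FROM llama3.1\nPARAMETER num_ctx 8192\n"

with tempfile.NamedTemporaryFile("w", suffix=".Modelfile", delete=False) as f:
    f.write(modelfile)
    path = f.name

# Every context size needs its own `ollama create` run and its own model entry.
subprocess.run(["ollama", "create", "llama3.1-8k", "-f", path], check=True)
os.remove(path)
```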
2. Simple solution
Allow us to set Context Length along with Temperature when configuring an OpenAI Alt … I believe the param name is “num_ctx”
Tim
September 1, 2024, 12:40am
I had a search for num_ctx.
Looks like there is a bug or a missing feature in Ollama.
I think num_ctx can't be used through the OpenAI-compatible API, only with direct calls to Ollama using its own API, i.e. you might have to use webhooks.
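For example, hitting Ollama's native API directly does let you pass num_ctx per request via "options". A minimal sketch, assuming the default localhost:11434 endpoint; the model name and the 8192 value are just examples:

```python
# Minimal sketch: Ollama's own /api/chat route (not the OpenAI-compatible /v1
# routes) accepts num_ctx per request inside "options".
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Summarise this long document ..."}],
        "options": {"num_ctx": 8192},  # per-request context window
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```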
Otherwise, I saw this:
ollama:main ← ollama:jmorganca/openai-context (opened 25 Aug 2024, 7:36 PM UTC)
Previously, `/v1/chat/completions` requests were limited to 2048 tokens. This PR extends the context length by setting `num_ctx` to `max_tokens` if it's larger than the default context window of 2048 tokens. It also includes a minor clean up for the OpenAI compatibility unit tests.
Note: this doesn't solve the case of having a large context window while limiting the number of tokens to a small number. This will be solved in a future change where `num_ctx` will be set automatically based on available VRAM and compute.
Fixes https://github.com/ollama/ollama/issues/6286 https://github.com/ollama/ollama/issues/5356
There is an open request to have the OpenAI max_tokens param map automatically to Ollama's num_ctx.
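If that mapping lands, a plain OpenAI-style call like the sketch below should be enough to get a bigger context window. This assumes Ollama's OpenAI-compatible endpoint at /v1 and the openai Python client; the api_key placeholder, the model name and the 8192 value are example values, and note the caveat above that a large context with a small output limit isn't covered yet:

```python
# Sketch of what the requested max_tokens -> num_ctx mapping would mean in
# practice against Ollama's OpenAI-compatible /v1 endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1",
    max_tokens=8192,  # the PR above would raise num_ctx to this when it exceeds 2048
    messages=[{"role": "user", "content": "Summarise this long document ..."}],
)
print(resp.choices[0].message.content)
```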