1. The problem
Ollama has no easy way to set the context window for local LLMs other than recreating the entire model with a new Modelfile. So we end up needing a separate model for every context size. So silly.
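For context, the current workaround looks roughly like the sketch below: write a Modelfile that pins `num_ctx`, then register it as a whole new model with `ollama create`. The base model, the new model name and the 8192 value are only example values:

```python
# Rough sketch of today's workaround: bake num_ctx into a brand-new model
# via a Modelfile and register it with `ollama create`.
# "llama3.1", "llama3.1-8k" and 8192 are example values.
import os
import subprocess
import tempfile

modelfile = "FROM llama3.1\nPARAMETER num_ctx 8192\n"

with tempfile.NamedTemporaryFile("w", suffix=".Modelfile", delete=False) as f:
    f.write(modelfile)
    path = f.name

# Every context size needs its own `ollama create` run and its own model entry.
subprocess.run(["ollama", "create", "llama3.1-8k", "-f", path], check=True)
os.remove(path)
```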
2. Simple solution
Allow us to set Context Length along with Temperature when configuring an OpenAI Alt … I believe the param name is “num_ctx”
Tim
September 1, 2024, 12:40am
I had a search for num_ctx.
Looks like there is a bug or a missing feature in Ollama.
I think num_ctx can't be used through the OpenAI-compatible API, only with direct calls to Ollama using its own API, i.e. you might have to use webhooks.
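For example, hitting Ollama's native API directly does let you pass num_ctx per request via "options". A minimal sketch, assuming the default localhost:11434 endpoint; the model name and the 8192 value are just examples:

```python
# Minimal sketch: Ollama's own /api/chat route (not the OpenAI-compatible /v1
# routes) accepts num_ctx per request inside "options".
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Summarise this long document ..."}],
        "options": {"num_ctx": 8192},  # per-request context window
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```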
Otherwise, I saw this:
ollama:main ← ollama:jmorganca/openai-context (opened 25 Aug 2024, 7:36 PM UTC)
Previously, `/v1/chat/completions` requests were limited to 2048 tokens. This PR extends the context length by setting `num_ctx` to `max_tokens` if it's larger than the default context window of 2048 tokens. It also includes a minor clean up for the OpenAI compatibility unit tests.
Note: this doesn't solve the case of having a large context window while limiting the number of tokens to a small number. This will be solved in a future change where `num_ctx` will be set automatically based on available VRAM and compute.
Fixes https://github.com/ollama/ollama/issues/6286 https://github.com/ollama/ollama/issues/5356
There is an open request to have the OpenAI max_tokens param map automatically to Ollama's num_ctx.
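If that mapping lands, a plain OpenAI-style call like the sketch below should be enough to get a bigger context window. This assumes Ollama's OpenAI-compatible endpoint at /v1 and the openai Python client; the api_key placeholder, the model name and the 8192 value are example values, and note the caveat above that a large context with a small output limit isn't covered yet:

```python
# Sketch of what the requested max_tokens -> num_ctx mapping would mean in
# practice against Ollama's OpenAI-compatible /v1 endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1",
    max_tokens=8192,  # the PR above would raise num_ctx to this when it exceeds 2048
    messages=[{"role": "user", "content": "Summarise this long document ..."}],
)
print(resp.choices[0].message.content)
```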