Error Request too large for model `llama3-70b-8192`

Hello Tim,

I’m using the Groq AI free tier to generate an article with the longform RAG Article (top 5 Google articles) template.

But it fails with this message: `{"status":413,"data":{"error":{"message":"Request too large for model llama3-70b-8192 in organization org_01jg9fs34zfe0vq0dm5aez9n0f on tokens per minute (TPM): Limit 6000, Requested 10695, please reduce your message size and try again. Visit GroqCloud for more information.","type":"tokens","code":"rate_limit_exceeded"}}} retry 3/3 in 32s`

Is there any way we can fix this? I currently cannot generate anything because the tokens-per-minute limit keeps being exceeded.

This is a limitation of the Groq models: the input token limit is too small to let you give the model five articles' worth of context.

Solution 1

Use the headings of the articles instead: load only the headings of the RAG top-5 articles rather than their full text.
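
If you are building the context yourself (i.e. calling the API directly rather than through the template), one way to do this is to keep just the h1–h3 headings from each source page. A minimal sketch, assuming the requests and BeautifulSoup packages; the URLs are placeholders:

```python
# Build a compact, outline-only context from the top-ranking pages.
# Assumes the 'requests' and 'beautifulsoup4' packages; URLs are placeholders.
import requests
from bs4 import BeautifulSoup

def extract_headings(url: str) -> str:
    """Return only the h1-h3 headings of a page, one per line."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    headings = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])]
    return "\n".join(headings)

urls = [
    "https://example.com/article-1",  # placeholder URLs
    "https://example.com/article-2",
]
context = "\n\n".join(extract_headings(u) for u in urls)
# 'context' is now a few hundred tokens instead of several thousand.
```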

Solution 2

Use just the top 1 article, not the top 5.

Solution 3

Switch to a model with a larger context window.
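
If you are hitting the Groq API directly rather than going through the template, switching models is just a one-line change. A minimal sketch with the groq Python SDK; the model name and prompt are examples only, so check the GroqCloud model list for current context windows and rate limits:

```python
# Minimal sketch using the groq Python SDK.
# The model name and prompt below are examples, not recommendations.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # swap in any model with a larger context window / TPM allowance
    messages=[
        {"role": "user", "content": "Write a longform article from this outline: ..."},
    ],
)
print(response.choices[0].message.content)
```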

What do you think about the model `llama-3.1-8b-instant`? I see that it allows up to 20,000 tokens per minute, so I'm using that one. But I'm not sure if it's as good as llama3-70b-8192.

Please give me some advice.

It's not the tokens per minute, it's the model's memory (context) size.

e.g. `llama3-70b-8192`

The 8192 refers to the context window: requests are limited to 8,192 tokens, which works out to very roughly 4,000–6,000 words.

Passing five articles' worth of content probably eats most of that token limit and leaves no room for the AI to respond.
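
If you want to sanity-check this before sending a request, a rough rule of thumb is about four characters per token for English text. A back-of-the-envelope sketch (the ratio is only a heuristic, not the model's real tokenizer):

```python
# Rough pre-flight check: will the prompt fit in the model's context window?
# The 4-characters-per-token ratio is a heuristic for English text only.
MODEL_CONTEXT = 8192      # llama3-70b-8192's context window
RESPONSE_BUDGET = 2000    # tokens to leave free for the model's answer

def estimate_tokens(text: str) -> int:
    return len(text) // 4

articles = ["...article 1 text...", "...article 2 text..."]  # placeholder content
prompt_tokens = sum(estimate_tokens(a) for a in articles)

if prompt_tokens > MODEL_CONTEXT - RESPONSE_BUDGET:
    print(f"Prompt is ~{prompt_tokens} tokens; trim the context or use a bigger-context model.")
```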

OpenAI has models with larger context windows.

e.g. GPT-4o has a 128k-token context window vs. the 8k in Llama 3.
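
If you go that route and call the API yourself, the request looks much the same. A minimal sketch with the openai Python SDK; the prompt is a placeholder:

```python
# Minimal sketch using the openai Python SDK; the prompt is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # 128k-token context window
    messages=[
        {"role": "user", "content": "Write a longform article from these 5 sources: ..."},
    ],
)
print(response.choices[0].message.content)
```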