Tim
1
1. Install LM Studio on your computer
Download
Find and install a model, e.g. Llama 3
Which models you can run depends on your PC's RAM and graphics card
Running large models on your CPU can be quite slow, so enable GPU Offload if possible
You need a graphics card with CUDA support, e.g. an NVIDIA GPU
2. Load model and start API server
Load the model
Start the API server
3. Fill out API details inside SCM
Select OpenAI Alt 1 or 2
Fill out the URL and Model name
To find the URL and model name, look inside LM Studio, as highlighted below
4. Test
Use the AI chat box to send a quick test prompt (see the example request after these steps)
Check LM Studio; the server log will also show the request and reply
5. Use the free AI model anywhere in SCM as usual
Setup complete!
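If you want to sanity-check the server outside SCM first, here is a minimal sketch of the same kind of request SCM sends, assuming LM Studio's default http://localhost:1234/v1 endpoint; the model name below is a placeholder, so use whatever name the LM Studio server page shows.

```python
# Minimal test of LM Studio's OpenAI-compatible local server.
# Assumes the default port 1234; no real API key is needed locally.
import requests

BASE_URL = "http://localhost:1234/v1"   # shown in LM Studio's server tab
MODEL = "meta-llama-3-8b-instruct"      # placeholder; copy the name LM Studio shows

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Reply with one short sentence to confirm you are working."}
    ],
    "temperature": 0.7,
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If this prints a reply, the same URL and model name should work in the SCM API settings, and the request will also appear in the LM Studio server log.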
Ryan
3
Thanks for sharing this. But the video seems to be private.
bukit
5
Hopefully the YouTube algorithm can help people discover SCM!
Invest in hardware for the long term; local AI and homelabs are the future.
SEO
6
Hey Tim, this is a great feature. Hope you send us emails for each feature update. Love that.
Tim
7
Just for reference:
I have an RTX 4090 and inference speeds are lightning fast.
On just an AMD 5800X3D CPU, inference was very slow, around 1-2 words a second.
bukit
8
Desperately want hardware that can run a 70B Q8 model fast.
Tim
9
Anything RTX should, in theory, be better than a CPU.
The other consideration is RAM requirements.
You need more than 32 GB of RAM, and running in dual channel means going to 64 GB.
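As a rough back-of-envelope sketch of where those numbers come from (weights only, ignoring KV cache, context length, and runtime overhead):

```python
# Approximate model memory: parameters * bits-per-weight / 8.
# This is a lower bound; real usage adds KV cache and runtime overhead.
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # billions of bytes, i.e. roughly GB

for name, bits in [("Q4", 4), ("Q8", 8), ("FP16", 16)]:
    print(f"70B at {name}: ~{approx_weight_gb(70, bits):.0f} GB for the weights alone")
# 70B at Q4:   ~35 GB
# 70B at Q8:   ~70 GB
# 70B at FP16: ~140 GB
```

So a quantized 70B model blows well past 32 GB, which is why the jump to 64 GB (or to GPU VRAM with offload) comes up so quickly.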
bukit
10
Yeah, motherboard first, with dual GPUs and 4 RAM slots; that should give a great foundation for 5-10 years.
SEO
11
@Tim what is the speed of the 4090?
Tim
12
From LM Studio:
RTX 4090: about 30 tokens a second.
For fun, mainly CPU (AMD 5800X3D) with low GPU usage: 0.5 tokens a second.
Basically, without a GPU a large 70B model wasn't usable.
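For anyone who wants to measure their own tokens per second in the same setup, here is a rough sketch against the default local endpoint, assuming the server reports usage.completion_tokens in the response (OpenAI-compatible servers generally do) and using a placeholder model name:

```python
# Rough tokens-per-second check against a local LM Studio server.
# Assumes the default http://localhost:1234/v1 endpoint; adjust MODEL
# to whatever name LM Studio shows for the loaded model.
import time
import requests

BASE_URL = "http://localhost:1234/v1"
MODEL = "meta-llama-3-70b-instruct"   # placeholder name

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Write a short paragraph about local AI."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.time() - start

tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tokens/s")
```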
And what processor are you using with it?
Tim
15
For anyone who can't run models on their PC,
Groq AI is a free online alternative.
How to sign up to Groq AI for free unlimited Llama 3 70B (GPT-4 competitor) calls
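Because Groq also exposes an OpenAI-compatible API, the same request shape works there; the base URL and model id below are assumptions, so check Groq's docs for the current values and plug them into SCM the same way as the LM Studio ones.

```python
# Same OpenAI-compatible request, pointed at Groq instead of a local server.
# Base URL and model id are assumptions; verify against Groq's documentation.
import os
import requests

BASE_URL = "https://api.groq.com/openai/v1"
MODEL = "llama3-70b-8192"   # example id; may have changed

headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

resp = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```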