Connect SCM to LM Studio (Llama 3 etc) and run offline AI models for FREE

1. Install LM Studio on your computer

Download and install LM Studio


2. Download model

Switch to power user UI

Click on Discover

Find and download a model, eg Llama 3, DeepSeek etc

If you are using a reasoning model that returns <think> tags, you should use the DeepSeek service inside SCM.

SCM will strip the <think> tag output for you automatically (a sketch of what that stripping looks like follows below).
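For the curious, a minimal sketch of that stripping, assuming the tags arrive inline in the response text (illustrative only, not SCM's actual code):

```python
import re

def strip_think(text: str) -> str:
    # Remove any <think>...</think> block a reasoning model emits
    # before its final answer (DOTALL so the block can span lines).
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>working it out...</think>Hello!"))  # -> Hello!
```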


3. Load model

Click select a model

Click on the model you downloaded

Configure settings, eg enable GPU Offload to make inference quicker

Click load model

Which models you can run depends on your PC's RAM and graphics card.
Larger models require more RAM.

Verify the model is loaded correctly in the UI


4. Start API server

We need to start the API server so SCM can access it.

Click on Developer

Make sure status says ‘Running’

Click on the toggle if it is not running.
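If you'd rather verify the server from outside the UI, here is a quick sketch, assuming the default port 1234 used later in this guide (LM Studio exposes an OpenAI-compatible model listing):

```python
import requests

# List the models the local LM Studio server is serving.
# If this fails to connect, the API server is not running.
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```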



5. Fill out API details inside SCM

You must select either OpenAI Alt 1, OpenAI Alt 2, or DeepSeek (for reasoning models)

Find the completion URL of the model.

Check the developer logs for the full URL.

You are looking for /chat/completions

eg: http://localhost:1234/v1/chat/completions

Copy and paste this into the URL field inside SCM.

Find the model name in LM Studio.
eg: deepseek-r1-distill-qwen-7b

Copy and paste this into SCM.
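To double-check the URL and model name before pasting them into SCM, you can replay the same kind of request SCM will make. A sketch using the example values above (swap in your own model name):

```python
import requests

payload = {
    "model": "deepseek-r1-distill-qwen-7b",  # the name shown in LM Studio
    "messages": [{"role": "user", "content": "Say hello!"}],
    "temperature": 0.7,
}
resp = requests.post(
    "http://localhost:1234/v1/chat/completions", json=payload, timeout=60
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```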


6. Test

In SCM, select the correct AI service.
eg DeepSeek (or OpenAI Alt)

Open the Ask AI chat box

Say hello!

Check LM Studio for errors.


7. Troubleshooting errors

Chat returns error undefined retry…

Check the LM Studio log.

Select the correct endpoint (it must end in /chat/completions).

Verify the endpoint is pasted into SCM correctly.
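A quick way to separate an SCM problem from a server problem is to hit the endpoint directly and compare what you see with the LM Studio log. A sketch (paste in your own URL and model name):

```python
import requests

url = "http://localhost:1234/v1/chat/completions"  # your pasted URL
assert url.endswith("/chat/completions"), "wrong endpoint"

resp = requests.post(url, json={
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [{"role": "user", "content": "ping"}],
}, timeout=60)
# Print the raw status and body - errors here will also show up
# in the LM Studio developer log.
print(resp.status_code, resp.text[:300])
```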


Thanks for sharing this. But the video seems to be private.

Set to public

Thanks for the heads up


Hopefully the YouTube algorithm can help people discover SCM!

Invest in hardware for long-term benefit; local AI and the homelab are the future.


Hey Tim, this is a great feature. Hope you send us emails for each feature update. Love that.


Just for ref:

I have an RTX 4090 and inference speeds are lightning fast.
On just an AMD 5800X3D CPU, inference speed was very slow, around 1-2 words a second.


Desperately want hardware to run a 70B q8 model fast.

Anything RTX should theoretically be better than CPU.

The other issue is RAM requirements.

Need more than 32 GB of RAM, and running dual channel means going to 64 GB (see the rough sizing sketch below).

Yeah, mobo first with dual GPU and 4 RAM slots should give a great foundation for 5-10 years.
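As a sanity check on those numbers, a rough rule-of-thumb sketch (the constants are assumptions, not LM Studio figures; real usage varies with format, context length and runtime):

```python
def model_ram_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    # Weight memory ~ parameters x bits / 8, plus ~20% assumed overhead
    # for KV cache and runtime buffers.
    return params_billion * quant_bits / 8 * overhead

for bits in (4, 8):
    print(f"70B @ q{bits}: ~{model_ram_gb(70, bits):.0f} GB")
# 70B @ q4: ~42 GB
# 70B @ q8: ~84 GB
```

By this estimate a 70B q8 model doesn't fit in 64 GB without offloading layers to the GPU.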

@Tim what is the speed of the 4090?

From LM Studio:
About 30 tokens a second.

For fun, mainly CPU (AMD 5800X3D) with low GPU usage:
0.5 tokens a second.

Basically, without a GPU a large 70B model wasn't usable. At 0.5 tokens a second a 500-token reply takes over 15 minutes; at 30 tokens a second it takes under 20 seconds.

And what processor are you using with it?

CPU: AMD 5800X3D

For anyone who can't run models on their PC,
Groq AI is a free online alternative.

How to sign up to Groq AI for free unlimited Llama 3 70B (GPT-4 competitor) calls
