Connect SCM to LM Studio (Llama 3 etc) and run offline AI models for FREE

1. Install LM Studio on your computer

Download and install LM Studio


2. Download model

Switch to power user UI

Click on Discover

Find and download a model, eg Llama 3, DeepSeek etc

If you are using a reasoning model that returns <think> tags, you should use the DeepSeek service inside SCM.

SCM will strip the <think> tag output for you automatically (a sketch of what that stripping looks like follows below).
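For the curious, a minimal sketch of that stripping, assuming the tags arrive inline in the response text (illustrative only, not SCM's actual code):

```python
import re

def strip_think(text: str) -> str:
    # Remove any <think>...</think> block a reasoning model emits
    # before its final answer (DOTALL so the block can span lines).
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>working it out...</think>Hello!"))  # -> Hello!
```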


3. Load model

Click select a model

Click on the model you downloaded

Configure settings, eg enable GPU Offload to make inference quicker

Click load model

Which models you can run depends on your PC's RAM and graphics card.
Larger models require more RAM.

Verify the model is loaded correctly in the UI


4. Start API server

We need to start the API server so SCM can access it.

Click on Developer

Make sure status says ‘Running’

Click on the toggle if it is not running.
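If you'd rather verify the server from outside the UI, here is a quick sketch, assuming the default port 1234 used later in this guide (LM Studio exposes an OpenAI-compatible model listing):

```python
import requests

# List the models the local LM Studio server is serving.
# If this fails to connect, the API server is not running.
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```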



5. Fill out API details inside SCM

You must select either OpenAI Alt 1, OpenAI Alt 2, or DeepSeek (for reasoning models)

Find the completion URL of the model.

Check the developer logs for the full URL.

You are looking for /chat/completions

eg: http://localhost:1234/v1/chat/completions

Copy and paste this into the URL field inside SCM.

Find the model name in LM Studio.
eg: deepseek-r1-distill-qwen-7b

Copy and paste this into SCM.
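To double-check the URL and model name before pasting them into SCM, you can replay the same kind of request SCM will make. A sketch using the example values above (swap in your own model name):

```python
import requests

payload = {
    "model": "deepseek-r1-distill-qwen-7b",  # the name shown in LM Studio
    "messages": [{"role": "user", "content": "Say hello!"}],
    "temperature": 0.7,
}
resp = requests.post(
    "http://localhost:1234/v1/chat/completions", json=payload, timeout=60
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```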


6. Test

In SCM, select the correct AI service.
eg DeepSeek (or OpenAI Alt)

Open the Ask AI chat box

Say hello!

Check LM Studio for errors.


7. Troubleshooting errors

Chat returns error undefined retry…

Check the LM Studio log.

Select the correct endpoint (it must end in /chat/completions).

Verify the endpoint is pasted into SCM correctly.
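A quick way to separate an SCM problem from a server problem is to hit the endpoint directly and compare what you see with the LM Studio log. A sketch (paste in your own URL and model name):

```python
import requests

url = "http://localhost:1234/v1/chat/completions"  # your pasted URL
assert url.endswith("/chat/completions"), "wrong endpoint"

resp = requests.post(url, json={
    "model": "deepseek-r1-distill-qwen-7b",
    "messages": [{"role": "user", "content": "ping"}],
}, timeout=60)
# Print the raw status and body - errors here will also show up
# in the LM Studio developer log.
print(resp.status_code, resp.text[:300])
```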


Thanks for sharing this. But the video seems to be private.

Set to public

Thanks for the heads up


Hopefully the YouTube algorithm can help people discover SCM!

Invest in hardware for long-term benefit; local AI and the homelab are the future.


Hey Tim, this is a great feature. Hope you send us emails for each feature update. Love that.


Just for ref:

I have an RTX 4090 and inference speeds are lightning fast.
On just an AMD 5800X3D CPU, inference speed was very slow, around 1-2 words a second.


Desperately want hardware to run a 70B q8 model fast.

Anything RTX should theoretically be better than CPU.

The other issue is RAM requirements.

Need more than 32 GB of RAM, and running dual channel means going to 64 GB (see the rough sizing sketch below).

Yeah, mobo first with dual GPU and 4 RAM slots should give a great foundation for 5-10 years.
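As a sanity check on those numbers, a rough rule-of-thumb sketch (the constants are assumptions, not LM Studio figures; real usage varies with format, context length and runtime):

```python
def model_ram_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    # Weight memory ~ parameters x bits / 8, plus ~20% assumed overhead
    # for KV cache and runtime buffers.
    return params_billion * quant_bits / 8 * overhead

for bits in (4, 8):
    print(f"70B @ q{bits}: ~{model_ram_gb(70, bits):.0f} GB")
# 70B @ q4: ~42 GB
# 70B @ q8: ~84 GB
```

By this estimate a 70B q8 model doesn't fit in 64 GB without offloading layers to the GPU.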

@Tim what is the speed of the 4090?

From LM Studio:
About 30 tokens a second.

For fun, mainly CPU (AMD 5800X3D) with low GPU usage:
0.5 tokens a second.

Basically, without a GPU a large 70B model wasn't usable. At 0.5 tokens a second a 500-token reply takes over 15 minutes; at 30 tokens a second it takes under 20 seconds.

And what processor are you using with it?

CPU: AMD 5800X3D

For anyone who can't run models on their PC,
Groq AI is a free online alternative.

How to sign up to Groq AI for free unlimited Llama 3 70B (GPT-4 competitor) calls
