Complete Guide to Running LLM Locally with LM Studio

Source: DeepHub IMBA


This article is about 3,400 words long; reading it takes about 10 minutes.
LM Studio provides a convenient way to serve models through an OpenAI-compatible interface.


GPT-4 is widely regarded as the best generative AI chatbot, but open-source models keep improving and, with fine-tuning, can surpass GPT-4 in specific areas. You may have considered running an open-source LLM locally on your computer for the following reasons:

  1. Offline: No internet connection is required.
  2. Model Access: Running models locally allows you to experiment with open-source models (Llama 2, Vicuna, Mistral, OpenOrca, etc.).
  3. Privacy: When you run a model locally, no information is transmitted to the cloud. While privacy concerns about cloud-based models like GPT-4, Bard, and Claude 2 may be exaggerated, running locally sidesteps the issue entirely.
  4. Experimentation: If you see value in generative AI, testing models hands-on lets you learn their details and discover what else is out there.
  5. Cost: Open-source models are free, and some can be used commercially without restrictions.

For many people, running a local LLM has required some computer knowledge, since it typically means working at the command line or using more complex web tools like Oobabooga.

LM Studio is a free desktop application that makes it very easy to install and use open-source LLMs.

However, keep in mind that LM Studio is not open-source; it is just free to use.

That said, LM Studio is the best and easiest tool for local testing I have seen so far, so I still recommend trying it.

First, go to “lmstudio.ai” and download the version suitable for your operating system:

Once LM Studio is installed and running, select the LLM you want to download.

You can do this by selecting one of the community-suggested models listed in the main window, or by using the search bar to find any model available on HuggingFace by keyword.

The model search list shows the size of each download file, so make sure the size is not a problem for your disk space and connection. (In China, you may need a VPN to reach HuggingFace.)

In the top-left corner of the screen, the release-date bar shows “compatibility guess”: LM Studio has checked your local system and displays the models it thinks can run on your computer. To see all models instead, click “compatibility guess”. Clicking a model on the left shows its available versions on the right, with an indication of which ones should work based on your computer’s specifications.

Depending on your computer’s capability and speed, larger models will be more accurate but slower. Most of the models in this list are quantized, in formats such as GGML and GGUF. (You can refer to our previous articles for the specifics of these formats.)
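As a rough guide, the quantization scheme is usually encoded in the file name: a file ending in “Q4_K_M.gguf”, for example, has been quantized to roughly 4 bits per weight, while a “Q8_0” file uses about 8 bits. Lower-bit quants are smaller and faster to load but slightly less accurate.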

Once the model download is complete, (1) select the model from the drop-down menu at the top of the window; (2) select the chat bubble in the left sidebar; (3) open the “Context Overflow Policy” and “Chat Appearance” on the right.

Under “Context Overflow Policy”, make sure “Maintain a rolling window and truncate past messages” is selected, and under “Chat Appearance”, select “Plaintext”. The rolling window drops the oldest messages once the conversation exceeds the model’s context length, so long chats keep working instead of failing.

Open “Model Configuration”, then “Prompt Format”, scroll down to “Pre-prompt / System prompt”, and click the “>” symbol to expand it. Here you can enter the system “role”: a description of how you want the bot to behave and what “skills” or other specific qualities it should bring to its responses. This is similar to “Custom instructions” in a ChatGPT Plus account.
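For example, a pre-prompt such as “You are a concise technical assistant. Answer in plain language, and say so when you are unsure.” will shape every response in the chat without you having to repeat the instruction.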

Continue scrolling down to “Hardware Settings”. By default, the computer’s CPU does all the work, but if a GPU is installed it will be shown here. If GPU memory is limited, you can specify how many layers to offload to the GPU (start with 10–20 and adjust), which moves part of the model onto the GPU. This is similar to the corresponding parameters in llama.cpp. You can also increase the number of CPU threads the LLM uses; the default is 4, and the right value depends on your machine.
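For reference, the analogous llama.cpp flags are -ngl (number of layers to offload to the GPU) and -t (CPU threads); depending on your build, the binary is named main or llama-cli, so an invocation like “./main -m model.gguf -ngl 20 -t 8” (with model.gguf as a placeholder path) offloads 20 layers and uses 8 threads.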

After making these changes, you can use the local LLM. Just enter your query in the “USER” field, and the LLM will respond as “AI”.

LM Studio provides an excellent experience and is a great local alternative to ChatGPT. It also serves models through an OpenAI-compatible interface, which simplifies integration with clients that already use OpenAI as a backend.
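As a minimal sketch of what that looks like in practice: start the server from LM Studio’s “Local Server” tab (port 1234 is the default, but check the tab on your machine), then point the openai Python package (v1.x) at it. The model name and API key below are placeholders; the local server serves whichever model you have loaded and does not validate keys.

```python
# Minimal sketch: chatting with LM Studio's local server through its
# OpenAI-compatible API. Assumes the server was started from LM Studio's
# "Local Server" tab on the default port 1234.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's OpenAI-compatible endpoint
    api_key="lm-studio",                  # placeholder; the local server ignores it
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses whichever model is loaded
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "In two sentences, what is GGUF?"},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Because the interface matches OpenAI’s, most tools that accept a custom base URL can be pointed at the local server without code changes.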

If you are looking for a quick and easy way to set up a chat interface or an API server with different open-source models for personal use, LM Studio is a great starting point.

Author: Gene Bernardin

Editor: Huang Jiyan
