Ollama: Local Large Model Running Guide

The author of this article is a front-end developer at 360 Qiwutuan.

Introduction to Ollama

Ollama is an open-source framework developed in Go that can run large models locally.

Official website: https://ollama.com/

GitHub repository: https://github.com/ollama/ollama

Installing Ollama

Download and Install Ollama

Download the installation package for your operating system from the Ollama official website. This guide uses macOS. After installation, type ollama in the terminal to see the commands it supports.

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

Check the Ollama version

ollama -v
ollama version is 0.1.31

Check the downloaded models

ollama list

NAME     ID           SIZE   MODIFIED    
gemma:2b b50d6c999e59 1.7 GB 3 hours ago

One model, gemma:2b, is already present locally. Next, let’s see how to download a model.

Download Large Model

Download Model

After installation, Ollama suggests running the llama2 model by default. Some of the models that Ollama supports are listed below.

Model               Parameters  Size    Download
Llama 3             8B          4.7GB   ollama run llama3
Llama 3             70B         40GB    ollama run llama3:70b
Mistral             7B          4.1GB   ollama run mistral
Dolphin Phi         2.7B        1.6GB   ollama run dolphin-phi
Phi-2               2.7B        1.7GB   ollama run phi
Neural Chat         7B          4.1GB   ollama run neural-chat
Starling            7B          4.1GB   ollama run starling-lm
Code Llama          7B          3.8GB   ollama run codellama
Llama 2 Uncensored  7B          3.8GB   ollama run llama2-uncensored
Llama 2 13B         13B         7.3GB   ollama run llama2:13b
Llama 2 70B         70B         39GB    ollama run llama2:70b
Orca Mini           3B          1.9GB   ollama run orca-mini
LLaVA               7B          4.5GB   ollama run llava
Gemma               2B          1.4GB   ollama run gemma:2b
Gemma               7B          4.8GB   ollama run gemma:7b
Solar               10.7B       6.1GB   ollama run solar

Llama 3 is a large language model that Meta open-sourced on April 18, 2024. It comes in 8-billion and 70-billion parameter versions, both of which Ollama supports.

Here, we choose to install gemma 2b. Open the terminal and execute the command below:

ollama run gemma:2b
pulling manifest 
pulling c1864a5eb193... 100% ▕██████████████████████████████████████████████████████████▏ 1.7 GB                         
pulling 097a36493f71... 100% ▕██████████████████████████████████████████████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕██████████████████████████████████████████████████████████▏  136 B                         
pulling 22a838ceb7fb... 100% ▕██████████████████████████████████████████████████████████▏   84 B                         
pulling 887433b89a90... 100% ▕██████████████████████████████████████████████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 

After a short wait, the output reports that the model download succeeded.

The table above only shows some of the models supported by Ollama. More models can be found at https://ollama.com/library, including Chinese models like Alibaba’s Tongyi Qianwen.

Terminal Interaction

Once the download is complete, you can talk to the model directly in the terminal, for example by asking it to “Introduce React”.

>>> Introduce React

The model prints its answer directly in the terminal.

Show Help Command – /?

>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

Show Model Info Command – /show

>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template

Show Model Details Command – /show info

>>> /show info
Model details:
Family              gemma
Parameter Size      3B
Quantization Level  Q4_0

API Calls

Besides interactive use in the terminal, Ollama can also be called through an HTTP API. Running ollama show --help, for example, shows that the default local address is http://localhost:11434

ollama show --help
Show information for a model

Usage:
  ollama show MODEL [flags]

Flags:
  -h, --help         help for show
      --license      Show license of a model
      --modelfile    Show Modelfile of a model
      --parameters   Show parameters of a model
      --system       Show system message of a model
      --template     Show template of a model

Environment Variables:
      OLLAMA_HOST        The host:port or base URL of the Ollama server (e.g. http://localhost:11434)
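
The OLLAMA_HOST setting in the help output above suggests that a client should not hard-code the server address. The following is a minimal Python sketch of resolving it, not Ollama's own logic: the function name `ollama_base_url` is hypothetical, and the rule that a bare host:port value gets an http:// prefix is a simplifying assumption.

```python
import os

DEFAULT_BASE_URL = "http://localhost:11434"


def ollama_base_url(env=None):
    """Return the base URL of the Ollama server.

    Simplified assumption: a value without a scheme is treated as
    host:port and prefixed with http://. The real CLI may accept
    more forms than this sketch handles.
    """
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip()
    if not host:
        # Nothing configured: fall back to the default local address.
        return DEFAULT_BASE_URL
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


print(ollama_base_url({}))
print(ollama_base_url({"OLLAMA_HOST": "127.0.0.1:11434"}))
```

Passing the environment as a plain dictionary keeps the function easy to try out without changing real environment variables.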

Next, we will introduce two main APIs: generate and chat.

Generate

  • Streaming Response
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt":"Introduce React, within 20 words"
}'
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.337192Z","response":"React","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.421481Z","response":" 是","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.503852Z","response":"一个","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.584813Z","response":"用于","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.672575Z","response":"构建","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.754663Z","response":"用户","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.837639Z","response":"界面","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:32.918767Z","response":"(","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.080361Z","response":"UI","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.160418Z","response":")","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.239247Z","response":"的","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.318396Z","response":" JavaScript","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.484203Z","response":" 库","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.671075Z","response":"。","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.751622Z","response":"它","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.833298Z","response":"允许","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:33.919385Z","response":"开发者","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.007706Z","response":"轻松","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.09201Z","response":"构建","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.174897Z","response":"可","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.414743Z","response":"重","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.497013Z","response":"用的","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.584026Z","response":" UI","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.669825Z","response":",","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.749524Z","response":"并","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.837544Z","response":"与","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:34.927049Z","response":"各种","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:35.008527Z","response":" JavaScript","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:35.088936Z","response":" 框架","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:35.176094Z","response":"一起","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:35.255251Z","response":"使用","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:35.34085Z","response":"。","done":false}
{"model":"gemma:2b","created_at":"2024-04-19T10:12:35.428575Z","response":"","done":true,"context":[106,1645,108,25661,18071,22469,235365,235284,235276,235960,179621,107,108,106,2516,108,22469,23437,5121,40163,81964,16464,57881,235538,5639,235536,235370,22978,185852,235362,236380,64032,227725,64727,81964,235553,235846,37694,13566,235365,236203,235971,34384,22978,235248,90141,19600,7060,235362,107,108],"total_duration":3172809302,"load_duration":983863,"prompt_eval_duration":80181000,"eval_count":34,"eval_duration":3090973000}
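
Each line of the streamed output above is a separate JSON object carrying one fragment of the answer in its "response" field, and the last line has "done": true. A client therefore has to stitch the fragments together. A minimal Python sketch of that step follows; the helper name `join_stream` is hypothetical, and the sample lines are shortened copies of the output above with the created_at field left out.

```python
import json


def join_stream(lines):
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk carries statistics, not text
    return "".join(parts)


sample = [
    '{"model":"gemma:2b","response":"React","done":false}',
    '{"model":"gemma:2b","response":" 是","done":false}',
    '{"model":"gemma:2b","response":"一个","done":false}',
    '{"model":"gemma:2b","response":"","done":true}',
]
print(join_stream(sample))  # React 是一个
```

The same function should work on lines read incrementally from an HTTP response, since it only needs an iterable of strings.
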
  • Non-Streaming Response

Setting "stream": false returns the whole result in a single response.

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Introduce React, within 20 words",
  "stream": false
}'
```


```json
{
  "model": "gemma:2b",
  "created_at": "2024-04-19T08:53:14.534085Z",
  "response": "React is a large JavaScript library for building user interfaces, allowing you to easily create dynamic websites and applications.",
  "done": true,
  "context": [106, 1645, 108, 25661, 18071, 22469, 235365, 235284, 235276, 235960, 179621, 107, 108, 106, 2516, 108, 22469, 23437, 5121, 40163, 81964, 16464, 236074, 26546, 66240, 22978, 185852, 235365, 64032, 236552, 64727, 22957, 80376, 235370, 37188, 235581, 79826, 235362, 107, 108],
  "total_duration": 1864443127,
  "load_duration": 2426249,
  "prompt_eval_duration": 101635000,
  "eval_count": 23,
  "eval_duration": 1757523000
}
```
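
The same non-streaming call can be made from a program. The sketch below uses only the Python standard library; the function names are hypothetical, and `generate` needs a running Ollama server, so the demonstration at the end only works on the statistics from the response above. Durations in the response are reported in nanoseconds, which makes it possible to estimate generation speed.

```python
import json
import urllib.request


def build_generate_request(model, prompt, base_url="http://localhost:11434"):
    """Build a POST request for /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and decode the reply (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def tokens_per_second(reply):
    """Generated tokens per second; eval_duration is in nanoseconds."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)


# Statistics copied from the example response above.
reply = {"eval_count": 23, "eval_duration": 1757523000}
print(round(tokens_per_second(reply), 2))  # 13.09
```

With a server running, `generate("gemma:2b", "Introduce React, within 20 words")["response"]` would return the answer text.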

Chat

  • Streaming Response
curl http://localhost:11434/api/chat -d '{
  "model": "gemma:2b",
  "messages": [
    { "role": "user", "content": "Introduce React, within 20 words" }
  ]
}'

The terminal prints the following output:

{"model":"gemma:2b","created_at":"2024-04-19T08:45:54.86791Z","message":{"role":"assistant","content":"React"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:54.949168Z","message":{"role":"assistant","content":"是"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.034272Z","message":{"role":"assistant","content":"用于"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.119119Z","message":{"role":"assistant","content":"构建"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.201837Z","message":{"role":"assistant","content":"用户"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.286611Z","message":{"role":"assistant","content":"界面"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.37054Z","message":{"role":"assistant","content":" React"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.45099Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.534105Z","message":{"role":"assistant","content":"js"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.612744Z","message":{"role":"assistant","content":"框架"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.695129Z","message":{"role":"assistant","content":","},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.775357Z","message":{"role":"assistant","content":"允许"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.855803Z","message":{"role":"assistant","content":"开发者"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:55.936518Z","message":{"role":"assistant","content":"轻松"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:56.012203Z","message":{"role":"assistant","content":"地"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:56.178332Z","message":{"role":"assistant","content":"创建"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:56.255488Z","message":{"role":"assistant","content":"动态"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:56.336361Z","message":{"role":"assistant","content":"网页"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:56.415904Z","message":{"role":"assistant","content":"。"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-19T08:45:56.415904Z","message":{"role":"assistant","content":""},"done":true,"total_duration":2057551864,"load_duration":568391,"prompt_eval_count":11,"prompt_eval_duration":506238000,"eval_count":20,"eval_duration":1547724000}

Streaming is the default here as well. Set "stream": false to receive the whole reply at once.

The difference between the two is that generate completes a single prompt, while chat accepts a list of messages, so earlier turns can be sent along with each request to hold a multi-turn conversation.
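
Multi-turn use of the chat endpoint can be sketched in Python as follows. The client keeps the history and resends it with every request. The helper names `chat_payload` and `record_reply` are hypothetical, and the assistant reply is a made-up placeholder shaped like a non-streaming /api/chat response, not real model output.

```python
import json


def chat_payload(model, history, user_message, stream=False):
    """Build an /api/chat request body that carries the earlier turns."""
    messages = list(history) + [{"role": "user", "content": user_message}]
    return {"model": model, "messages": messages, "stream": stream}


def record_reply(history, user_message, reply):
    """Return a new history with the user turn and the assistant's answer."""
    return list(history) + [
        {"role": "user", "content": user_message},
        reply["message"],
    ]


history = []
question = "Introduce React, within 20 words"
first = chat_payload("gemma:2b", history, question)

# Placeholder reply for illustration only.
reply = {
    "message": {"role": "assistant", "content": "React is a JavaScript UI library."},
    "done": True,
}
history = record_reply(history, question, reply)

# The follow-up request now includes both earlier messages.
second = chat_payload("gemma:2b", history, "Who maintains it?")
print(json.dumps(second["messages"], indent=2))
```

Because the server does not remember earlier requests in this scheme, dropping old entries from the history is how a client would keep long conversations within the model's context window.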

Web UI

Beyond the terminal and the API, many open-source Web UIs can be set up locally to give Ollama a graphical chat interface, such as:

  • open-webui

https://github.com/open-webui/open-webui

  • lollms-webui

https://github.com/ParisNeo/lollms-webui

The learning cost of running large models locally with Ollama is now very low. Everyone is encouraged to try deploying a large model locally 🎉🎉🎉


