Ollama: Deploying Open Source Large Models Locally

  • Have you ever considered running an open-source LLM locally?

  • Do you have to manually download large model files?

  • Are you struggling to build an API for your local model?

  • Have you tried managing multiple models locally?

I bet you have! And these are all tedious, manual chores.

Fortunately, here comes Ollama.

Ollama is a popular tool for managing, deploying, and running large language models locally – https://ollama.ai/

Now, you no longer need to worry about these troubles; a single application can solve all these problems.

All you need to do is visit their official website, download the application, and install it. That’s it! You now have a CLI (and a GUI) for model management, including pulling, removing, running, and creating custom models.

What models does Ollama support?

Ollama supports a wide range of mainstream models. You can find the list of supported models at https://ollama.ai/library.

At the time of writing, the list includes:

  • llama2

  • mistral

  • llava

  • mixtral

  • starling-lm

  • neural-chat

  • codellama

  • dolphin-mixtral

  • mistral-openorca

  • llama2-uncensored

  • ……

Running Ollama

Once you have installed the application, you can start it by launching the Ollama desktop application. Alternatively, you can start it from the CLI:

$ ollama serve
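
A quick way to confirm the server is up: its root endpoint answers with a short status string. A minimal check with Python’s requests package:

import requests

# Ollama's root endpoint returns the plain-text status
# "Ollama is running" once the server is listening.
print(requests.get("http://localhost:11434").text)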

Ollama CLI

Let’s take a look at Ollama’s command line interface.

Pulling Models

The command ollama pull <model name> will automatically download the model files for you.

01coder@X8EF4F3X1O ollama-libraries-example % ollama pull tinyllama
pulling manifest
pulling 2af3b81862c6... 100% ▕████▏ 637 MB
pulling af0ddbdaaa26... 100% ▕████▏   70 B
pulling c8472cd9daed... 100% ▕████▏   31 B
pulling fa956ab37b8c... 100% ▕████▏   98 B
pulling 6331358be52a... 100% ▕████▏  483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
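
If you prefer to script this step, the Python library introduced later in this post exposes the same operation. A minimal sketch (using tinyllama, the model pulled above):

import ollama

# Downloads the model through the local Ollama server;
# equivalent to `ollama pull tinyllama` on the CLI.
ollama.pull("tinyllama")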

Running Models

The command ollama run <model name> runs the model and starts an interactive dialogue with the specified model.

01coder@X8EF4F3X1O ollama-libraries-example % ollama run orca-mini
>>> Explain the word distinct
Distinct means separate or distinct from others, with no similarity or connection to others. It refers to something that is unique or different in a way that cannot be easily identified or compared with other things. In other words, something that is distinct is not like anything else that you might encounter.

>>> What did I ask?
You asked me to explain the word "distinct".

>>> Send a message (/? for help)

Querying Locally Deployed Models

The command ollama list lists all locally downloaded and deployed models.

01coder@X8EF4F3X1O ollama-libraries-example % ollama list
NAME                    ID              SIZE    MODIFIED
llama2:7b-chat          fe938a131f40    3.8 GB  8 weeks ago
orca-mini:latest        2dbd9f439647    2.0 GB  25 hours ago
phi:latest              e2fd6321a5fe    1.6 GB  29 hours ago
tinyllama:latest        2644915ede35    637 MB  6 minutes ago

Ollama HTTP API

When Ollama is running, it exposes an HTTP API for text generation and model management. Refer to the following documentation for the complete list of endpoints:

https://github.com/ollama/ollama/blob/main/docs/api.md

Text Completion

Ollama listens by default on port 11434.

curl http://localhost:11434/api/generate -d '{
  "model": "orca-mini",
  "prompt": "Explain the word distinct"
}'
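
By default this endpoint streams its answer as newline-delimited JSON objects, each carrying a fragment of text in a "response" field and a final "done" flag. A minimal sketch of consuming that stream with the requests package:

import json
import requests

# /api/generate streams newline-delimited JSON: each line carries a
# "response" fragment, and the final line has "done": true.
payload = {"model": "orca-mini", "prompt": "Explain the word distinct"}
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break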

Chat Completion

curl http://localhost:11434/api/chat -d '{
  "model": "orca-mini",
  "messages": [
    { "role": "user", "content": "Explain the word distinct" }
  ]
}'
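
The chat endpoint streams in the same newline-delimited fashion, except each chunk nests the new text under message.content rather than a top-level "response" field. A matching sketch:

import json
import requests

payload = {
    "model": "orca-mini",
    "messages": [{"role": "user", "content": "Explain the word distinct"}],
}
# Each streamed line nests the new text under message.content,
# ending with a final object whose "done" field is true.
with requests.post("http://localhost:11434/api/chat", json=payload, stream=True) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break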

Python and JavaScript Development Packages

Recently, Ollama released Python and JavaScript libraries that allow developers to integrate Ollama into existing or new applications with minimal effort.

  • Python: https://github.com/ollama/ollama-python

  • JavaScript: https://github.com/ollama/ollama-js

Python Library Use Case | Chatting with an LLM in a Streamlit Application

Streamlit is a development framework well-suited for rapidly developing AI applications.

I created a simple project demonstrating how to build a web application using the Ollama Python library to chat with any model supported by Ollama. Please check it out at:

https://github.com/sugarforever/ollama-libraries-example/tree/main/python

This is based on a Streamlit tutorial created using the OpenAI Python SDK:

https://docs.streamlit.io/knowledge-base/tutorials/build-conversational-apps

The code is straightforward, and here are some key points:

Getting the Model List

import ollama

# Ask the local Ollama server for all downloaded models; the response
# contains a "models" collection, one entry per local model.
model_list = ollama.list()
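
For example, to populate a model picker you can extract the model names from that response. A small sketch (the exact field names can vary between library versions, so verify the keys in your environment):

model_names = [m["name"] for m in model_list["models"]]
print(model_names)  # e.g. ['llama2:7b-chat', 'orca-mini:latest', ...]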

Chatting with the Model

Compared to the OpenAI Python SDK, the chat function of the Ollama Python library returns data in a different format. In streaming mode, you should parse the chunks as follows:

# Inside the Streamlit app's chat handler: `st` is streamlit and
# `message_placeholder` is an st.empty() slot in the current chat message.
full_response = ""
for chunk in ollama.chat(
    model=st.session_state["model_name"],
    messages=[
        {"role": m["role"], "content": m["content"]}
        for m in st.session_state.messages
    ],
    stream=True,
):
    # Each chunk nests new text under chunk["message"]["content"],
    # unlike the OpenAI SDK's choices[0].delta format.
    if "message" in chunk and "content" in chunk["message"]:
        full_response += chunk["message"]["content"] or ""
        message_placeholder.markdown(full_response + "▌")
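
Following the pattern of the original Streamlit tutorial, once the stream finishes you would render the accumulated text one final time without the "▌" typing cursor:

message_placeholder.markdown(full_response)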

With this example application, you should see a simple web chat interface in your browser.

Alright, that’s it for today. Have a great weekend! 🎿⛷🏂🪂🏋️♀️
