Deploying Open Source Large Models Locally with Ollama

Introduction

If you want to deploy and run an open-source large model locally, Ollama is worth trying. In this article, we will install Ollama, run a model, and call it through its API.

Installation

Ollama provides development packages for both Python and JavaScript, which makes it especially approachable for front-end developers. Install whichever fits your stack:

pip install ollama

npm install ollama
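
Since the rest of this article uses Python, here is a quick sanity check with the Python package. This is a minimal sketch that assumes the Ollama server is already installed and running locally on its default port 11434, and that at least one model has been pulled:

import ollama

# List the models available on the local Ollama server
models = ollama.list()
for m in models["models"]:
    print(m["name"])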

Application Scenarios

  • Chat Interface

  • Multimodal
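
For the multimodal scenario, here is a hedged sketch of passing an image through the Python package. It assumes a vision-capable model such as llava has already been pulled, and ./photo.jpg is a hypothetical local image path used only for illustration:

import ollama

# Multimodal chat: assumes `ollama pull llava` has been run beforehand
# "./photo.jpg" is a hypothetical local image path
response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "What is in this picture?",
            "images": ["./photo.jpg"],
        }
    ],
)
print(response["message"]["content"])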

Models

We can browse the list of models supported by Ollama in the library at ollama.com, which includes gemma, llama2, mistral, mixtral, and many more.

For example, to use the open-source model llama2, we can download it (on first use) and run it with the following commands:

# Pull the model
ollama pull llama2
# Run the model
ollama run llama2
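
The same pull-and-ask flow can also be driven from the Python package. A minimal sketch, assuming the server is running and llama2 fits on your machine:

import ollama

# Pull the model programmatically (equivalent to `ollama pull llama2`)
ollama.pull("llama2")

# Ask a one-off question (roughly a single turn of `ollama run llama2`)
response = ollama.generate(model="llama2", prompt="Why is the sky blue?")
print(response["response"])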

API

If you have used the OpenAI APIs, concepts such as text completion, chat, and embeddings will be familiar. Ollama exposes similar capabilities through a REST API on localhost:11434; the curl examples below cover the main endpoints, and a Python sketch follows the list.

  • Generative API
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
  • Chat API
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
  • Embeddings API
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm",
  "prompt": "Here is an article about llamas..."
}'
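
These endpoints can just as easily be called from Python with the requests library. Below is a minimal sketch, assuming the server is running on the default port and the models shown above have been pulled; note that /api/generate streams NDJSON chunks by default, so we disable streaming here to receive a single JSON object:

import requests

# Non-streaming text generation against the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])

# Embeddings for a piece of text
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "all-minilm", "prompt": "Here is an article about llamas..."},
)
print(len(resp.json()["embedding"]))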

Practical Application

We will combine Streamlit and Ollama to develop a chat application.

Streamlit[1] is a Python web framework well suited for quickly building UIs for large-model demos and scientific computing.

We will also borrow the code from the Streamlit tutorial "Build a ChatGPT-like App" to quickly assemble a ChatGPT-style application.

# Import streamlit UI library
import streamlit as st
# Import ollama
import ollama
# Get the list of ollama models
model_list = ollama.list()
# Set default model name to llama2:7b-chat
if "model_name" not in st.session_state:
    st.session_state["model_name"] = "llama2:7b-chat"
# Initialize chat message array
if "messages" not in st.session_state:
    st.session_state.messages = []
# Set sidebar
with st.sidebar:
    # Sidebar title
    st.subheader("Settings")
    # Dropdown to select a model from the local model list
    option = st.selectbox(
        'Select a model',
        [model['name'] for model in model_list['models']])
    st.write('You selected:', option)
    st.session_state["model_name"] = option
# Page title, e.g. "Chat with llama2:7b-chat"
st.title(f"Chat with {st.session_state['model_name']}")
# Iterate through chat messages
for message in st.session_state.messages:
    # Based on role
    with st.chat_message(message["role"]):
        # Output content
        st.markdown(message["content"])

# Chat input box at the bottom of the page
if prompt := st.chat_input("What is up?"):
    # Append the user's message to the chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Render the user's message
    with st.chat_message("user"):
        st.markdown(prompt)
    # Render the assistant's reply, streamed from Ollama
    with st.chat_message("assistant"):
        # Placeholder that is progressively filled with the streamed response
        message_placeholder = st.empty()
        full_response = ""
        for chunk in ollama.chat(
            model=st.session_state["model_name"],
            messages=[
                {"role": m["role"], "content": m["content"]}
                for m in st.session_state.messages
            ],
            # Stream the response chunk by chunk
            stream=True,
        ):
            if 'message' in chunk and 'content' in chunk['message']:
                full_response += (chunk['message']['content'] or "")
                message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
    st.session_state.messages.append({"role": "assistant", "content": full_response})

  • Pull the Model

ollama pull orca-mini


Besides llama2, we also pull orca-mini so the model dropdown in the app has more than one option.

  • List All Current Models

ollama list

  • Run Streamlit

streamlit run app.py


Conclusion

  • Ollama makes deploying open-source large models locally genuinely convenient and reliable; it even runs on my old Redmi machine.
  • Combined with Streamlit, we quickly built a chat web application.

References

  • Ollama Official Website
  • Ollama Python Development Package Example | Streamlit Chat Application Based on Local Deployment of Open-Source Large Models
  • ollama-libraries-example/python/app.py at main · sugarforever/ollama-libraries-example (github.com)

Reference

[1] Streamlit: https://streamlit.io/
