Deploying Open Source Large Models Locally with Ollama

Introduction

If you want to deploy and run an open-source large model locally, Ollama is worth trying. In this article, we will install Ollama, run a model, and call it through its API.

Installation

Ollama provides development packages for both Python and JavaScript, which makes it especially approachable for front-end developers. Install whichever fits your stack:

pip install ollama

npm install ollama
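
Since the rest of this article uses Python, here is a quick sanity check with the Python package. This is a minimal sketch that assumes the Ollama server is already installed and running locally on its default port 11434, and that at least one model has been pulled:

import ollama

# List the models available on the local Ollama server
models = ollama.list()
for m in models["models"]:
    print(m["name"])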

Application Scenarios

  • Chat Interface

  • Multimodal
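
For the multimodal scenario, here is a hedged sketch of passing an image through the Python package. It assumes a vision-capable model such as llava has already been pulled, and ./photo.jpg is a hypothetical local image path used only for illustration:

import ollama

# Multimodal chat: assumes `ollama pull llava` has been run beforehand
# "./photo.jpg" is a hypothetical local image path
response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "What is in this picture?",
            "images": ["./photo.jpg"],
        }
    ],
)
print(response["message"]["content"])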

Models

We can browse the list of models supported by Ollama in the library at ollama.com, which includes gemma, llama2, mistral, mixtral, and many more.

For example, to use the open-source model llama2, we can download it (on first use) and run it with the following commands:

# Pull the model
ollama pull llama2
# Run the model
ollama run llama2
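
The same pull-and-ask flow can also be driven from the Python package. A minimal sketch, assuming the server is running and llama2 fits on your machine:

import ollama

# Pull the model programmatically (equivalent to `ollama pull llama2`)
ollama.pull("llama2")

# Ask a one-off question (roughly a single turn of `ollama run llama2`)
response = ollama.generate(model="llama2", prompt="Why is the sky blue?")
print(response["response"])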

API

If you have used the OpenAI APIs, concepts such as text completion, chat, and embeddings will be familiar. Ollama exposes similar capabilities through a REST API on localhost:11434; the curl examples below cover the main endpoints, and a Python sketch follows the list.

  • Generative API
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
  • Chat API
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
  • Embeddings API
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm",
  "prompt": "Here is an article about llamas..."
}'
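
These endpoints can just as easily be called from Python with the requests library. Below is a minimal sketch, assuming the server is running on the default port and the models shown above have been pulled; note that /api/generate streams NDJSON chunks by default, so we disable streaming here to receive a single JSON object:

import requests

# Non-streaming text generation against the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])

# Embeddings for a piece of text
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "all-minilm", "prompt": "Here is an article about llamas..."},
)
print(len(resp.json()["embedding"]))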

Practical Application

We will combine Streamlit and Ollama to develop a chat application.

Streamlit[1] is a Python web framework well suited for quickly building UIs for large-model demos and scientific computing.

We will also borrow the code from the Streamlit tutorial "Build a ChatGPT-like App" to quickly assemble a ChatGPT-style application.

# Import streamlit UI library
import streamlit as st
# Import ollama
import ollama
# Get the list of ollama models
model_list = ollama.list()
# Set default model name to llama2:7b-chat
if "model_name" not in st.session_state:
    st.session_state["model_name"] = "llama2:7b-chat"
# Initialize chat message array
if "messages" not in st.session_state:
    st.session_state.messages = []
# Set sidebar
with st.sidebar:
    # Sidebar title
    st.subheader("Settings")
    # Dropdown to select a model from the local model list
    option = st.selectbox(
        'Select a model',
        [model['name'] for model in model_list['models']])
    st.write('You selected:', option)
    st.session_state["model_name"] = option
# Page title, e.g. "Chat with llama2:7b-chat"
st.title(f"Chat with {st.session_state['model_name']}")
# Iterate through chat messages
for message in st.session_state.messages:
    # Based on role
    with st.chat_message(message["role"]):
        # Output content
        st.markdown(message["content"])

# Chat input box at the bottom of the page
if prompt := st.chat_input("What is up?"):
    # Append the user's message to the chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Render the user's message
    with st.chat_message("user"):
        st.markdown(prompt)
    # Render the assistant's reply, streamed from Ollama
    with st.chat_message("assistant"):
        # Placeholder that is progressively filled with the streamed response
        message_placeholder = st.empty()
        full_response = ""
        for chunk in ollama.chat(
            model=st.session_state["model_name"],
            messages=[
                {"role": m["role"], "content": m["content"]}
                for m in st.session_state.messages
            ],
            # Stream the response chunk by chunk
            stream=True,
        ):
            if 'message' in chunk and 'content' in chunk['message']:
                full_response += (chunk['message']['content'] or "")
                message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
    st.session_state.messages.append({"role": "assistant", "content": full_response})

  • Pull the Model

ollama pull orca-mini


Besides llama2, we also pull orca-mini so the model dropdown in the app has more than one option.

  • List All Current Models

ollama list

  • Run Streamlit

streamlit run app.py


Conclusion

  • Ollama makes deploying open-source large models locally genuinely convenient and reliable; it even runs on my old Redmi machine.
  • Combined with Streamlit, we quickly built a chat web application.

References

  • Ollama Official Website
  • Ollama Python Development Package Example | Streamlit Chat Application Based on Local Deployment of Open-Source Large Models
  • ollama-libraries-example/python/app.py at main · sugarforever/ollama-libraries-example (github.com)

Reference

[1] Streamlit: https://streamlit.io/
