Embedding Models in LlamaIndex

You may have heard of word embeddings, which represent semantics as numerical vectors: the closer two vectors are, the more similar the corresponding words or sentences are in meaning.

LlamaIndex also uses embeddings to represent documents. An embedding model takes text as input and returns a long vector of numbers that captures the semantics of that text. These models are trained on large corpora of text.

At a high level, if a user asks a question about dogs, the embedding of that question will be highly similar to the embedding of a passage that discusses dogs.

There are many methods to calculate the similarity between embeddings (dot product, cosine similarity, etc.). By default, LlamaIndex uses cosine similarity when comparing embeddings.
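
To make this concrete, here is a minimal sketch; the example sentences are invented for illustration, and running it requires an OpenAI API key:

import math

from llama_index.embeddings.openai import OpenAIEmbedding

def cosine_similarity(a, b):
    # dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

embed_model = OpenAIEmbedding()
question = embed_model.get_query_embedding("What do dogs like to eat?")
on_topic = embed_model.get_text_embedding("Dogs enjoy chewing on bones and eating meat.")
off_topic = embed_model.get_text_embedding("The stock market closed higher today.")

print(cosine_similarity(question, on_topic))   # noticeably higher
print(cosine_similarity(question, off_topic))  # noticeably lower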

Currently, there are many embedding models available. By default, LlamaIndex uses OpenAI’s text-embedding-ada-002.

from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# global: every index and query uses this model unless overridden
Settings.embed_model = OpenAIEmbedding()

# per-index: `documents` is a list of Document objects loaded elsewhere
embed_model = OpenAIEmbedding()
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

However, to save on token costs, or when access to OpenAI's API is unreliable, we can instead use open-source embedding models available on Hugging Face, such as BGE.
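
In recent versions of LlamaIndex, the Hugging Face integration ships as a separate package, so it needs to be installed first:

pip install llama-index-embeddings-huggingface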

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-zh-v1.5")

This will first download the model BAAI/bge-large-zh-v1.5 from Hugging Face, but we can also download it in advance:

embed_model = HuggingFaceEmbedding(model_name="../models/BAAI/bge-large-zh-v1.5")

This way, we can use a local embedding model.
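
If you prefer to fetch the model from Python rather than cloning it manually, one option is huggingface_hub's snapshot_download; a minimal sketch, with the target directory matching the path used above:

from huggingface_hub import snapshot_download

# Download the model files into a local directory ahead of time
snapshot_download(repo_id="BAAI/bge-large-zh-v1.5", local_dir="../models/BAAI/bge-large-zh-v1.5")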

The most common pattern is to set the embedding model in the global Settings object and then use it to build indexes and run queries. The input documents are split into nodes (chunks), and the embedding model generates an embedding for each node.

As noted above, LlamaIndex defaults to text-embedding-ada-002, but we can change this manually:

from llama_index.core import Settings
Settings.embed_model = embed_model
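
Putting the pieces together, here is a minimal end-to-end sketch; the "./data" directory and the query string are placeholders for your own content, and answering the query also requires an LLM (by default OpenAI, so an API key must be configured):

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use a local open-source embedding model for all indexing and querying
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-zh-v1.5")

# "./data" is a placeholder directory containing your documents
documents = SimpleDirectoryReader("./data").load_data()

# Each node (chunk) of the documents is embedded with Settings.embed_model
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What do dogs like to eat?"))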

LlamaIndex also supports creating and using ONNX embeddings with Hugging Face’s Optimum library.

ONNX (Open Neural Network Exchange) is an open format for storing trained machine-learning models. It lets different AI frameworks save models in a common format and interoperate. The ONNX specification and code are developed mainly by companies such as Microsoft, Amazon, Facebook, and IBM, and are hosted as open source on GitHub.

First, install the necessary dependencies:

pip install transformers optimum[exporters]
pip install llama-index-embeddings-huggingface-optimum

Then export the model to ONNX format and save it:

from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

OptimumEmbedding.create_and_save_optimum_model("BAAI/bge-large-zh-v1.5", "./bge_onnx")

Finally, we can use it:

embed_model = OptimumEmbedding(folder_name="./bge_onnx")
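
A quick sanity check that the exported model produces embeddings (the test sentence is arbitrary):

embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))  # vector dimensionality; 1024 for bge-large
print(embeddings[:5])   # first few values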

However, at the time of writing, ONNX Runtime does not appear to support Windows 11; running the code there produces the following warning:

UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
