Milvus Query Merge Mechanism

Previous article in this series: Milvus Data Segment Merge Mechanism

| Query Request Queue

The Milvus connection layer exposes RPC services through gRPC and RESTful services through the oatpp framework. The server's gRPC connection pool allows at most 20 connections, so query requests from multiple clients are received asynchronously. Because each query consumes a large amount of computational resources, queries running at the same time would compete for them. The connection layer therefore places incoming query requests in a queue, and a background query scheduler (Query Scheduler) takes requests from the end of the queue and executes them one at a time.
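The queue-plus-scheduler pattern described above can be sketched in a few lines of Python. This is only an illustration, not Milvus's actual implementation (which is written in C++): the names `run_scheduler`, `pending`, and `handler` are hypothetical, and the sketch assumes simple first-in, first-out ordering.

```python
import queue
import threading

def run_scheduler(requests, handler):
    """Execute queued requests one at a time on a single background thread."""
    results = []

    def worker():
        while True:
            req = requests.get()
            if req is None:              # sentinel: no more requests
                break
            results.append(handler(req))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return results

# Connection-layer side: enqueue requests as they arrive (hypothetical names).
pending = queue.Queue()
for name in ["q1", "q2", "q3"]:
    pending.put(name)
pending.put(None)

executed = run_scheduler(pending, handler=lambda r: f"done:{r}")
print(executed)
```

Because only one worker thread drains the queue, requests never compete with each other for CPU/GPU resources, at the cost of waiting time for requests further back in the queue.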

| Query Merging

To improve QPS (Queries Per Second), starting from version 0.8.0, Milvus attempts to merge query requests as it receives them.

The rationale for merging is as follows: a query with a small nq (number of target vectors) achieves little CPU/GPU parallelism and leaves part of the computational resources idle. Combining the target vectors of several queries into one computation makes better use of those resources.
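To make the idea concrete, the following sketch uses NumPy brute-force search as a stand-in for the ANNS library. It assumes L2 distance and small random data; `l2_topk`, `base`, and `queries` are illustrative names, not Milvus APIs. Four nq=1 searches are run separately and then as one merged nq=4 search, which replaces four small matrix products with a single larger one.

```python
import numpy as np

rng = np.random.RandomState(0)
base = rng.rand(1000, 128)       # stored vectors (float64 for stable ordering)
queries = rng.rand(4, 128)       # four separate nq=1 requests

def l2_topk(batch, k):
    # Squared L2 distance between every query in the batch and every base vector.
    d = ((batch ** 2).sum(axis=1, keepdims=True)
         - 2.0 * batch @ base.T
         + (base ** 2).sum(axis=1))
    # Ids of the k nearest base vectors for each query row.
    return np.argsort(d, axis=1)[:, :k]

# One call per request versus a single call over the merged batch.
separate = np.vstack([l2_topk(q[None, :], 10) for q in queries])
merged = l2_topk(queries, 10)
print(np.array_equal(separate, merged))
```

The merged call returns one row of neighbor ids per original query, so the results can be handed back to each request individually.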

Before client requests enter the queue, an additional request scheduling step has been added to preprocess requests based on different strategies.

Query requests are preprocessed as follows. Milvus first checks whether the queue still contains unprocessed query requests. If it does, the previously queued request is compared with the new one. If the merging conditions are met, the two requests are merged into a single request that is placed in the queue, and the previous request is removed from the queue.
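The preprocessing step can be sketched as below. This is a simplified model under stated assumptions: the "previously queued request" is taken to be the most recently queued one, locking is omitted, and the request shape (a collection name plus a list of vectors) as well as the helpers `enqueue_with_merge`, `same_collection`, and `concat_vectors` are hypothetical.

```python
from collections import deque

def enqueue_with_merge(pending, new_request, can_merge, merge):
    """Merge new_request into the most recently queued request when allowed."""
    if pending and can_merge(pending[-1], new_request):
        previous = pending.pop()                      # remove the earlier request
        pending.append(merge(previous, new_request))  # queue the merged request
    else:
        pending.append(new_request)

# Hypothetical request shape: (collection name, list of query vectors).
def same_collection(a, b):
    return a[0] == b[0]

def concat_vectors(a, b):
    return (a[0], a[1] + b[1])

pending = deque()
for req in [("A", [[0.1]]), ("A", [[0.2]]), ("B", [[0.3]])]:
    enqueue_with_merge(pending, req, same_collection, concat_vectors)

print(list(pending))
```

The two requests for collection "A" end up as one queued request carrying both vectors, while the request for "B" stays separate.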

A request can be merged more than once; how many requests end up merged together depends on the runtime state of Milvus. Queries can be merged only if they meet all of the following conditions:

The queries target the same collection and the same partition
Their topk values differ by no more than 200
The number of target vectors after merging does not exceed 200
Other index-related query parameters, such as nprobe, are identical

Here is a set of examples:
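The conditions can also be read as a predicate over two requests. The sketch below is illustrative only: `SearchRequest`, `can_merge`, and the two constants are hypothetical names, and it assumes the nq limit applies to the combined number of target vectors.

```python
from dataclasses import dataclass, field

MAX_TOPK_GAP = 200     # largest allowed topk difference
MAX_MERGED_NQ = 200    # largest allowed number of target vectors after merging

@dataclass
class SearchRequest:
    collection: str
    partitions: frozenset
    topk: int
    nq: int
    params: dict = field(default_factory=dict)

def can_merge(a, b):
    return (a.collection == b.collection
            and a.partitions == b.partitions
            and abs(a.topk - b.topk) <= MAX_TOPK_GAP
            and a.nq + b.nq <= MAX_MERGED_NQ
            and a.params == b.params)

first = SearchRequest("TEST", frozenset(), topk=10, nq=1, params={"nprobe": 16})
same_params = SearchRequest("TEST", frozenset(), topk=100, nq=5, params={"nprobe": 16})
other_nprobe = SearchRequest("TEST", frozenset(), topk=10, nq=1, params={"nprobe": 32})
large_topk = SearchRequest("TEST", frozenset(), topk=300, nq=1, params={"nprobe": 16})
large_nq = SearchRequest("TEST", frozenset(), topk=10, nq=200, params={"nprobe": 16})

for other in (same_params, other_nprobe, large_topk, large_nq):
    print(can_merge(first, other))
```

Only the first pair qualifies: the others differ in nprobe, exceed the topk gap (290), or exceed the merged nq limit (201).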

If you understand the principles of vector search, it is not difficult to understand the reasons for setting these merging conditions:

Requiring the same collection and the same partition fixes the search scope: queries can be combined without interfering with each other only when they search the same range.
Keeping the merged nq at or below 200 keeps the computation time short, so that no individual request waits too long.
Limiting the topk difference to 200 makes the result set easier to process.
Index-related query parameters must be identical so that the merged query follows a single code path inside the ANNS library.
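One way to see why a bounded topk difference simplifies result handling: if the merged query is assumed to run with the largest topk among its requests, each request's rows only need to be sliced out and truncated to its own topk. The article does not describe the actual mechanism, so this is a hedged sketch; `split_results` and the sample data are hypothetical.

```python
def split_results(merged_rows, requests):
    """Hand merged results back to the original requests.

    merged_rows: one ranked id list per query vector, in request order.
    requests: list of (nq, topk) pairs describing the original requests.
    """
    out, offset = [], 0
    for nq, topk in requests:
        rows = merged_rows[offset:offset + nq]
        out.append([row[:topk] for row in rows])   # truncate to the request's own topk
        offset += nq
    return out

requests = [(1, 2), (2, 4)]                          # (nq, topk) per request
merged_topk = max(topk for _, topk in requests)      # assumed topk of the merged query
merged_rows = [[7, 3, 9, 1], [5, 2, 8, 6], [4, 0, 3, 7]]
per_request = split_results(merged_rows, requests)
print(per_request)
```

Under this assumption, a request with a small topk pays for the extra results computed on behalf of a request with a large topk, which is a plausible reason to cap the difference.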

| Merging Queries Improves Query Efficiency

Next, we will test the effect of merging queries using pymilvus.

Hardware Environment	Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 12 Cores
Milvus Version	0.9.1 GPU version
Test Dataset	10 million 128-dimensional randomly generated vectors
Index	IVFSQ8, nlist is 2048
Query Parameters	Execute 1000 queries, nq is 1, topk is 10, nprobe is 16

Client single-threaded execution script for 1000 queries:

import time

import numpy as np
from milvus import Milvus, IndexType

SERVER_ADDR = "127.0.0.1"
SERVER_PORT = '19530'
COLLECTION_DIMENSION = 128
COLLECTION_NAME = "TEST"
# Index settings of the test collection (for reference; not used by the search calls below)
INDEX_TYPE = IndexType.IVF_SQ8
INDEX_PARAM = {'nlist': 2048}
SEARCH_PARAM = {'nprobe': 16}
TOPK = 10
MILVUS = Milvus(host=SERVER_ADDR, port=SERVER_PORT)


def gen_vec_list(nb, seed=np.random.RandomState(1234)):
    # Generate nb random float32 vectors as plain Python lists
    xb = seed.rand(nb, COLLECTION_DIMENSION).astype("float32")
    vec_list = xb.tolist()
    return vec_list


def search(vec_list):
    # One search request; nq equals len(vec_list)
    status, result = MILVUS.search(collection_name=COLLECTION_NAME, top_k=TOPK,
                                   query_records=vec_list, params=SEARCH_PARAM)


def multi_search():
    time_start = time.time()
    SEARCH_COUNT = 1000
    vec_list = gen_vec_list(1)
    # Send the requests one after another from a single thread
    for k in range(SEARCH_COUNT):
        search(vec_list=vec_list)
    time_end = time.time()
    total_cost = time_end - time_start
    print("search total cost", total_cost, 'sec')
    print('QPS = ', SEARCH_COUNT/total_cost)


if __name__ == "__main__":
    multi_search()

Execute the script 3 times and take the average:

Total time for 1000 queries: 7.18 seconds
QPS: 139.24

Client multi-threaded execution script for 1000 queries:

import time
import threading

import numpy as np
from milvus import Milvus, IndexType

SERVER_ADDR = "127.0.0.1"
SERVER_PORT = '19530'
COLLECTION_DIMENSION = 128
COLLECTION_NAME = "TEST"
# Index settings of the test collection (for reference; not used by the search calls below)
INDEX_TYPE = IndexType.IVF_SQ8
INDEX_PARAM = {'nlist': 2048}
SEARCH_PARAM = {'nprobe': 16}
TOPK = 10
MILVUS = Milvus(host=SERVER_ADDR, port=SERVER_PORT)


def gen_vec_list(nb, seed=np.random.RandomState(1234)):
    # Generate nb random float32 vectors as plain Python lists
    xb = seed.rand(nb, COLLECTION_DIMENSION).astype("float32")
    vec_list = xb.tolist()
    return vec_list


def search(vec_list):
    # One search request; nq equals len(vec_list)
    status, result = MILVUS.search(collection_name=COLLECTION_NAME, top_k=TOPK,
                                   query_records=vec_list, params=SEARCH_PARAM)


def multi_search():
    time_start = time.time()
    SEARCH_COUNT = 1000
    threads = []
    vec_list = gen_vec_list(1)
    # Start one thread per request so that requests reach the server concurrently
    for k in range(SEARCH_COUNT):
        x = threading.Thread(target=search, args=(vec_list,))
        threads.append(x)
        x.start()
    # Wait for all requests to finish before stopping the timer
    for th in threads:
        th.join()
    time_end = time.time()
    total_cost = time_end - time_start
    print("search total cost", total_cost, 'sec')
    print('QPS = ', SEARCH_COUNT/total_cost)


if __name__ == "__main__":
    multi_search()

Execute the script 3 times and take the average:

Total time for 1000 queries: 4.93 seconds
QPS: 202.79
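The two measurements can be compared with a little arithmetic. The snippet below uses only the figures reported above. One plausible reading, offered as an interpretation rather than a measured fact, is that the single-threaded client never has more than one request in the server's queue, so merging can only take effect in the multi-threaded run.

```python
# Averages reported above for 1000 queries (nq=1, topk=10, nprobe=16).
single_time, single_qps = 7.18, 139.24    # single-threaded client
multi_time, multi_qps = 4.93, 202.79      # multi-threaded client

qps_gain = multi_qps / single_qps - 1     # relative QPS improvement
time_ratio = single_time / multi_time     # how many times faster the batch completes

print(f"QPS gain: {qps_gain:.1%}")
print(f"time ratio: {time_ratio:.2f}x")
```

Both ratios agree: the multi-threaded run finishes the same 1000 queries in roughly two thirds of the time.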

| Join the Milvus Community

github.com/milvus-io/milvus | Source Code

milvus.io | Official Website

milvusio.slack.com | Slack Community

zhihu.com/org/zilliz-11/columns | Zhihu

zilliz.blog.csdn.net | CSDN Blog

space.bilibili.com/478166626 | Bilibili
