Milvus Data Migration Tool – Milvusdm Overview

Milvusdm (Milvus Data Migration) is a data migration tool developed for Milvus, supporting the transfer of Milvus data as well as the import and export of data files:
  • Faiss to Milvus: Import uncompressed Faiss files into Milvus
  • HDF5 to Milvus: Import HDF5 formatted files into Milvus
  • Milvus to Milvus: Support data migration between Milvus instances
  • Milvus to HDF5: Batch backup Milvus data to local files in HDF5 format
Developers using milvusdm can improve data management efficiency and reduce operational costs.

Features Introduction

In our previous article, Milvus Migration Upgrade Guide, we described how to migrate Milvus data from an online environment to an offline one and between different versions of Milvus. The data migration tool milvusdm lets you migrate exactly the data you need by specifying collections or partitions in Milvus. It is easy to install: just run pip3 install pymilvusdm. The project's source code is available on GitHub. This article describes how to use milvusdm:
Faiss to Milvus

📖 Usage Example

1. Download the yaml file
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/F2M.yaml
2. Configure parameters
Set data_path to the Faiss index file. milvusdm reads the vectors and their ids from this file and imports them into Milvus. For the import, you also need to specify dest_host, dest_port, mode, dest_collection_name, dest_partition_name, and collection_parameter.
F2M:
    milvus_version: 0.10.5
    data_path: '/home/data/faiss.index'
    dest_host: '127.0.0.1'
    dest_port: 19530
    mode: 'append'
    dest_collection_name: 'test'
    dest_partition_name: ''
    collection_parameter:
      dimension: 256
      index_file_size: 1024
      metric_type: 'L2'
3. Run
$ milvusdm --yaml F2M.yaml
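Before running milvusdm, it can help to check the F2M block for the parameters listed in step 2. The sketch below is illustrative only: it assumes the yaml file has already been parsed into a Python dict (for example with PyYAML's yaml.safe_load, not shown), and the helper check_f2m_config is hypothetical, not part of milvusdm. The accepted mode values are taken from the 'skip/append/overwrite' comments in the other sample configs.

```python
# Minimal sketch: check that a parsed F2M config contains the parameters described above.
# check_f2m_config is a hypothetical helper, not part of milvusdm.

REQUIRED_KEYS = (
    "data_path", "dest_host", "dest_port", "mode",
    "dest_collection_name", "dest_partition_name", "collection_parameter",
)
REQUIRED_COLLECTION_KEYS = ("dimension", "index_file_size", "metric_type")
VALID_MODES = {"skip", "append", "overwrite"}


def check_f2m_config(config: dict) -> list:
    """Return a list of problems found in the F2M section; an empty list means OK."""
    section = config.get("F2M")
    if not isinstance(section, dict):
        return ["missing top-level 'F2M' section"]
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in section]
    params = section.get("collection_parameter") or {}
    problems += [
        f"missing 'collection_parameter.{key}'"
        for key in REQUIRED_COLLECTION_KEYS
        if key not in params
    ]
    if section.get("mode") not in VALID_MODES:
        problems.append(f"unknown mode {section.get('mode')!r}")
    return problems


# The example configuration from step 2, as a parsed dict.
example = {
    "F2M": {
        "milvus_version": "0.10.5",
        "data_path": "/home/data/faiss.index",
        "dest_host": "127.0.0.1",
        "dest_port": 19530,
        "mode": "append",
        "dest_collection_name": "test",
        "dest_partition_name": "",
        "collection_parameter": {"dimension": 256, "index_file_size": 1024, "metric_type": "L2"},
    }
}
print(check_f2m_config(example))  # []
```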

โ–ถ๏ธImplementation

milvusdm reads the Faiss file, obtains the feature vectors and their ids, and inserts them into Milvus:
ids, vectors = faiss_data.read_faiss_data()
insert_milvus.insert_data(vectors, self.dest_collection_name, self.collection_parameter, self.mode, ids, self.dest_partition_name)
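The implementation passes insert_data parallel lists of ids and vectors. As a hedged illustration of checks that are worth doing at this step, the sketch below verifies that every id has a vector of the configured dimension and then splits the data into fixed-size batches. The helper batched_pairs and the default batch size are assumptions for illustration; the source does not say whether milvusdm's insert_data batches its inserts.

```python
from typing import Iterator, List, Sequence, Tuple


def batched_pairs(
    ids: Sequence[int],
    vectors: Sequence[Sequence[float]],
    dimension: int,
    batch_size: int = 10000,  # assumed value, not taken from milvusdm
) -> Iterator[Tuple[List[int], List[List[float]]]]:
    """Validate parallel ids/vectors, then yield them as (ids, vectors) batches."""
    if len(ids) != len(vectors):
        raise ValueError(f"{len(ids)} ids but {len(vectors)} vectors")
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {dimension}")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield list(ids[start:end]), [list(v) for v in vectors[start:end]]


ids = list(range(5))
vectors = [[float(i)] * 4 for i in ids]
for batch_ids, batch_vectors in batched_pairs(ids, vectors, dimension=4, batch_size=2):
    print(batch_ids)
# [0, 1]
# [2, 3]
# [4]
```

Because the function is a generator, the validation errors surface when the first batch is requested, before anything is handed to an insert call.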
HDF5 to Milvus
📖 Usage Example
1. Download the yaml file
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/H2M.yaml
2. Configure parameters
Set either data_path or data_dir to read HDF5 formatted data and import it into Milvus. For the import, you also need to specify dest_host, dest_port, mode, dest_collection_name, dest_partition_name, and collection_parameter.

The data_path parameter accepts a list of file paths, while data_dir specifies a directory containing the files. Configure only one of the two.

H2M:
    milvus-version: 0.10.5
    data_path:
      - /Users/zilliz/float_1.h5
      - /Users/zilliz/float_2.h5
    data_dir:
    dest_host: '127.0.0.1'
    dest_port: 19530
    mode: 'overwrite'        # 'skip/append/overwrite'
    dest_collection_name: 'test_float'
    dest_partition_name: 'partition_1'
    collection_parameter:
        dimension: 128
        index_file_size: 1024
        metric_type: 'L2'
3. Run
$ milvusdm --yaml H2M.yaml
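The rule that only one of data_path and data_dir may be set can be sketched as follows. This is a hypothetical helper, not milvusdm's code. Two further assumptions are labeled in the comments: files in data_dir are recognised by an .h5 or .hdf5 extension, and leaving both parameters empty is treated as an error.

```python
import os
from typing import List, Optional, Sequence


def resolve_hdf5_files(data_path: Optional[Sequence[str]], data_dir: Optional[str]) -> List[str]:
    """Return the HDF5 files to import, enforcing that only one source is configured."""
    if data_path and data_dir:
        raise ValueError("configure either data_path or data_dir, not both")
    if data_path:
        return list(data_path)
    if data_dir:
        # Assumption: HDF5 files are recognised by their .h5 / .hdf5 extension.
        return sorted(
            os.path.join(data_dir, name)
            for name in os.listdir(data_dir)
            if name.endswith((".h5", ".hdf5"))
        )
    # Assumption: an empty configuration is an error rather than a no-op.
    raise ValueError("one of data_path or data_dir must be set")


# The sample H2M.yaml sets data_path and leaves data_dir empty (parsed as None).
print(resolve_hdf5_files(["/Users/zilliz/float_1.h5", "/Users/zilliz/float_2.h5"], None))
# ['/Users/zilliz/float_1.h5', '/Users/zilliz/float_2.h5']
```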

โ–ถ๏ธImplementation

milvusdm reads the HDF5 formatted files, obtains the feature vectors and their ids, and inserts them into Milvus:
embeddings, ids = self.file.read_hdf5_data()
ids = insert_milvus.insert_data(embeddings, self.c_name, self.c_param, self.mode, ids, self.p_name)
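Both insert_data calls take a mode argument, and the sample configs list its values as 'skip/append/overwrite' without defining them. Going by the names, one reasonable reading is that skip leaves an existing collection untouched, append inserts into it, and overwrite drops and recreates it. The sketch below encodes that reading against a toy in-memory store. It reflects an assumption about the semantics, not milvusdm's actual code, and import_with_mode is a hypothetical name.

```python
from typing import Dict, List


def import_with_mode(store: Dict[str, List[int]], collection: str,
                     ids: List[int], mode: str) -> bool:
    """Apply one import to a toy store and return True if any data was written.

    Assumed semantics: 'skip' leaves an existing collection as-is, 'append' adds
    to it, and 'overwrite' drops and recreates it. A missing collection is
    created in every mode.
    """
    if mode not in ("skip", "append", "overwrite"):
        raise ValueError(f"unknown mode {mode!r}")
    exists = collection in store
    if exists and mode == "skip":
        return False
    if exists and mode == "overwrite":
        del store[collection]  # drop the old data before recreating
    store.setdefault(collection, []).extend(ids)
    return True


store = {"test_float": [1, 2]}
import_with_mode(store, "test_float", [3], "append")     # store["test_float"] == [1, 2, 3]
import_with_mode(store, "test_float", [9], "skip")       # returns False, store unchanged
import_with_mode(store, "test_float", [7], "overwrite")  # store["test_float"] == [7]
```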
Milvus to Milvus
📖 Usage Example
1. Download the yaml file
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2M.yaml
2. Configure parameters
Set source_milvus_path, mysql_parameter, and source_collection to read vectors and ids from the source Milvus and import them into the destination Milvus. For the import, you also need to specify dest_host, dest_port, and mode.

If the source Milvus does not use MySQL for metadata management, leave mysql_parameter empty.

M2M:
    milvus_version: 0.10.5
    source_milvus_path: '/home/user/milvus'
    mysql_parameter:
        host: '127.0.0.1'
        user: 'root'
        port: 3306
        password: '123456'
        database: 'milvus'
    source_collection:
        test:
            - 'partition_1'
            - 'partition_2'
    dest_host: '127.0.0.1'
    dest_port: 19530
    mode: 'skip' # 'skip/append/overwrite'
3. Run
$ milvusdm --yaml M2M.yaml
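The source_collection parameter maps each collection name to a list of partitions. A small sketch of turning that mapping into a flat list of (collection, partition) migration tasks is shown below. The helper migration_tasks is hypothetical, and treating a collection with no listed partitions as "migrate the whole collection" (partition None) is an assumption, not documented milvusdm behavior.

```python
from typing import Dict, List, Optional, Tuple


def migration_tasks(
    source_collection: Dict[str, Optional[List[str]]],
) -> List[Tuple[str, Optional[str]]]:
    """Flatten source_collection into (collection, partition) pairs.

    Assumption: a collection with no partitions listed is migrated as a whole,
    represented here by partition None.
    """
    tasks: List[Tuple[str, Optional[str]]] = []
    for collection, partitions in source_collection.items():
        if partitions:
            tasks.extend((collection, p) for p in partitions)
        else:
            tasks.append((collection, None))
    return tasks


# The source_collection value from the sample M2M.yaml.
print(migration_tasks({"test": ["partition_1", "partition_2"]}))
# [('test', 'partition_1'), ('test', 'partition_2')]
```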

โ–ถ๏ธImplementation

milvusdm reads the metadata of the specified collection or partition, uses it to read the data files from the local milvus/db directory, obtains the feature vectors and their ids, and inserts them into the destination Milvus:
collection_parameter, _ = milvus_meta.get_collection_info(collection_name)
r_vectors, r_ids, r_rows = milvusdb.read_milvus_file(self.milvus_meta, collection_name, partition_tag)
milvus_insert.insert_data(r_vectors, collection_name, collection_parameter, self.mode, r_ids, partition_tag)
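These three calls run once per collection or partition. A minimal, hypothetical sketch of that loop follows. The reader and writer are passed in as plain callables standing in for read_milvus_file and insert_data, whose full signatures are not shown here. It also adds a sanity check on the row count, assuming r_rows is the number of rows read; the source does not say how milvusdm uses it.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical callable types standing in for milvusdm's reader and inserter.
Reader = Callable[[str, Optional[str]], Tuple[List[List[float]], List[int], int]]
Writer = Callable[[List[List[float]], str, List[int], Optional[str]], None]


def migrate(tasks: Iterable[Tuple[str, Optional[str]]], read: Reader, write: Writer) -> int:
    """Copy each (collection, partition) from source to destination; return rows copied."""
    total = 0
    for collection, partition in tasks:
        vectors, ids, rows = read(collection, partition)
        # Assumption: rows should equal the number of ids and vectors returned.
        if rows != len(ids) or rows != len(vectors):
            raise ValueError(f"{collection}/{partition}: row count mismatch")
        write(vectors, collection, ids, partition)
        total += rows
    return total


# Fake source data keyed by (collection, partition), and a list recording writes.
source = {
    ("test", "partition_1"): ([[0.1, 0.2]], [1], 1),
    ("test", "partition_2"): ([[0.3, 0.4], [0.5, 0.6]], [2, 3], 2),
}
dest = []
copied = migrate(source.keys(), lambda c, p: source[(c, p)],
                 lambda v, c, i, p: dest.append((c, p, i)))
print(copied)  # 3
```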
Milvus to HDF5
📖 Usage Example
1. Download the yaml file
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2H.yaml
2. Configure parameters
Set source_milvus_path, mysql_parameter, and source_collection to read data from the source Milvus. The vectors and ids are saved in HDF5 format under the data_dir path.
M2H:
    milvus_version: 0.10.5
    source_milvus_path: '/home/user/milvus'
    mysql_parameter:
        host: '127.0.0.1'
        user: 'root'
        port: 3306
        password: '123456'
        database: 'milvus'
    source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
        test:
            - 'partition_1'
            - 'partition_2'
    data_dir: '/home/user/data'
3. Run
$ milvusdm --yaml M2H.yaml

โ–ถ๏ธImplementation

milvusdm reads the metadata of the specified collection or partition, uses it to read the data files from the local milvus/db directory, obtains the feature vectors and their ids, and saves them to a local HDF5 file:
collection_parameter, version = milvus_meta.get_collection_info(collection_name)
r_vectors, r_ids, r_rows = milvusdb.read_milvus_file(self.milvus_meta, collection_name, partition_tag)
data_save.save_yaml(collection_name, partition_tag, collection_parameter, version, save_hdf5_name)
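The save_yaml call stores the collection parameters and version alongside the export, which is what you would need to recreate the collection later (for example with H2M). The sketch below illustrates that idea with a metadata sidecar file. It writes JSON instead of YAML only to stay within the Python standard library. The helper name and the file-naming scheme are hypothetical; the source does not show milvusdm's actual layout.

```python
import json
import os
from typing import Any, Dict, Optional


def save_export_metadata(data_dir: str, collection: str, partition: Optional[str],
                         collection_parameter: Dict[str, Any], version: str) -> str:
    """Write collection metadata next to an exported HDF5 file; return the sidecar path.

    Hypothetical layout: <data_dir>/<collection>[_<partition>].json
    """
    stem = collection if not partition else f"{collection}_{partition}"
    path = os.path.join(data_dir, stem + ".json")
    record = {
        "collection_name": collection,
        "partition_tag": partition,
        "collection_parameter": collection_parameter,
        "version": version,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)
    return path
```

For the sample M2H configuration, calling this once per exported partition would produce test_partition_1.json and test_partition_2.json in data_dir.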

Milvusdm Code Structure

We warmly welcome contributions to the open-source milvusdm project. The file structure reflects the design of the tool, and if you have new data migration requirements, you can modify the source code and contribute your changes to the community.
When run, milvusdm executes the task described in the input yaml file. The package is organized as follows:
  • pymilvusdm
    • Core
      • milvus_client.py, operations related to the Milvus client

      • read_data.py, read local HDF5 formatted data files (if there is a need to read other file formats, code can be added here)

      • read_faiss_data.py, read data files from Faiss

      • read_milvus_data.py, read data files from Milvus

      • read_milvus_meta.py, read Milvus meta information

      • data_to_milvus.py, based on yaml file configuration parameters, create collections or partitions, and import vectors and ids into Milvus

      • save_data.py, save the read data as HDF5 formatted files

      • write_logs.py, write debug/info/error logs during operations

    • faiss_to_milvus.py, implements importing Faiss file data into Milvus
    • hdf5_to_milvus.py, implements importing HDF5 formatted file data into Milvus
    • milvus_to_milvus.py, implements copying data from one Milvus to another
    • milvus_to_hdf5.py, implements exporting data from Milvus to HDF5 formatted files
    • main.py, executes related tasks based on the yaml file
    • setting.py, configuration parameters related to executing the code
  • setup.py, packages pymilvusdm and uploads it to PyPI
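main.py runs the task that matches the yaml file. A minimal sketch of that dispatch is shown below, assuming (as in the sample files) that each yaml file contains exactly one top-level task key (F2M, H2M, M2M, or M2H). The handlers here are hypothetical stand-ins for the modules listed above, not milvusdm's real entry points.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


def dispatch(config: Dict[str, Any], handlers: Dict[str, Handler]) -> str:
    """Run the handler for the config's single top-level task key; return that key."""
    tasks = [key for key in config if key in handlers]
    if len(tasks) != 1:
        raise ValueError(f"expected exactly one of {sorted(handlers)}, found {tasks}")
    task = tasks[0]
    handlers[task](config[task])
    return task


# Hypothetical handlers that only record which task ran.
calls = []
handlers = {name: (lambda cfg, name=name: calls.append(name))
            for name in ("F2M", "H2M", "M2M", "M2H")}
print(dispatch({"M2H": {"data_dir": "/home/user/data"}}, handlers))  # M2H
```

Keeping the task-to-handler mapping in a dictionary means a new migration direction only needs a new module and one new entry.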
This article introduces the usage of the milvusdm tool and its open-source code. Milvusdm mainly supports the following four functions: Faiss to Milvus, HDF5 to Milvus, Milvus to Milvus, and Milvus to HDF5. If you have any questions or suggestions, feel free to raise issues or contribute code to this project. We plan to add the following features in the next version:
  • Support importing binary data files from Faiss into Milvus
  • Support specifying allowlists and blocklists for Milvus to Milvus migration
  • Support merging data from multiple collections or partitions into one collection during Milvus to Milvus migration
  • Support Milvus data backup and recovery

Welcome to Join the Milvus Community

github.com/milvus-io/milvus | Source Code
milvus.io | Official Website
milvusio.slack.com | Slack Community
zhihu.com/org/zilliz-11 | Zhihu
zilliz.blog.csdn.net | CSDN Blog
space.bilibili.com/478166626 | Bilibili
