High-Speed Download of HuggingFace Models in China

Author: Apathy
Link: https://zhuanlan.zhihu.com/p/669120427

Note: The method in this article has been tested and works well; highly recommended.

Users in China can use the official HuggingFace download tool huggingface-cli, together with hf_transfer, to download models and datasets at high speed from a HuggingFace mirror site.

HuggingFace-Download-Accelerator: https://github.com/LetheSec/HuggingFace-Download-Accelerator
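
Under the hood, this kind of acceleration boils down to pointing huggingface_hub at a mirror endpoint and enabling hf_transfer. Below is a minimal Python sketch of the idea, assuming the commonly used mirror https://hf-mirror.com as the endpoint; it illustrates the mechanism and is not the accelerator's actual code:

import os

# Both variables must be set BEFORE importing huggingface_hub,
# because they are read at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # assumed mirror endpoint
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"        # requires: pip install hf_transfer

from huggingface_hub import snapshot_download

# Download the full repository snapshot into the standard HF cache layout.
path = snapshot_download(repo_id="lmsys/vicuna-7b-v1.5")
print("Downloaded to:", path)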

Quick Start

1. Clone the project to your local machine:

git clone https://github.com/LetheSec/HuggingFace-Download-Accelerator.git
cd HuggingFace-Download-Accelerator

2. Get the desired model or dataset name from HuggingFace, for example lmsys/vicuna-7b-v1.5, and run the script to download:

python hf_download.py --model lmsys/vicuna-7b-v1.5 --save_dir ./hf_hub
  • By default, hf_transfer is used. If you wish to disable it, you can specify --use_hf_transfer False.

  • The downloaded files will be stored in the specified save_dir, in this case ./hf_hub/models--lmsys--vicuna-7b-v1.5

3. When loading the downloaded model using the transformers library, specify the saved path:

from transformers import pipeline
pipe = pipeline("text-generation", model="./hf_hub/models--lmsys--vicuna-7b-v1.5")
  • If you do not specify save_dir, the files are saved to the transformers library's default cache path ~/.cache/huggingface/hub, and you can then load the model directly by its name lmsys/vicuna-7b-v1.5 (see the sketch below for locating the cached copy).
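
A quick way to confirm where the cached copy lives: calling snapshot_download with local_files_only=True performs no network I/O and simply resolves the snapshot already present in the default cache. A minimal sketch:

from huggingface_hub import snapshot_download

# Resolves the locally cached snapshot path without touching the network.
local_path = snapshot_download("lmsys/vicuna-7b-v1.5", local_files_only=True)
print(local_path)  # somewhere under ~/.cache/huggingface/hub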

4. Downloading datasets is similar, taking zh-plus/tiny-imagenet as an example:

python hf_download.py --dataset zh-plus/tiny-imagenet --save_dir ./hf_hub
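
If the dataset was downloaded to the default cache (i.e. without --save_dir), it can then be loaded by name with the datasets library; a minimal sketch, assuming datasets is installed:

from datasets import load_dataset

# Loads from the local HuggingFace cache when the files are already present.
ds = load_dataset("zh-plus/tiny-imagenet")
print(ds)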

5. If you do not want to use an absolute path when loading the model, but also do not want to store files under the ~/.cache directory, you can set up a soft link, with the following steps:

(1) First, create a directory in any location to serve as the real storage location for downloaded files, for example:

mkdir /data/huggingface_cache

(2) If the transformers library has already created a cache in the default location ~/.cache/huggingface/hub, you need to delete it first (note that this removes anything already cached there):

rm -r ~/.cache/huggingface

(3) Create a soft link pointing to the real storage directory:

ln -s /data/huggingface_cache ~/.cache/huggingface
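
To verify the link resolves to the real storage directory, a quick check (using the example path from step (1)):

import os

# Should print /data/huggingface_cache if the soft link was created correctly.
print(os.path.realpath(os.path.expanduser("~/.cache/huggingface")))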

(4) After that, when you run the download script, you do not need to specify save_dir, and the files will be downloaded to the directory created in step one:

python hf_download.py --model lmsys/vicuna-7b-v1.5

(5) This way, when loading the model, you can use the model name directly without specifying a storage path:

from transformers import pipeline
pipe = pipeline("text-generation", model="lmsys/vicuna-7b-v1.5")

About AINLP
AINLP is an AI and natural language processing community focused on sharing technologies related to AI, NLP, machine learning, deep learning, and recommendation algorithms. Topics include LLMs, pre-trained models, text generation, text summarization, question answering, chatbots, machine translation, knowledge graphs, recommendation systems, and computational advertising, as well as recruitment information and job experience sharing. Welcome to follow! To join the technical exchange group, add the AINLP assistant on WeChat (id: ainlp2), noting your work/research direction and purpose for joining.

