Downloading and Uploading Hugging Face Large Models

Downloading

Suppose we need to download the Qwen2.5-0.5B-Instruct model from Hugging Face.

1. Using git lfs

Git LFS is an extension developed by GitHub to support large files in Git.

Installation on Mac:

brew install git-lfs

Installation on Linux:

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git lfs install

Download the large model:

git lfs install
git clone https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct

If you have the hostname and port of a SOCKS proxy server, run the following commands to configure Git's global proxy settings:

git config --global http.proxy socks5://proxy.example.com:1080
git config --global https.proxy socks5://proxy.example.com:1080

If the proxy server requires username and password authentication, include them in the URL:

git config --global http.proxy socks5://username:[email protected]:1080
git config --global https.proxy socks5://username:[email protected]:1080

2. Using huggingface-cli

Install dependencies:

pip install -U huggingface_hub

Set the environment variable:

export HF_ENDPOINT=https://hf-mirror.com

It is recommended to write this line into ~/.bashrc.
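
If you configure the endpoint from Python rather than the shell, note that huggingface_hub reads HF_ENDPOINT once at import time, so it must be set before the import. A minimal sketch:

import os

# HF_ENDPOINT is read when huggingface_hub is first imported,
# so set it before the import when configuring from Python.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import snapshot_download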

Download the model:

huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir qwen2.5-0.5b-chat

Download a single file:

huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct config.json --local-dir qwen2.5-0.5b-chat

For private or gated model repositories, you must use a token for access:

huggingface-cli download --token hf_*** meta-llama/Llama-2-7b-hf --local-dir Llama-2-7b-hf

Access tokens can be obtained from https://huggingface.co/settings/tokens.
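
The same downloads can be done programmatically with the huggingface_hub Python API; a minimal sketch mirroring the three CLI commands above (repo ids and paths reused from this section):

from huggingface_hub import snapshot_download, hf_hub_download

# Download the whole repository, like --local-dir above
snapshot_download(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct",
    local_dir="qwen2.5-0.5b-chat",
)

# Download a single file
hf_hub_download(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct",
    filename="config.json",
    local_dir="qwen2.5-0.5b-chat",
)

# For private or gated repositories, pass a token
snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",
    local_dir="Llama-2-7b-hf",
    token="hf_***",  # your access token
)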

3. Using wget

Use wget to download the files one by one:

mkdir qwen2.5-0.5b-chat
cd qwen2.5-0.5b-chat/
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/config.json
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/generation_config.json
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/merges.txt
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/model.safetensors
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/tokenizer.json
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/tokenizer_config.json
wget https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/vocab.json

Downloading the files one by one this way is cumbersome, but wget itself is relatively fast.
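
If you prefer a script over repeated wget calls, here is a minimal Python sketch of the same per-file loop, assuming the hf-mirror resolve-URL pattern shown above (the file list mirrors the commands in this section):

import urllib.request
from pathlib import Path

base = "https://hf-mirror.com/Qwen/Qwen2.5-0.5B-Instruct/resolve/main"
files = [
    "config.json", "generation_config.json", "merges.txt",
    "model.safetensors", "tokenizer.json", "tokenizer_config.json",
    "vocab.json",
]

out_dir = Path("qwen2.5-0.5b-chat")
out_dir.mkdir(exist_ok=True)
for name in files:
    # resolve/main URLs redirect to the actual blob; urlretrieve follows redirects
    urllib.request.urlretrieve(f"{base}/{name}", out_dir / name)
    print(f"downloaded {name}")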

Additionally, if you need to download directly from https://huggingface.co, you can route requests through a SOCKS proxy by installing tsocks:

sudo apt update
sudo apt install tsocks

Edit /etc/tsocks.conf to set the proxy server address and port:

server = 10.41.27.53
# Server type defaults to 4, so specify 5 for a SOCKS5 proxy
server_type = 5
# The port defaults to 1080, but it is stated here for clarity
server_port = 1349

Then download through the proxy:

tsocks wget https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/README.md
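
The same SOCKS proxy can also be used directly from Python without tsocks, via requests with the PySocks backend (pip install requests[socks]); a sketch reusing the example proxy address from the config above:

import requests

# Example proxy host/port from the tsocks config above;
# the socks5h scheme resolves DNS through the proxy as well.
proxies = {
    "http": "socks5h://10.41.27.53:1349",
    "https": "socks5h://10.41.27.53:1349",
}

url = "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/resolve/main/README.md"
resp = requests.get(url, proxies=proxies, timeout=60)
resp.raise_for_status()
with open("README.md", "wb") as f:
    f.write(resp.content)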

Uploading

First, create a token with WRITE permission at https://huggingface.co/settings/tokens, then log in:

huggingface-cli login
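
Logging in can also be done from Python; login() prompts for the token interactively, or accepts it directly:

from huggingface_hub import login

# Prompts for the WRITE token; alternatively login(token="hf_***")
login()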

1. Using huggingface-cli

Set the environment variable:

export HF_ENDPOINT=https://hf-mirror.com

It is recommended to write this line into ~/.bashrc.

Upload all files in the current folder to the root directory of WangZejun/my-cool-model:

huggingface-cli upload my-cool-model . .

Upload all files in a specific local folder to the root directory of WangZejun/my-cool-model:

huggingface-cli upload my-cool-model ./qwen2.5-0.5b-chat .

Upload all files in a specific local folder to the qwen2.5_0.5b_chat directory of WangZejun/my-cool-model:

huggingface-cli upload my-cool-model ./qwen2.5-0.5b-chat /qwen2.5_0.5b_chat

Upload a single file to the root directory of WangZejun/my-cool-model:

huggingface-cli upload WangZejun/my-cool-model ./qwen2.5-0.5b-chat/README.md

Upload a single file to the temp directory of WangZejun/my-cool-model:

huggingface-cli upload WangZejun/my-cool-model ./qwen2.5-0.5b-chat/vocab.json temp/vocab.json
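
The upload commands above also have programmatic equivalents in huggingface_hub; a minimal sketch using HfApi (repo id and paths reused from this section):

from huggingface_hub import HfApi

api = HfApi(token="hf_***")  # WRITE token; omit if already logged in

# Upload a whole local folder to the repository root
api.upload_folder(
    repo_id="WangZejun/my-cool-model",
    folder_path="./qwen2.5-0.5b-chat",
)

# Upload a single file into the temp directory of the repository
api.upload_file(
    repo_id="WangZejun/my-cool-model",
    path_or_fileobj="./qwen2.5-0.5b-chat/vocab.json",
    path_in_repo="temp/vocab.json",
)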

2. Using git

1. Create a new model repository named my-cool-model on your personal Hugging Face page (this can also be done programmatically; see the sketch after these steps)

2. Initialize a Git repository in the local folder you want to upload:

git init

3. Use git lfs to track large files:

git lfs track "*.safetensors"

If any file is larger than 5 GB, you need to enable large-file support:

huggingface-cli lfs-enable-largefiles .

4. Add remote repository:

git remote add origin https://hf-mirror.com/WangZeJun/my-cool-model

5. Pull the remote repository to local:

git pull origin main

There may be a merge conflict (mainly in the .gitattributes file); deleting the local .gitattributes and pulling again will resolve it.

6. Switch to the main branch, add the local files, and commit:

git checkout main
git add .
git commit -m "commit message"

7. Push to remote:

git push origin main

Enter your username and access token to complete the upload.
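
The repository creation in step 1 can also be done from Python with huggingface_hub; a minimal sketch (repo name reused from above):

from huggingface_hub import create_repo

# Creates the repo under your namespace, e.g. WangZejun/my-cool-model;
# requires a WRITE token
create_repo("my-cool-model", token="hf_***")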
