ID: Hazelnut
Public Account: Python and Algorithm Community
Finding where to download data is a recurring problem. You have probably struggled to track down the dataset you need, and I often spend considerable effort on the search myself. I recently went looking specifically for open datasets, and every link below opened normally when I checked.
1. Agriculture Related
USDA PLANTS Database: https://www.plants.usda.gov/dl_all.html
2. Biology Related
1000 Genomes: http://www.internationalgenome.org/data
Cell Image Library (10,000 datasets, 2 TB of data): http://www.cellimagelibrary.org/home
Cancer Cell Line Encyclopedia (CCLE): https://portals.broadinstitute.org/ccle
3. Weather
World Climate: http://www.worldclim.org/
Weather Around the World Since 1929: https://en.tutiempo.net/climate
4. Geography
Earth Related: http://www.earthmodels.org/
Countries Around the World: https://github.com/mledoze/countries
OpenStreetMap (OSM): https://www.openstreetmap.org/
Natural Earth Map Data: http://www.naturalearthdata.com/
5. Health
Health Big Data: https://www.ehdp.com/vitalnet/datasets.htm
World Health Organization (Global Health Observatory): https://www.who.int/gho/en/
6. Web Data
Paper Citation Relationship Dataset: https://www.aminer.cn/citation
Brute-Force Password Dictionary: https://github.com/duyetdev/bruteforce-database
7. Economics
Our World in Data: https://ourworldindata.org/
Center for International Data (UC Davis): https://cid.econ.ucdavis.edu/
World Company Directory: https://opencorporates.com/
8. Image Processing
ImageNet: http://www.image-net.org/
Animal Emotion: http://www.imageemotion.org/
YouTube Face Recognition: http://www.cs.tau.ac.il/~wolf/ytfaces/
Indoor Scene Recognition: http://web.mit.edu/torralba/www/indoor.html
Dog Dataset: http://vision.stanford.edu/aditya86/ImageNetDogs/
Face (Adience): https://talhassner.github.io/home/projects/Adience/Adience-data.html
Face (Labeled Faces in the Wild, LFW): http://vis-www.cs.umass.edu/lfw/
9. Machine Learning
eBay Online Trading Data: http://www.modelingonlineauctions.com/datasets
Internet Movie Database (IMDb): https://www.imdb.com/interfaces/
KEEL Dataset: https://sci2s.ugr.es/keel/datasets.php
Database for Machine Learning: http://mldata.org/
Million Song Dataset (music): http://millionsongdataset.com/
UCI Dataset Repository (473 datasets): http://archive.ics.uci.edu/ml/index.php
10. Natural Language Processing
Blog Corpus: http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
CLiPS Stylometry Investigation Corpus: https://www.clips.uantwerpen.be/datasets/csi-corpus
Google Books Ngrams: https://aws.amazon.com/datasets/google-books-ngrams/
Machine Translation: http://statmt.org/wmt11/translation-task.html#download
11. Community Networks
GitHub Archive: https://www.gharchive.org/
Google Scholar Citation Relationships: http://www3.cs.stonybrook.edu/~leman/data/gscholar.db
All of the datasets above can be opened directly on a domestic network connection, with no VPN needed. I hope this helps. If you know of other datasets, feel free to list them in the comments to share with everyone.
For more open-source datasets, see GitHub: https://github.com/jackzhenguo/awesome-public-datasets
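Links like the ones in this list go stale over time, so it can be worth re-checking them before relying on one. Below is a minimal sketch, not a definitive tool, of how the "Label: URL" lines could be parsed and probed with Python's standard library. The names `parse_entries`, `check_url`, and the `link-check-sketch` User-Agent string are hypothetical, chosen only for illustration. It assumes a HEAD request is enough to tell whether a page opens, which is not true for every server.

```python
import re
import urllib.error
import urllib.request

# Matches "Label: http(s)://..." as used in the list above; the label is optional.
ENTRY_RE = re.compile(r"^(?:(?P<label>.*?):\s*)?(?P<url>https?://\S+)$")


def parse_entries(text):
    """Return (label, url) pairs for every line that ends in a URL."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            # Bare URLs with no label get an empty string as their label.
            entries.append((match.group("label") or "", match.group("url")))
    return entries


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code, or None if the host cannot be reached.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


# Parsing a few lines copied from the list above (no network access needed).
sample = """2. Biology Related
1000 Genomes: http://www.internationalgenome.org/data
https://www.plants.usda.gov/dl_all.html
"""
for label, url in parse_entries(sample):
    print(label or "(no label)", "->", url)
```

To probe a link for real, call `check_url(url)` on each parsed entry. A returned `None` means the request failed outright, while a status such as 405 may only mean the server rejects HEAD requests, in which case retrying with a normal GET is the safer check.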