Recommended Machine Learning Datasets

ID: Hazelnut

Public Account: Python and Algorithm Community

Finding somewhere to download data is a recurring problem. You have probably struggled to track down the data you need, and I often spend considerable effort on it myself. I recently went looking specifically for dataset sources, and all of the links below opened normally when I checked them.

1. Agriculture Related

USDA PLANTS Database: https://www.plants.usda.gov/dl_all.html

2. Biology Related

1000 Genomes: http://www.internationalgenome.org/data

Cell Image Library (10,000 datasets, 2 TB of data): http://www.cellimagelibrary.org/home

Cancer Cell Line Encyclopedia (CCLE): https://portals.broadinstitute.org/ccle

3. Weather

World Climate: http://www.worldclim.org/

Weather Around the World Since 1929: https://en.tutiempo.net/climate

4. Geography

Earth Related: http://www.earthmodels.org/

Countries Around the World: https://github.com/mledoze/countries

OpenStreetMap (OSM): https://www.openstreetmap.org/

Natural Earth Map Data: http://www.naturalearthdata.com/

5. Health

Health Big Data: https://www.ehdp.com/vitalnet/datasets.htm

World Health Organization: https://www.who.int/gho/en/

6. Web Data

Paper Citation Relationship Dataset: https://www.aminer.cn/citation

Exhaustive Password Dictionary: https://github.com/duyetdev/bruteforce-database

7. Economics

Our World in Data: https://ourworldindata.org/

Data Center: https://cid.econ.ucdavis.edu/

World Company Directory: https://opencorporates.com/

8. Image Processing

ImageNet: http://www.image-net.org/

Animal Emotion: http://www.imageemotion.org/

YouTube Face Recognition: http://www.cs.tau.ac.il/~wolf/ytfaces/

Indoor Scene Recognition: http://web.mit.edu/torralba/www/indoor.html

Dog Dataset: http://vision.stanford.edu/aditya86/ImageNetDogs/

Face (Adience): https://talhassner.github.io/home/projects/Adience/Adience-data.html

Face (Labeled Faces in the Wild, LFW): http://vis-www.cs.umass.edu/lfw/

9. Machine Learning

eBay Online Trading Data: http://www.modelingonlineauctions.com/datasets

Internet Movie Database (IMDb): https://www.imdb.com/interfaces/

KEEL Dataset: https://sci2s.ugr.es/keel/datasets.php

Database for Machine Learning: http://mldata.org/

Music Dataset (Million Song Dataset): http://millionsongdataset.com/

UCI Dataset Repository (473 datasets): http://archive.ics.uci.edu/ml/index.php

10. Natural Language Processing

Blog Corpus: http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm

CLiPS Stylometry Investigation (CSI) Corpus: https://www.clips.uantwerpen.be/datasets/csi-corpus

Google Books Ngrams: https://aws.amazon.com/datasets/google-books-ngrams/

Machine Translation: http://statmt.org/wmt11/translation-task.html#download

11. Community Networks

GitHub Archive: https://www.gharchive.org/

Google Scholar Citation Relationships: http://www3.cs.stonybrook.edu/~leman/data/gscholar.db

All of the links above can be opened on a regular network connection without a VPN, and I hope they help you. If you know of other datasets, feel free to list them in the comments to share with everyone.
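Links like these go stale over time, so it can be worth re-checking them before relying on one. The following is only a minimal sketch of how that might be done with Python's standard library. The function name `check_links`, its `opener` parameter, and the browser-like User-Agent header are my own assumptions for illustration and are not part of any of the sites listed above.

```python
import urllib.error
import urllib.request


def check_links(urls, timeout=10, opener=urllib.request.urlopen):
    """Map each URL to its HTTP status code, or to an error string.

    `opener` is a hypothetical hook that defaults to urllib.request.urlopen;
    it exists only so the function can be tried without network access.
    """
    results = {}
    for url in urls:
        # Some servers reject requests without a User-Agent (assumption:
        # a generic browser-like value is enough for a simple check).
        request = urllib.request.Request(
            url, headers={"User-Agent": "Mozilla/5.0"}
        )
        try:
            with opener(request, timeout=timeout) as response:
                results[url] = response.status
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status such as 404.
            results[url] = exc.code
        except (urllib.error.URLError, OSError) as exc:
            # DNS failure, refused connection, timeout, and similar.
            results[url] = f"unreachable: {exc}"
    return results
```

Calling `check_links` with a list of the URLs from this post returns a dictionary you can scan for anything other than a 2xx status. A site that answers slowly may need a larger `timeout`, and a non-2xx status does not always mean the dataset is gone, since some servers block scripted requests.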

For more open-source datasets, see GitHub: https://github.com/jackzhenguo/awesome-public-datasets
