Public datasets for your next LLM, AI, or machine learning project.
Here are 5 free data sources where you can find one:
1. Awesome Data GitHub Repository: Direct links to public datasets, including image, text, audio, and tabular data.
2. Hugging Face: Find 200K+ datasets for various modalities.
3. OpenML: Find 23K+ datasets on OpenML.
https://openml.org/search?type=data&sort=runs&status=any
4. Papers with Code: Hosts 10K+ public datasets across different modalities.
5. LLMDataHub: This repo contains a collection of high-quality training corpora for LLMs from the open-source community.
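
The OpenML link in item 3 is a search URL with three query parameters. As a minimal sketch, here is how that same link could be rebuilt in Python using only the standard library. The function name `openml_search_url` is hypothetical, and the only parameter values known to work are the ones in the link itself; any other `sort` or `status` value is an assumption you would need to check on the site.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base of the search link from item 3.
OPENML_SEARCH = "https://openml.org/search"


def openml_search_url(sort="runs", status="any"):
    """Rebuild the OpenML dataset search link from its query parameters.

    Hypothetical helper for illustration. The defaults mirror the link in
    the post; other values are untested assumptions.
    """
    # type=data appears in the original link, which points at dataset results.
    params = {"type": "data", "sort": sort, "status": status}
    return f"{OPENML_SEARCH}?{urlencode(params)}"


url = openml_search_url()
print(url)

# Split the link back into its parameters to see what it asks for.
print(parse_qs(urlsplit(url).query))
```

With the defaults, the helper reproduces the link from the post, so you can open the printed URL in a browser to land on the same search page.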