Public datasets for your next LLM, AI, or machine learning project

Aman Kumar
Feb 24, 2025

Here are 5 free data sources where you can find a dataset for your next project:

1. Awesome Data GitHub Repository: You can find direct links to public datasets here, including image, text, audio, and tabular data.

2. Hugging Face: Find 200K+ datasets across various modalities.

3. OpenML: Find 23K+ datasets on OpenML:

https://openml.org/search?type=data&sort=runs&status=any

4. Papers with Code: Papers with Code lists 10K+ public datasets across different modalities.

5. LLMDataHub: This repo contains a collection of high-quality training corpora for LLMs from the open-source community.
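
If you want to keep the OpenML search link from item 3 handy in a script, here is a minimal sketch that rebuilds it from its query parameters. The function name `openml_search_url` is made up for illustration, and the only parameter values known to work are the ones in the link above (`type=data`, `sort=runs`, `status=any`); anything else you pass is an untested assumption.

```python
from urllib.parse import urlencode

# Base of the search link shown in item 3.
OPENML_SEARCH = "https://openml.org/search"


def openml_search_url(sort="runs", status="any"):
    """Rebuild the OpenML dataset search URL from its query parameters.

    Hypothetical helper: the defaults mirror the link in the post;
    other values are not verified against the OpenML site.
    """
    params = {"type": "data", "sort": sort, "status": status}
    return f"{OPENML_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(openml_search_url())
```

With the defaults, this gives back the same URL as the link in item 3, so you can paste it into a browser or a notes file.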
