
Hugging Face reaches 1 million datasets on their platform



May 12, 2026 - 1 min read

The Hugging Face platform just reached 1 million datasets.

This is more than a milestone. Open datasets are core infrastructure for the AI economy. When data is shared openly, researchers, startups, universities, and companies can build on the same foundations, reproduce results, benchmark progress, and iterate faster.

The fastest-growing category is now Robotics & Reinforcement Learning. LLMs were trained primarily on internet-scale text and images; they are systems optimized to predict the next token.

But embodied AI requires something fundamentally different. Robots learn from demonstrations, trajectories, sensor streams, feedback loops, and interaction with the physical world. The frontier is shifting from modeling language to modeling action.

The most liked datasets include a dataset of prompts for generative AI, Hugging Face's FineWeb collection of open data, and Anthropic's Reinforcement Learning from Human Feedback and red-teaming dialogue data.

Congratulations to everyone building in the open!




Open Source, datasets