
Sharding Models: Slicing Giants into Dancing Fairies

4 min read · Apr 28, 2025

In the sacred scrolls of Computer Science, there is a golden proverb:

“If a task is too heavy for one, divide it among many.”

And so, when today’s Large Language Models — veritable Titans — grew so large they could no longer fit into the humble GPU memory of mere mortals, the ancient art of Sharding was reborn.

Pull up a chair, sip some masala chai, and let us embark upon the enchanting journey of How to Use Sharding in Models. 🚀

🌟 What is Sharding?

Imagine you baked a cake so gigantic that no single plate could hold it.
Would you weep? Nay! You would cut it into slices and pass it around.

Sharding is exactly this:

Splitting a large model across multiple devices (or nodes or GPUs) so that each holds a small, manageable piece.

Each shard holds a part of the model’s parameters, and together they whisper secrets — activations and gradients — to each other across the network to behave like one single mighty model.
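To make the slicing concrete, here is a minimal sketch in PyTorch of weight sharding: one linear layer’s weight matrix is split along its output dimension across two pretend devices, each “device” computes its own slice of the output, and the slices are gathered back together. The shard count, shapes, and in-memory “devices” are illustrative assumptions, not a particular framework’s implementation.

```python
# A minimal sketch of tensor (weight) sharding, assuming PyTorch is installed.
# The layer size, shard count, and simulated "devices" are illustrative only.
import torch

torch.manual_seed(0)

# A "giant" linear layer: y = x @ W.T
full_weight = torch.randn(8, 4)   # 8 output features, 4 input features
x = torch.randn(1, 4)             # one input sample

# Shard the weight along the output dimension across 2 "devices".
shards = torch.chunk(full_weight, chunks=2, dim=0)   # two (4, 4) pieces

# Each device computes only its own slice of the output...
partial_outputs = [x @ shard.T for shard in shards]

# ...and the slices are stitched back together
# (an all-gather over the interconnect in a real cluster).
y_sharded = torch.cat(partial_outputs, dim=1)
y_full = x @ full_weight.T

assert torch.allclose(y_sharded, y_full)
print("Sharded output matches the unsharded layer:", y_sharded.shape)
```

In a real deployment each shard would live on its own GPU and the final concatenation would be a collective communication step, but the arithmetic is exactly the same: no single device ever holds the whole weight, yet together they produce the full output.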

🛕 Why Shard a Model?


Written by Aditya Mangal

Tech enthusiast weaving stories of code and life. Writing about innovation, reflection, and the timeless dance between mind and heart.
