Sharding Models: Slicing Giants into Dancing Fairies
In the sacred scrolls of Computer Science, there is a golden proverb:
“If a task is too heavy for one, divide it among many.”
And so, when today’s Large Language Models — veritable Titans — grew so large they could no longer fit into the humble GPU memory of mere mortals, the ancient art of Sharding was reborn.
Pull up a chair, sip some masala chai, and let us embark upon the enchanting journey of how to shard models. 🚀
🌟 What is Sharding?
Imagine you baked a cake so gigantic that no single plate could hold it.
Would you weep? Nay! You would cut it into slices and pass it around.
Sharding is exactly this:
Splitting a large model across multiple devices (GPUs, or even whole nodes) so that each holds a small, manageable piece.
Each shard holds a part of the model’s parameters, and together, they whisper secrets to each other across the network to behave like one single mighty model.
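To make the magic concrete, here is a minimal sketch of manual sharding in PyTorch. It is an illustration of the idea, not a production recipe: the `ShardedMLP` class is a hypothetical toy model invented for this example, and it assumes two CUDA devices (`cuda:0` and `cuda:1`) are available. Each device holds one shard of the parameters, and the activation is the "whispered secret" that hops between them:

```python
import torch
import torch.nn as nn

class ShardedMLP(nn.Module):
    """A toy model split by hand into two shards on two GPUs."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        # Shard 0: the first half of the parameters lives on GPU 0.
        self.shard0 = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU(),
        ).to("cuda:0")
        # Shard 1: the second half of the parameters lives on GPU 1.
        self.shard1 = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        ).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.shard0(x.to("cuda:0"))
        # The whispered secret: the activation crosses from GPU 0 to GPU 1,
        # so the two shards behave like one single mighty model.
        x = self.shard1(x.to("cuda:1"))
        return x


if __name__ == "__main__":
    model = ShardedMLP()
    out = model(torch.randn(8, 1024))
    print(out.shape, out.device)  # torch.Size([8, 1024]) cuda:1
```

The explicit `.to("cuda:1")` call is the cross-shard communication made visible. In real training frameworks (PyTorch FSDP, DeepSpeed ZeRO, and friends), this slicing and the chatter between shards are orchestrated for you, but the principle is exactly this cake-cutting.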