Steve Jenson's blog

Scaling data on the cheap

Scaling Data on the Cheap by Ryan Barrett from Google. A few things to note:

Don't shard by a plain hash of the key (for example, hash mod the number of shards), because when you have to reshard, you have to move nearly ALL of your data. Ideally, you can add shards without rebalancing.
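To make the resharding cost concrete, here is a rough sketch in Python. It assumes "shard by hash" means hash mod N, and it uses a made-up `ShardDirectory` lookup table as one possible way to add shards without rebalancing. Neither the names nor the directory approach come from the talk; they are only there for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_by_hash(key: str, num_shards: int) -> int:
    """Naive placement: hash mod the number of shards."""
    return stable_hash(key) % num_shards


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when going from old_n to new_n shards."""
    moved = sum(1 for k in keys if shard_by_hash(k, old_n) != shard_by_hash(k, new_n))
    return moved / len(keys)


class ShardDirectory:
    """Hypothetical lookup table: a key's shard is fixed once it is assigned."""

    def __init__(self, shards):
        self.counts = {name: 0 for name in shards}  # keys per shard
        self.location = {}                          # key -> shard name

    def add_shard(self, name: str) -> None:
        # Adding a shard touches no existing placements.
        self.counts[name] = 0

    def place(self, key: str) -> str:
        # Only keys never seen before get assigned, to the emptiest shard.
        if key not in self.location:
            target = min(self.counts, key=self.counts.get)
            self.location[key] = target
            self.counts[target] += 1
        return self.location[key]


keys = [f"user:{i}" for i in range(10_000)]

moved = fraction_moved(keys, 4, 5)
print(f"hash mod N, 4 -> 5 shards: {moved:.0%} of keys move")

directory = ShardDirectory(["s0", "s1", "s2", "s3"])
before = {k: directory.place(k) for k in keys}
directory.add_shard("s4")
after = {k: directory.place(k) for k in keys}
print("directory, keys moved:", sum(before[k] != after[k] for k in keys))
```

With a uniform hash, a key keeps its shard going from 4 to 5 shards only when its hash gives the same remainder mod 4 and mod 5, which happens for about one key in five, so roughly four in five keys move. The directory moves nothing, at the price of keeping (and looking up) a mapping for every key.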

Use Bloom filters to help you know which shard your data is in without hitting every shard: a filter can tell you a key is definitely not on a shard, or that it might be. That technique is also used in Bigtable (see Section 6 of the recently published Bigtable paper).
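Here is a minimal sketch of that idea in Python, assuming you keep one small Bloom filter per shard on the front end. The `BloomFilter` class, its sizes, and the `candidate_shards` helper are all invented for illustration; this is not code from the talk or from Bigtable.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def candidate_shards(filters, key: str):
    """Names of the shards that might hold the key; all others can be skipped."""
    return [name for name, bloom in filters.items() if bloom.might_contain(key)]


# One filter per shard, updated whenever a key is written to that shard.
filters = {"shard-a": BloomFilter(), "shard-b": BloomFilter()}
for i in range(100):
    filters["shard-a"].add(f"user:{i}")
for i in range(100, 200):
    filters["shard-b"].add(f"user:{i}")

print(candidate_shards(filters, "user:42"))
```

The shard that really holds the key is always in the result. Now and then an extra shard shows up as a false positive, which costs one wasted lookup but never a wrong answer. Deletes are the awkward part: a plain Bloom filter can't remove a key, so the filter has to be rebuilt from time to time.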

10 September, 2006