Tim> My first experiment was simply setting up a distributed volume,
Tim> which needs each of the "bricks", as they call them, to be
Tim> reliable members.
Nice.
Tim> However, you can configure it to mirror and stripe across the
Tim> "bricks" so they don't have to be. In fact you can set up
Tim> striped mirrors, etc.
That would be safer for sure. I can just see a network problem taking
down a bunch of bricks and leading to a split-brain situation if
you're not careful.
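The usual defence is some kind of quorum rule, i.e. a brick only
accepts writes while it can see a majority of its replica set, so the
two halves of a partition can't both keep writing. Just a toy Python
sketch of the general idea, not Gluster's actual logic:

    def have_quorum(reachable_replicas, total_replicas):
        # Accept writes only with a strict majority of replicas
        # reachable, so at most one side of a partition can write.
        return reachable_replicas > total_replicas // 2

    # With replica 3: have_quorum(1, 3) -> False, have_quorum(2, 3) -> True
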
Tim> I've now got a number of these storage nodes, as I call them.
Tim> Each generally consists of a 1U box, stuffed full of RAM,
Tim> connected to a JBOD chassis full of disks. I then use ZFS to
Tim> create storage pools.
So I used to like ZFS and think it was "da bomb", but after using it
for several years, and especially after using the ZFS appliances from
Sun/Oracle, I'm not really enamored of the entire design any more.
I think they have some great ideas in terms of checksumming all the
files and metadata. But the layering (or lack thereof) of
disks/devices inside zpools just drives me nuts. It's really
inflexible and can get you in trouble if you're not careful.
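To be fair, the checksumming part is a genuinely good idea and
conceptually simple: keep a checksum alongside every block pointer
and verify it on every read, so silent corruption gets caught (and a
mirror can supply a good copy). A toy Python sketch of the principle,
nothing like the real ZFS code:

    import hashlib

    def write_block(store, key, data):
        # Keep a checksum next to the data (ZFS keeps it in the
        # parent block pointer; here just a dict for illustration).
        store[key] = (data, hashlib.sha256(data).hexdigest())

    def read_block(store, key):
        data, stored_sum = store[key]
        # Verify on every read; a mismatch means silent corruption,
        # and a mirrored pool would try the other copy instead.
        if hashlib.sha256(data).hexdigest() != stored_sum:
            raise IOError("checksum mismatch on block %r" % key)
        return data
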
Tim> The problem with this is that your scale is limited to the size
Tim> of a single "node", and while you can play games with autofs,
Tim> it's not a cohesive filesystem.
Can you give some more details of your setup?
Tim> This solves that problem. My only complaint is that it's FUSE-based.
Yeah, not really high performance at the end of the day.
Tim> Lustre is a kernel-based filesystem that'll do the same thing as
Tim> gluster. However, it doesn't have any of the redundancy stuff;
Tim> it simply assumes your underlying storage is reliable.
This is the key/kicker for all these systems.
I'm waiting for someone to come up with an open-source sharding
system where you use erasure codes for low-level block storage, so
you get a lot of the RAID6 advantages but even more reliability AND
speed. But it's also a hard, hard problem space to get right.
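Just to make the shape of it concrete: the degenerate case is single
XOR parity (basically RAID5), and a real system would swap in
Reed-Solomon so you can lose two or more shards, but it's the same
idea. A toy Python sketch, purely illustrative:

    def encode(shards):
        # k equal-sized data shards -> k data shards + 1 parity shard
        # (the XOR of all of them).
        parity = bytes(len(shards[0]))
        for s in shards:
            parity = bytes(a ^ b for a, b in zip(parity, s))
        return shards + [parity]

    def recover(shards, missing):
        # Any single missing shard (data or parity) is the XOR of the rest.
        present = [s for i, s in enumerate(shards) if i != missing]
        rebuilt = bytes(len(present[0]))
        for s in present:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, s))
        return rebuilt

    # Lose any one of the stored shards and rebuild it:
    data = [b"aaaa", b"bbbb", b"cccc"]
    stored = encode(data)
    assert recover(stored, 1) == data[1]

With Reed-Solomon you'd have m parity shards instead of one and solve
a small linear system over GF(2^8) to rebuild, which is where the
"hard to get right" part lives.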
Thanks for sharing!
John