Joyent, ZFS and the All Disk DR Plan

I love Joyent: they are the most open hosting company I have found. I also have a fascination for a company that uses Solaris, ZFS, OpenSolaris, Rails, F5 and DTrace.

Following Joyent’s products, technologies and even employee activities is a great way to see how to scale complex services using bleeding edge approaches. They have a podcast and talk about their practices, discoveries and product opinions.

Joyent had a cleverly crafted storage service using a Sun X4500 “Thumper” system running OpenSolaris. A Thumper can provide 24 Terabytes of storage and ZFS has some great ideas for reducing the complexity of Storage Management… For months the service was rock solid. But on Jan 12 they were hit by what their CEO termed a “perfect wave”… an event I would call a “Black Swan”: an unplanned event that was totally unexpected.

When trying to move a ZFS pool of storage using the command “zpool import”, the sysadmin staff experienced a system panic… the ZFS software halted to avoid data corruption.

They investigated the OpenSolaris forum and discovered that the panic matched a bug that had already been fixed. Great, there’s a fix. But to implement the “patch” they needed to reinstall OpenSolaris with a newer version. Oh, oh. They have Terabytes of customer data and they need to re-build the whole system (OS, ZFS, etc) to fix the bug.

The secret of system availability is to always have at least two of any critical component: in this case they needed a second Thumper… They have one so they needed to:

  1. Copy the files from production Thumper to back-up Thumper. Several terabytes of data to be transferred across the ethernet. 15 hours per terabyte. Tick-tick-tick.
  2. Then re-build Thumper #1.
  3. Copy the files back to the patched Thumper. 5 hours, yada, yada.
  4. Then patch the backup Thumper? Probably.
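
To get a feel for what “15 hours per terabyte” means in practice, here is a rough back-of-the-envelope sketch. The only number taken from the story is the 15 hours per terabyte; the data sizes (1, 4 and 8 TB) are made-up stand-ins for “several terabytes”, the function names are mine, and the rebuild time for step 2 is left out because I don’t know it.

```python
# Back-of-the-envelope estimate of the copy steps in the list above.
# HOURS_PER_TB comes from the post; everything else is an assumption.

HOURS_PER_TB = 15.0  # transfer rate quoted in the post


def copy_hours(terabytes, hours_per_tb=HOURS_PER_TB):
    """Hours needed for one one-way copy of `terabytes` of data."""
    return terabytes * hours_per_tb


def round_trip_hours(terabytes, hours_per_tb=HOURS_PER_TB):
    """Hours for copying out (step 1) and copying back (step 3).

    Assumes both directions run at the same rate and ignores the
    rebuild in step 2, whose duration is not known.
    """
    return 2 * copy_hours(terabytes, hours_per_tb)


def implied_throughput_mb_per_s(hours_per_tb=HOURS_PER_TB):
    """Sustained throughput implied by the quoted rate, in MB/s.

    Uses decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    bytes_per_tb = 10**12
    seconds = hours_per_tb * 3600
    return bytes_per_tb / seconds / 10**6


if __name__ == "__main__":
    print(f"Implied throughput: {implied_throughput_mb_per_s():.1f} MB/s")
    for tb in (1, 4, 8):  # hypothetical data sizes
        print(
            f"{tb} TB: {copy_hours(tb):.0f} h one way, "
            f"{round_trip_hours(tb):.0f} h out and back"
        )
```

Even at a hypothetical 4 TB, that is 60 hours out and another 60 back if the return trip runs at the same rate, which is days of copying before anyone has touched the actual bug.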

Anyway. I see many commercial accounts being sold and buying into a totally disk-based architecture. Typically, using new software or systems that implement a “Virtual Tape Library”. All the data ends up on disks and doesn’t get sent off-site. The benefits are reduced costs and improved data transfers between devices. You just need to be sure you have a lot of disks to prevent any loss of data and you need to be sure the software involved is rock solid.

I predict the hot new phrase for the year just may end up being:

Black Swan: a large-impact, hard-to-predict, and rare event beyond the realm of normal expectations.

No one can afford to plan for the unexpected. It’s always a statistical effort to move towards a perfect plan. And the Black Swan is that which you cannot foresee. That which you think you’ve already accounted for: like the one person that has the skills to fix the problem and didn’t document anything and gets hit by the bus just before the system did something it never did before.

For Joyent that person is probably Ben Rockwood.

Ben announced today that he has joined the OpenSolaris Governing Board. Hopefully, he’ll help OpenSolaris improve, scale and become more widely adopted. It would certainly make my life easier. Learning new systems is harder as you get past a certain age: not that I don’t try to learn, it’s just that I seem to have storage bugs. Go figure.