Enterprise Class Storage Requirements from Mike Workman
Posted by Crystal Nichols on Mon, Nov 09, 2009 @ 08:53 AM
In case you haven't read Mike Workman's (CEO of Pillar Data) blog, take a look at his latest post, "Homey don't play dat."
Here is a snippet from the post, in which he argues what exactly can be considered "Enterprise Class" storage:
Let’s put down a few basic requirements of “Enterprise Class” that I would imagine we could all agree to:
- I’d say at least 4-9s of data availability, perhaps in many
environments over 5. Having no single point of failure is necessary to
achieve this requirement. Regardless of the failure, it should have limited
consequences on performance or LUN access. Nobody claims a machine is
Enterprise class when it is sold in single-controller models; this is
just silly.
- Systems should not be designed to assume there is “idle time” in
every 24 hour window. Most Enterprises run 24x7, not 24x5, nor even
12x5. When rebuilds, “data progression”, or LUN layouts are stalled for
days behind system load, it is clear that such systems are not ready for
the Enterprise.
- Non-disruptive upgrades. This doesn’t usually include upgrades to
new machines or new platforms, but code point-releases while the system is
in operation.
- Failures during operation do not result in Filesystems or LUNs going
offline by design. Not that they don’t ever go offline, but that they
usually should not, and when they do it is a defect, not a design
attribute.
- Serviceability from the front or back of the rack. In other words,
field replaceable units do not involve pulling boxes out of racks and
opening lids, using a pair of pliers or a soldering iron (see…
hyperbole can be fun).
- Reliability, which means that the quality of components in the
system doesn’t necessitate constant intrusion in the data center.
Regardless of #1, it is never desirable to have someone pushing a
wheelbarrow full of bad parts through the aisles -- especially if it
takes days to rebuild and restore the system to an all-normal
condition, like in the case of many of our competitors.
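To put the availability figure in the first requirement into perspective, here is a minimal sketch (not part of Mike's post) that converts a count of "nines" into a yearly downtime budget. It assumes a 365-day year and treats any period without data access as downtime; the function name `downtime_budget_seconds` is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds; leap years ignored


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    Example: nines=4 means 99.99% availability, i.e. 0.01% unavailability.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


for n in (4, 5):
    secs = downtime_budget_seconds(n)
    print(f"{n} nines: {secs:.1f} seconds ({secs / 60:.1f} minutes) per year")
```

Under these assumptions, 4-9s allows roughly 52.6 minutes of downtime per year and 5-9s roughly 5.3 minutes, which is why even short interruptions matter at this level.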
A system which stops all I/O on a class of drives for 25 to 60
seconds when one drive fails is not Enterprise class. Yet there is a
company out there in which this characteristic is indeed always present
by design; a dumb design for the Enterprise to say the least. The
company is Compellent – or as I referred to them in my last post CML
(their ticker symbol, this wasn’t a slight any more than people calling
Sun “JAVA”). Here are some of the test results for a Compellent system
under I/O load. The figure below is a plot of IOPs after a drive
fails, versus the same thing on a similarly set-up Axiom. The IOPs were
normalized to 100% for the no-fault performance on both arrays, and
both had a similar number of spindles (both systems are current, not
old stuff, and the code revision level of the CML system tested was
4.2.3).
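As a rough illustration of the normalization described above (this is not the test harness used in the post), the sketch below expresses IOPS samples as a percentage of a no-fault baseline and adds up how long I/O was effectively stalled. The sample values, the 5-second sampling interval, and the helper names are all made up for illustration; they are not measurements from either array.

```python
def normalize_iops(samples, baseline):
    """Express each IOPS sample as a percentage of the no-fault baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return [100.0 * s / baseline for s in samples]


def stall_seconds(normalized, interval_s, threshold_pct=1.0):
    """Total time spent at or below `threshold_pct` of baseline throughput."""
    return sum(interval_s for pct in normalized if pct <= threshold_pct)


# Hypothetical samples taken every 5 seconds around a drive failure.
# These numbers are invented; they are NOT results from the tests in the post.
samples = [2000, 2000, 0, 0, 0, 1000, 2000]
normalized = normalize_iops(samples, baseline=2000)

print(normalized)
print(stall_seconds(normalized, interval_s=5), "seconds stalled")
```

Normalizing to each array's own no-fault performance lets two systems with different absolute throughput be plotted on the same 0 to 100% axis.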
Personally, while I recognize we all make mistakes, we all have
flaws, and none of our systems are perfect, I believe that some of the
statements and claims in the storage industry are just deceptive, or
reckless to say the least.
Read more about Pillar Data, including the latest posts by Mike Workman, by clicking here.