"So what happens if…"
......nobody can get into our office?
......the power goes out in our building?
......there is a flood?
......the datacentre is ransacked?
......our network is cut?
Power companies, telcos, and data centre operators all strive for 100% uptime; their marketing literature is littered with phrases like resilience, N+1, self-healing, and high availability. What this really means is that they have accepted that equipment is fallible and that people make mistakes. They have considered what we call the modes of failure in their service.
"So how come the redundant, self-healing, service went down twice this quarter?"
Unfortunately, failures in IT can be both subtle and complex. The interaction of so many intricate systems (hardware, software, and networks) gives rise to an almost infinite number of ways in which things can go wrong. Sometimes, somehow, failures in a complex system confound even the most prepared service providers. If you have not already experienced an outage of this sort, you may be surprised to learn that downtime due to something as simple as a power cut can run into many days. In the last 14 months there have been at least 7 major power incidents in the UK; the longest left homes and businesses in London without power for 4 days. What is your plan for coping without power, networking, or staff on site for almost a week? It happens.
Experienced systems administrators and network engineers know that adding software or hardware to an already complex system, with the intention of achieving higher uptime, can sometimes have quite the opposite effect. At 360is we come into contact with far more misconfigured multi-path storage fabrics than we do faulty cables or broken network cards. Why? Because complexity is the enemy of availability. Additional network cards, cables, and even switches are relatively inexpensive. The expertise to make them all work properly is not.
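One practical antidote to that complexity is to continuously verify that the redundancy you have paid for is actually in service. As a rough illustration (not from the original article), here is a minimal Python sketch that flags degraded paths by scanning the output of the Linux `multipath -ll` command; the exact output format varies between versions of device-mapper multipath, so the keyword matching below is deliberately simple and should be treated as a starting point rather than a finished monitoring tool.

```python
#!/usr/bin/env python3
"""Rough check for degraded multipath devices on a Linux host.

Assumes device-mapper multipath is installed and `multipath -ll` is
available; output wording differs between versions, so we only look
for a few tell-tale keywords.
"""
import subprocess
import sys

SUSPECT_KEYWORDS = ("failed", "faulty", "offline")

def degraded_path_lines() -> list[str]:
    # `multipath -ll` prints the current multipath topology, one line per path.
    result = subprocess.run(
        ["multipath", "-ll"],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit usually means no maps or no privileges
    )
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if any(word in line.lower() for word in SUSPECT_KEYWORDS)
    ]

if __name__ == "__main__":
    suspect = degraded_path_lines()
    if suspect:
        print("Degraded paths detected:")
        for line in suspect:
            print(f"  {line}")
        sys.exit(1)  # non-zero exit so cron or a monitoring system can alert
    print("All multipath paths look healthy.")
```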
"If you don't have it twice, you don't have it"
As a response to the fallibility of complex systems, IT infrastructure managers have long sought to replicate their data and applications both within a given data centre and further afield to a secondary Disaster Recovery site. While this setup sounds impressive, it need not be costly: 360is replicates its Manchester systems to just a few units of rack space in Munich.
Replication across geographic distances was once prohibitively expensive for all but large financial institutions (who coupled mainframes in one location to remote disk drives in another); today it is within the reach of even the small to medium enterprise.
Today's systems manager will find there are now many paths to replication, each with its own pros and cons in terms of price, functionality, level of integration, ease of deployment, and ease of use:
- Storage replication, at either the file or block level, underneath the application (see the sketch further below).
- Application clustering, either as a function within the application itself or as an aftermarket wrapper, with the added bonus of automatic failover.
- VM fault tolerance, a software analogue of the "dual everything" fault-tolerant hardware approach: never lose a bit.
Whichever route you choose, the goals are the same:
- Increase Availability
- Reduce Complexity
- Make better use of your capital and operating budget
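To make the first of those replication options a little more concrete, the following is a minimal sketch, not part of the original article, of file-level replication to a secondary site: Python driving rsync over SSH on a schedule. The host name, SSH key, and directory paths are placeholders; block-level replication (with tools such as DRBD or array-to-array mirroring) sits lower in the stack and is not shown.

```python
#!/usr/bin/env python3
"""Minimal file-level replication sketch: push a local data directory
to a secondary (DR) site with rsync over SSH.

The host, user, key, and paths below are illustrative placeholders.
"""
import subprocess
import sys

SOURCE_DIR = "/srv/data/"   # trailing slash: sync the contents, not the directory itself
DR_TARGET = "replica@dr-site.example.com:/srv/data/"
SSH_OPTS = "ssh -i /root/.ssh/dr_replica_key"

def replicate() -> int:
    """Run one incremental replication pass and return rsync's exit code."""
    cmd = [
        "rsync",
        "--archive",     # preserve permissions, timestamps, and symlinks
        "--delete",      # remove files at the DR site that were removed locally
        "--compress",    # worth it over a WAN link
        "-e", SSH_OPTS,  # transport over SSH with a dedicated key
        SOURCE_DIR,
        DR_TARGET,
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # Typically scheduled from cron or a systemd timer; alert on a non-zero exit.
    sys.exit(replicate())
```

A pass like this, run every few minutes, is adequate for flat files and static content; databases and virtual machines generally need application-aware or block-level replication to remain consistent.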