Survivability
A much-overlooked aspect of an organization's operational behaviour is the survivability of its systems
in the event of a major breakdown. In a world where failure is, if nothing else, statistically
likely to happen eventually, does your organization know how to react? Do you have Disaster Recovery Plans in
place? How will the failure of one part of your system affect the rest? How
major does a failure have to be to be catastrophic? How fast can you recover?
Survivability goes far beyond service availability and addresses the fundamental overall infrastructure
on which your organization depends. Limited availability will mean financial loss
and loss of user loyalty; limited survivability will mean loss of the company!
Organizations need to realize that the failure of a single section of a large integrated system can result in the total collapse of its functionality through a cascading effect. What seem like minuscule failures can spill over to other interconnected systems, growing rapidly in severity and creating havoc within the organization as well as in partner and client systems.
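The cascading effect can be illustrated with a small sketch. It is not a full survivability model: the component names and the dependency map are hypothetical, and it assumes the simplest possible rule, namely that a component fails as soon as any one of the components it depends on has failed. Real systems have redundancy, timeouts and partial degradation that this ignores.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration only):
# each component lists the components it depends on.
DEPENDS_ON = {
    "storage": [],
    "database": ["storage"],
    "auth": ["database"],
    "billing": ["database", "auth"],
    "web": ["auth", "billing"],
    "reporting": ["database"],
    "status_page": [],
}


def cascade(depends_on, initial_failure):
    """Return every component that ends up failed after one initial failure.

    Assumes a component fails as soon as any of its dependencies fails.
    """
    # Invert the map: for each component, who depends on it?
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first spread of the failure through the dependents.
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed


def blast_radius(depends_on):
    """Fraction of the whole system lost for each possible single failure."""
    total = len(depends_on)
    return {name: len(cascade(depends_on, name)) / total for name in depends_on}


if __name__ == "__main__":
    ranked = sorted(blast_radius(DEPENDS_ON).items(), key=lambda item: -item[1])
    for name, fraction in ranked:
        print(f"{name:12s} takes down {fraction:.0%} of the system")
```

In this made-up map, losing the seemingly minor storage component brings down six of the seven components, while losing the web front end affects only itself. Ranking components this way is one simple means of spotting the critical modules whose failure the rest of the system cannot survive.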
Survivability is a critical factor for all strategic investments where downtime causes large financial or other losses. Analysis of this kind requires extensive understanding of mathematical computer modeling, risk analysis and distributed computing concepts. The results of survivability studies reveal how vulnerable organizations are to the cascading effects of minor failures in critical modules. They also make specific suggestions aimed at increasing overall operational survivability should such an event occur.