9,985,823 - Method and system for mitigating correlated failure modes

Abstract:

A system is provided for mitigating partially correlated failure modes to increase application availability. The system includes: a plurality of nodes connected by a computer network, each node configured to run an instance of the same application; a failure analysis engine configured to maintain current availability statistics for the nodes of the system, calculate from those statistics the current mean time to failure (MTTF) of the system as a function of the current time, and compare the current MTTF to a plurality of threshold values, each threshold value corresponding to one or more actions to be taken to increase application availability; and a failure prevention engine that performs those actions.
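The abstract does not say how the MTTF is computed or what the actions are, so the following is only a minimal sketch of the threshold-driven architecture, under loudly stated assumptions. The assumptions are as follows. Availability statistics are per-node failure timestamps, and each node's failure rate is estimated over a sliding window ending at the current time. The system MTTF is computed for independent, exponentially distributed node failures, with the application considered available while at least one node is up. That independence assumption deliberately ignores the partial correlation the patent targets. All class names, method names, thresholds, and action names (`FailureAnalysisEngine`, `add_replica`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class NodeStats:
    """Hypothetical availability record: failure timestamps for one node, in hours."""
    name: str
    failure_times: list[float] = field(default_factory=list)


class FailureAnalysisEngine:
    """Sketch of the failure analysis engine (assumed design, not the patented one)."""

    def __init__(self, nodes, window_hours, thresholds):
        self.nodes = nodes
        self.window = window_hours
        # (threshold_hours, actions) pairs, highest threshold first, so milder
        # actions are listed before more drastic ones.
        self.thresholds = sorted(thresholds, key=lambda pair: pair[0], reverse=True)

    def node_rate(self, node: NodeStats, now: float) -> float:
        """Failures per hour observed in the sliding window (now - window, now]."""
        recent = [t for t in node.failure_times if now - self.window < t <= now]
        return len(recent) / self.window

    def current_mttf(self, now: float) -> float:
        """System MTTF in hours, ASSUMING independent exponential node failures
        and that the application survives while at least one node is up."""
        rates = [self.node_rate(n, now) for n in self.nodes]
        if any(r == 0.0 for r in rates):
            return math.inf  # a node with no observed failures gives an unbounded estimate
        mttf = 0.0
        # Inclusion-exclusion over node subsets for a parallel (redundant) system:
        # MTTF = sum over non-empty S of (-1)^(|S|+1) / (sum of rates in S).
        for k in range(1, len(rates) + 1):
            sign = 1.0 if k % 2 == 1 else -1.0
            for subset in combinations(rates, k):
                mttf += sign / sum(subset)
        return mttf

    def actions_for(self, now: float) -> list[str]:
        """Collect the actions for every threshold the current MTTF has fallen below."""
        mttf = self.current_mttf(now)
        actions: list[str] = []
        for threshold, threshold_actions in self.thresholds:
            if mttf < threshold:
                actions.extend(threshold_actions)
        return actions


class FailurePreventionEngine:
    """Sketch of the failure prevention engine; it only records the actions it is given."""

    def __init__(self):
        self.performed: list[str] = []

    def perform(self, actions: list[str]) -> None:
        # Placeholder: a real system would add replicas, shift load, and so on.
        for action in actions:
            self.performed.append(action)


if __name__ == "__main__":
    nodes = [NodeStats("node-a", [10.0, 60.0]), NodeStats("node-b", [30.0, 90.0])]
    engine = FailureAnalysisEngine(
        nodes,
        window_hours=100.0,
        thresholds=[(100.0, ["add_replica"]), (50.0, ["migrate_load", "page_operator"])],
    )
    prevention = FailurePreventionEngine()
    prevention.perform(engine.actions_for(now=100.0))
    print(engine.current_mttf(now=100.0), prevention.performed)
```

In the usage example, each node has two failures in the 100-hour window, which gives a rate of 0.02 per hour. The two-node parallel MTTF is therefore 1.5 / 0.02 = 75 hours. That falls below the 100-hour threshold but not the 50-hour one, so only `add_replica` is performed. Mapping multiple thresholds to progressively stronger actions lets the system escalate gradually as the estimated MTTF drops.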