Understanding the difference between resilience and redundancy to ensure uptime and business continuity

The distinction between network resilience and network redundancy still puzzles many businesses, but understanding it is essential. While a resilient network may contain some redundancy, a redundant system isn’t necessarily resilient.
Redundancy is a process through which alternate or additional instances of network devices, utilities and equipment are installed within the network infrastructure, and other elements such as backup generators or alternate cooling circuits are brought in to support the smooth operation of the network.

Typically, a redundant network duplicates critical elements and devices that keep the network running, so that if one path fails, another can be used. That’s fine as far as it goes, but it doesn’t solve the problem of business continuity – far from it. After all, if there’s a primary network failure or something goes wrong with any piece of equipment other than the redundant elements, the network remains down.

Just adding switches or routers won’t resolve this issue. If an engineer cuts through a cable, the network may go down no matter how much duplicate equipment is in place. Redundancy can often be expensive too. Unsurprisingly, organisations often baulk at spending large sums on data connections that will be idle most of the time.

Maximising uptime with resilience

If a business is serious about maximising network uptime, it has to go beyond redundant equipment. That’s where end-to-end resilience is so important. Resilience is all about recovering quickly to ensure that the company is operating normally soon after a network outage.

Part of this is knowing there’s a problem in the first place. Many organisations today struggle to identify and remediate reliability or resilience problems quickly. Again, redundancy on its own won’t deliver this awareness, but resilience can.

Take a large organisation with a Network Operations Centre. It may have many offices around the world, spread across different time zones. As a result, it may struggle to learn that an outage has even occurred because it is not proactively notified when something goes offline. Even when it is aware, it may be difficult to understand which piece of equipment, at which location, has a problem if no one is onsite to physically check.
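To make the idea of proactive notification concrete, here is a minimal Python sketch of a watcher that raises an alert only when a device changes state, so that a team in another time zone is told about an outage rather than having to discover it. The names `OutageWatcher`, `probe` and `notify`, and the device names in the usage note, are illustrative assumptions rather than any vendor's API; in practice `probe` might wrap a ping or a management-port check, and `notify` a pager or chat integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OutageWatcher:
    """Tracks device reachability and reports changes of state."""

    probe: Callable[[str], bool]   # hypothetical: returns True if the device answers
    notify: Callable[[str], None]  # hypothetical: pager, email, chat message, etc.
    last_state: Dict[str, bool] = field(default_factory=dict)

    def poll(self, devices: List[str]) -> None:
        """Probe every device once and notify on up/down transitions only."""
        for name in devices:
            up = self.probe(name)
            # Assumption: a device is treated as up until a probe says otherwise.
            was_up = self.last_state.get(name, True)
            if was_up and not up:
                self.notify(f"OUTAGE: {name} stopped responding")
            elif up and not was_up:
                self.notify(f"RECOVERED: {name} is responding again")
            self.last_state[name] = up
```

Calling `poll(["london-core-sw", "sydney-edge-rtr"])` on a schedule would send one message when a device drops and one when it returns, rather than repeating the alert on every cycle.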

Dealing with outages

True network resilience is not just about protecting a single piece of equipment, such as a router or core switch. In a global economy, it’s important that any such solution can plug into all of the equipment at any data centre or edge site, map it and establish what’s online and offline at any given time.
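As a rough illustration of what mapping the equipment and establishing what is online and offline might look like in data terms, the Python sketch below groups an inventory by site. The `Device` record, the `status_map` and `sites_needing_attention` helpers, and the idea that each device carries a simple `online` flag are all assumptions made for the example; a real deployment would populate this from whatever management platform is in use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Device:
    name: str     # hypothetical device label, e.g. "fra-core-sw1"
    site: str     # data centre or edge location
    online: bool  # result of the most recent reachability check


def status_map(devices: Iterable[Device]) -> Dict[str, Dict[str, List[str]]]:
    """Group device names by site, split into 'online' and 'offline'."""
    sites: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"online": [], "offline": []}
    )
    for d in devices:
        sites[d.site]["online" if d.online else "offline"].append(d.name)
    return dict(sites)


def sites_needing_attention(devices: Iterable[Device]) -> List[str]:
    """Return, in alphabetical order, the sites with at least one offline device."""
    return sorted(
        site for site, s in status_map(devices).items() if s["offline"]
    )
```

A view like this answers the question raised earlier: which piece of equipment, at which location, has a problem, without anyone being onsite.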

This enables a system reboot to be quickly carried out remotely. And if that doesn’t work, it might well be that an issue with a software update is the root of the problem. With the latest smart out-of-band (OOB) management devices this can be readily addressed because an image of the core equipment and its configuration can be retained, and the device quickly rebuilt remotely without the need for an engineer visit. In the event of an outage, therefore, it’s possible to deliver network resilience via failover to cellular while the original fault is remotely addressed, enabling business continuity even while the primary network is down.

Building in resilience through the OOB approach costs money, but it’s money well spent. You might use this alternate access path infrequently but when you need it, you really need it. Moreover, resilience is typically far cheaper than having to buy in large volumes of redundant equipment. This is increasingly the case as the deployment of edge locations increases. An organisation may be able to afford redundancy at a core data centre, powering multiple businesses and processes, but that same redundancy can’t be built into every single data rack or data closet at a small remote location.
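The recovery sequence described in this section (fail over to cellular, try a remote reboot, then rebuild from a retained image) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular OOB product works: `choose_path`, `remediate` and the callables passed to them (`reboot`, `restore_image`, `is_healthy`) are hypothetical names standing in for whatever the management device actually exposes.

```python
from typing import Callable, List


def choose_path(primary_up: bool, cellular_up: bool) -> str:
    """Pick the link to use: primary when healthy, otherwise cellular failover."""
    if primary_up:
        return "primary"
    if cellular_up:
        return "cellular"
    return "none"


def remediate(
    device: str,
    reboot: Callable[[str], None],         # hypothetical remote reboot action
    restore_image: Callable[[str], None],  # hypothetical rebuild from stored image
    is_healthy: Callable[[str], bool],     # hypothetical health check
) -> List[str]:
    """Escalate from a reboot to a rebuild; return the steps that were taken."""
    steps: List[str] = []
    if is_healthy(device):
        return steps  # nothing to do
    for label, action in (("reboot", reboot), ("restore_image", restore_image)):
        action(device)
        steps.append(label)
        if is_healthy(device):
            break  # stop escalating as soon as the device recovers
    return steps
```

The ordering is the design choice worth noting: the cheap, fast action is tried first, and the rebuild from a stored image is reserved for cases where a reboot does not clear the fault.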

Maintaining continuity

So, network redundancy can help businesses mitigate the risk of unplanned outages and help ensure business continuity, but it doesn’t necessarily bring resilience. Simply implementing redundant equipment will never ensure that a business can get its full network ecosystem, from core to edge, up and running normally again quickly. Ultimately, it’s having that resilience in place that matters most. After all, networks are the backbone of organisations’ success today, and many businesses will benefit from bringing network resilience into the heart of their approach from the very outset.