As the networking world and its designs have evolved, redundancy and high availability have become terms that are thrown around quite often. But what do they really mean?

 

We are going to walk you through the evolution of redundancy by taking a look at Cisco’s Catalyst 6500, and help you understand why it is important for your network to be redundant.

 

As network engineers, we all want a network with 100% uptime: no matter what happens, traffic through the network should not be impacted. For clients who lose significant money during even a few minutes of downtime, redundancy is key. Let’s take a look at the following design as an introductory example:

 

[Figure: introductory design, a collapsed core built on a single Catalyst 6500]

 

The collapsed core consists of a single Catalyst 6500 with a single supervisor and several line cards. There are multiple access layer switches that the end users connect to. For an end user, the only way to reach the Internet (or the Intranet) is to go through the 6500.

What happens if the supervisor goes down? Or if, for some reason, the whole chassis goes down? You will be looking at a complete black hole of traffic, and your network will be 100% non-functional.

[Figure: dual-supervisor high availability]

 

This is a typical example of a non-redundant network. The Catalyst 6500 evolved to address this by introducing the capability to have two supervisors in the same chassis. If one goes down, the other can take over, and your network can continue functioning after a small period of traffic loss (this period is the time taken for the secondary supervisor to take over after the primary supervisor fails).

 

 

As you can see, we have achieved redundancy in terms of the supervisor engines. But what happens if there is a problem with the chassis itself and it goes down? You again reach a state of complete network blackout. A new concept called VSS (Virtual Switching System) was introduced to address this. With VSS, you can have redundancy at the chassis level as well.

 

With a VSS design, you have two Catalyst 6500s (with each chassis having a single supervisor) acting as a single virtual switch for the rest of the network. An example of this design:

 

[Figure: VSS design]

 

As you can see, with a VSS, your design includes two physical 6500 chassis that act as a single switch to the rest of the network. The access layer switches should be port-channeled to the VSS. With this, you are also aggregating links at the access layer (and thus providing more bandwidth for traffic). This entire design provides several layers of redundancy: you have chassis redundancy, and you have redundancy at the link level as well (if one of the links in the port-channel goes down, the other links can still forward traffic). Traffic, based on the hash of the port-channel, can flow via either chassis 1 or chassis 2 of the VSS.
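To make the hashing idea concrete, here is a minimal Python sketch of how a port-channel might map flows onto member links. It is only an illustration: the CRC32 hash, the `pick_link` helper, the link names and the IP addresses are all assumptions made for this example, and the actual load-balancing algorithm on a Catalyst switch is different and configurable.

```python
import zlib

def pick_link(flow, links):
    """Map a (source, destination) flow to one member link by hashing it.

    CRC32 is a stand-in for the switch's real hash, which is not modeled here.
    """
    key = f"{flow[0]}-{flow[1]}".encode()
    return links[zlib.crc32(key) % len(links)]

# Hypothetical port-channel with one uplink to each VSS chassis
links = ["chassis-1", "chassis-2"]

# Hypothetical flows from eight hosts to one destination
flows = [(f"10.0.0.{i}", "192.0.2.1") for i in range(1, 9)]

for flow in flows:
    print(flow, "->", pick_link(flow, links))

# If the uplink to chassis 1 fails, only one member link remains,
# so every flow is hashed onto the surviving link.
surviving = [link for link in links if link != "chassis-1"]
for flow in flows:
    print(flow, "->", pick_link(flow, surviving))
```

The same flow always hashes to the same link, which is why a given conversation sticks to one chassis while the overall load is spread across both.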

 

[Figure: VSS traffic flow]

 

If you look at this design carefully, you will realize that there is still scope for improvement. This design provides chassis redundancy, but what would happen to devices that are connected to only one chassis and not both? Such ports are called orphan ports, and if the chassis that they are connected to goes down, these ports have no other path to forward traffic.
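One way to reason about orphan ports is to list which devices attach to only one chassis, and which of them lose connectivity when a given chassis fails. The sketch below does this in Python for a made-up topology; the device names and the `uplinks` mapping are hypothetical and would come from your own inventory.

```python
# Hypothetical topology: device name -> set of VSS chassis it is connected to
uplinks = {
    "access-sw-1": {"chassis-1", "chassis-2"},
    "access-sw-2": {"chassis-1", "chassis-2"},
    "server-a": {"chassis-1"},  # single-homed, so it sits on an orphan port
}

def single_homed(uplinks):
    """Return the devices that are connected to fewer than two chassis."""
    return sorted(dev for dev, chassis in uplinks.items() if len(chassis) < 2)

def isolated_by(failed_chassis, uplinks):
    """Return the devices left with no uplink if the given chassis fails."""
    return sorted(
        dev for dev, chassis in uplinks.items()
        if not (chassis - {failed_chassis})
    )

print(single_homed(uplinks))              # ['server-a']
print(isolated_by("chassis-1", uplinks))  # ['server-a']
print(isolated_by("chassis-2", uplinks))  # []
```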

 

To address this and make the VSS design even more resilient, the quad-sup VSS design was introduced. It has four supervisors in total, two per chassis. Each chassis has an active supervisor that is local to the chassis, and the second supervisor in the chassis is the in-chassis standby. At any point, if the active supervisor of a chassis were to fail, the in-chassis standby would take over as the new active supervisor for that chassis.
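The in-chassis takeover can be pictured with a toy model. The Python sketch below only mirrors the behavior described above; the `Chassis` class and the supervisor names are invented for illustration, and it ignores failover timing and everything the real control plane does.

```python
class Chassis:
    """Toy model of one VSS chassis holding an ordered list of supervisors."""

    def __init__(self, name, supervisors):
        self.name = name
        # The first healthy supervisor in the list acts as the active one;
        # the next one is the in-chassis standby.
        self.healthy = list(supervisors)

    @property
    def active(self):
        return self.healthy[0] if self.healthy else None

    @property
    def forwarding(self):
        # The chassis can forward traffic while it has a healthy supervisor.
        return self.active is not None

    def fail(self, supervisor):
        self.healthy.remove(supervisor)

# Hypothetical quad-sup VSS: two supervisors in each of the two chassis
vss = [
    Chassis("chassis-1", ["sup-1A", "sup-1B"]),
    Chassis("chassis-2", ["sup-2A", "sup-2B"]),
]

vss[0].fail("sup-1A")     # the active supervisor of chassis 1 fails
print(vss[0].active)      # sup-1B, the in-chassis standby takes over
print(vss[0].forwarding)  # True, devices on chassis 1 keep their path

vss[0].fail("sup-1B")     # a second failure in the same chassis
print(vss[0].forwarding)  # False, only chassis 2 is left
print(vss[1].forwarding)  # True
```

In this model, a single supervisor failure no longer takes a whole chassis out of service, which is exactly what protects devices attached to only one chassis.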

 

The quad-sup VSS design is the most redundant of the designs covered here: it provides a seamless transition from supervisor to supervisor and from chassis to chassis during various failure scenarios.
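For a rough feel of how the four designs compare, here is a back-of-the-envelope availability calculation in Python. The per-component availability figures are made up for illustration, and the model assumes that failures are independent and that failover is instant and always succeeds, which real networks do not guarantee.

```python
def parallel(availability, n=2):
    """Availability of n redundant components where any one is sufficient."""
    return 1 - (1 - availability) ** n

# Hypothetical availability figures, not vendor data
A_SUP = 0.999       # one supervisor
A_CHASSIS = 0.9995  # one chassis (backplane, power, fans)

designs = {
    "single sup, single chassis": A_SUP * A_CHASSIS,
    "dual sup, single chassis": parallel(A_SUP) * A_CHASSIS,
    "VSS, one sup per chassis": parallel(A_SUP * A_CHASSIS),
    "quad-sup VSS": parallel(parallel(A_SUP) * A_CHASSIS),
}

MINUTES_PER_YEAR = 365 * 24 * 60

for name, availability in designs.items():
    downtime = (1 - availability) * MINUTES_PER_YEAR
    print(f"{name:28s} availability={availability:.7f} "
          f"downtime={downtime:.1f} min/year")
```

With these assumed numbers, each step in the evolution reduces the expected downtime. Note that the model only looks at the core as a whole and does not capture devices connected to a single chassis.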

 

We’ll end the post by noting that not every network requires a very high degree of redundancy. As a network administrator, you need to understand the requirements of your network and whether it needs a hitless infrastructure. Based on that knowledge, any of the above designs can be used as a baseline to guide you in building a more redundant network.

 

Hope this was helpful!

Aninda