If your enterprise network has only one MX, one web server, one firewall, and one data storage server serving clients around the world, it is time for a cold, hard look at how to immediately transform your network for business continuity.
MSCS (Microsoft Cluster Service) and VCS (Veritas Cluster Server) both evolved from technology originally developed by Digital Equipment Corporation in the 1980s for use on its VAX/VMS platform. Both of these products, at a minimum, offer the ability to “fail over” services, applications and File and Print resources. They do so by means of a Virtual Server “gratuitous re-ARP”, the same mechanism employed by Cisco routers running HSRP and by teamed network cards configured in failover mode. By “failing these resources over” from one cluster Node to another, these resources “remain available” to clients and users (thus the term “High-Availability” clustering is sometimes associated with this architecture). Before we go much further, it would be wise to take a minute to understand the limitations of this technology. Neither of these products truly supports “load balancing” clusters.
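As an aside on the gratuitous re-ARP mentioned above, the following is a minimal Python sketch of what such an announcement looks like on the wire. It only builds the 42-byte Ethernet/ARP frame and does not send it. The function names, the IP address 192.0.2.10 and the MAC address are hypothetical, and the request form of the announcement (opcode 1, target hardware address zeroed) is an assumption; the article does not say which form MSCS or VCS actually emits.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(virtual_ip: str, new_owner_mac: str) -> bytes:
    """Build a broadcast ARP frame announcing that virtual_ip now lives at new_owner_mac.

    What makes the frame "gratuitous" is that the sender IP and the target IP
    are the same address: nobody asked, the new owner simply announces itself.
    """
    sender_mac = mac_to_bytes(new_owner_mac)
    ip = socket.inet_aton(virtual_ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = struct.pack("!6s6sH", broadcast, sender_mac, 0x0806)

    # ARP payload: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4,
    # opcode 1 (request form, an assumption), then sender and target pairs.
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        sender_mac, ip,      # sender hardware / protocol address
        b"\x00" * 6, ip,     # target hardware (unknown) / same protocol address
    )
    return ethernet + arp


# Hypothetical Virtual Server address moving to a hypothetical NodeB adapter.
frame = build_gratuitous_arp("192.0.2.10", "02:00:00:00:00:0b")
```

Hosts and switches that receive such a frame update their IP-to-MAC mapping for the Virtual Server address, which is why clients can keep using the same name and address after a failover.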
Load balancing cluster architectures, which can be either hardware based (as in Cisco LocalDirector, F5 BIG-IP, etc.) or software based (as in Microsoft WLBS or NLB), divide incoming requests among multiple cluster hosts, allowing work to be distributed among many machines simultaneously. This ability to “scale out” (by adding hosts to the cluster as processing demands increase) is what differentiates a load balancing cluster from a failover cluster. In a failover cluster architecture, the resources needed to support a single application, service or File/Print resource are organized into “Resource Groups”, and these resources are generally accessed by means of a unique Virtual Server name. Once configured, a resource group can live on any one node of the cluster, but never on two nodes simultaneously. MSCS and VCS do both support Active/Active configurations of failover clustering, where processing occurs on multiple cluster nodes simultaneously, but in this configuration the work being done by the different cluster nodes is truly different work (again organized under separate resource groups and accessed through unique Virtual Server names). For example, we could configure one set of Exchange resources (the resources needed to support a single installation of Exchange Server, such as physical disks, Information Store, Routing Engine, SMTP, etc.) under one resource group named Exchange1 and then configure another set of Exchange resources under a separate resource group named Exchange2. If we then configured the preferred owner of Exchange1 to be NodeA and the preferred owner of Exchange2 to also be NodeA, both groups would, by default, be hosted on NodeA (unless NodeA failed, of course, in which case they would fail over to NodeB). This would be considered an Active/Passive configuration.
Alternatively, we could configure the preferred owner of Exchange1 to be NodeA and the preferred owner of Exchange2 to be NodeB; this is known as an Active/Active configuration. Under normal conditions, Exchange1 would run on NodeA and Exchange2 would run on NodeB. If NodeA failed, Exchange1 would fail over to NodeB, but NodeB would now have to run both Exchange1 and Exchange2, and the net effect would be that both resource groups are slowed to roughly “half speed”. Likewise, if NodeB failed and NodeA survived, Exchange2 would fail over to NodeA, resulting in the same diminished performance for both resource groups. Generally, we use Active/Passive configurations when the primary design consideration is the speed of the application after failover: if the NodeA server hardware is identical to the NodeB server hardware and no other applications are already running on NodeB, the failed-over application (resource group) will run just as fast on NodeB as it did on NodeA. We use Active/Active clusters when the primary design consideration is the highest overall utilization of our server hardware (NodeB can run NodeA’s applications at half speed if necessary and vice versa, but 99+% of the time NodeA and NodeB are running their own applications at full speed).
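The preferred-owner behavior described above can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not how MSCS or VCS is implemented: the names ResourceGroup, place_groups, share_per_group and compare are hypothetical, and the model assumes identical nodes whose capacity is split equally among the groups they host.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    name: str
    preferred_owner: str


def place_groups(groups, nodes_up):
    """Map each group to exactly one node: its preferred owner if that node
    is up, otherwise the first surviving node (simplified failover)."""
    if not nodes_up:
        raise RuntimeError("no surviving nodes, cluster is down")
    placement = {}
    for group in groups:
        if group.preferred_owner in nodes_up:
            placement[group.name] = group.preferred_owner
        else:
            placement[group.name] = nodes_up[0]
    return placement


def share_per_group(placement):
    """Fraction of a node each group receives, assuming an equal split
    among the groups hosted on the same node (a modeling assumption)."""
    hosted = {}
    for node in placement.values():
        hosted[node] = hosted.get(node, 0) + 1
    return {name: 1 / hosted[node] for name, node in placement.items()}


NODES = ["NodeA", "NodeB"]
active_passive = [ResourceGroup("Exchange1", "NodeA"),
                  ResourceGroup("Exchange2", "NodeA")]
active_active = [ResourceGroup("Exchange1", "NodeA"),
                 ResourceGroup("Exchange2", "NodeB")]


def compare(groups, failed_node):
    """Return the per-group shares before and after failed_node goes down."""
    before = share_per_group(place_groups(groups, NODES))
    survivors = [node for node in NODES if node != failed_node]
    after = share_per_group(place_groups(groups, survivors))
    return before, after


if __name__ == "__main__":
    for label, groups in (("Active/Passive", active_passive),
                          ("Active/Active", active_active)):
        before, after = compare(groups, "NodeA")
        print(label, "before:", before, "after:", after)
```

In this model the Active/Passive groups receive the same share of a node before and after NodeA fails, while the Active/Active groups drop from a full node each to half a node each, which is the trade-off between post-failover speed and everyday hardware utilization.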
Increasing the availability of essential network services and mission-critical LOB applications and data is not the only role fulfilled by MSCS and Veritas clustering. Failover clusters also offer “rolling upgrade” support and the flexibility of virtually unlimited hardware/software maintenance windows. After clustering has been set up and implemented, an administrator can simply fail over all the applications and services running on one Node to another Node of the cluster and then take the first Node down for hardware upgrades, OS upgrades, Service Pack upgrades, application upgrades, etc. Once the first Node has been upgraded and rebooted, the administrator can fail the applications back onto the upgraded Node and take the Node that has not yet been upgraded down for similar maintenance. Once the second Node has been upgraded, all the administrator needs to do is rebalance the work running on both cluster Nodes (if this is an Active/Active configuration). This decreases TCO by allowing software/hardware upgrades to be performed in the middle of the day, as opposed to scheduling all those grueling maintenance windows at 3 a.m. on Sunday mornings.
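The rolling upgrade sequence described above can be sketched as a short Python simulation. It is only an ordering of steps under stated assumptions: a two-node cluster, a hypothetical rolling_upgrade function, and an upgrade callback that stands in for whatever maintenance is actually performed. It does not call any real MSCS or VCS interface.

```python
def rolling_upgrade(owners, preferred, nodes, upgrade):
    """Upgrade a two-node cluster one node at a time.

    owners:    {group name: node currently hosting it} (updated in place)
    preferred: {group name: preferred owner node}
    nodes:     the two node names, in the order they should be upgraded
    upgrade:   callable performing the maintenance on one node (placeholder)
    Returns a log of the steps taken.
    """
    log = []
    first, second = nodes
    for down, up in ((first, second), (second, first)):
        # Drain the node: fail every group it hosts over to the other node.
        for group, node in owners.items():
            if node == down:
                owners[group] = up
                log.append(f"move {group}: {down} -> {up}")
        if down in owners.values():
            raise RuntimeError(f"{down} still hosts resource groups")
        upgrade(down)
        log.append(f"upgrade {down}")
    # Rebalance: return each group to its preferred owner (Active/Active case).
    for group, node in preferred.items():
        if owners[group] != node:
            log.append(f"move {group}: {owners[group]} -> {node}")
            owners[group] = node
    return log


# Hypothetical Active/Active starting point from the Exchange example.
preferred = {"Exchange1": "NodeA", "Exchange2": "NodeB"}
owners = dict(preferred)
upgraded = []  # records the order in which nodes were taken down
steps = rolling_upgrade(owners, preferred, ["NodeA", "NodeB"], upgraded.append)
```

The check before each upgrade captures the key property of the procedure: a node is only taken down once it hosts no resource groups, so every group stays online on the other node throughout.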
Norm Hebert is employed as the chief Windows Network Architect and Server and Storage Virtualization Specialist by Certified Network Consultants of Nashua NH. Mr. Hebert has 6+ years of experience with MSCS, is fully Microsoft and VMware certified and can be reached at:
Norm@CertifiedNC.com.