T-SQL Tuesday #11: The Misconception of Active/Active Clustering

TSQL Tuesday

October's TSQL2sDay: Hosted by Sankar Reddy

I was expecting my inaugural T-SQL Tuesday post to not happen this month… Until just yesterday, when I found myself helping explain to someone what an Active/Active SQL cluster was and what it wasn’t. Ah hah! I expected this to be a topic already covered fairly well, and as expected, I found a couple of really good articles/posts by others on this topic. I will reference those in a bit, but I can cover some more general ground first.

The misconception that was I was explaining yesterday morning is the same one that I’ve explained before and have seen others do the same: An Active/Active SQL Server Cluster is not a load-balancing system. It is, in fact, purely for High-Availability purposes. If what you are really after is true load-balancing, unfortunately, you will have to go elsewhere, to something like Oracle RAC.

When I say “true load-balancing”, I mean being able to load-balance a specific set of data across multiple pieces of hardware. What you can do with Active/Active clustering is load-balance a set of databases across multiple pieces of hardware. This is useful if you have a SQL Instance with multiple very heavily-utilized databases—one or more of these DBs could be pulled out to its own instance on separate hardware. This is not done without potentially serious tradeoffs, however, which will be covered in a bit.

The underlying reason for this is that Windows Clustering is only designed for HA purposes. With its “shared nothing” approach to clustering, any given resource within the cluster (say, the storage LUN that holds your accounting software’s MDF file) can only be controlled by one node of the cluster at a time. If the accounting DB’s home LUN is currently attached to the first node in a cluster, it’s going to be pretty hard for the second node to service requests for that database.

OK, what is Active/Active Clustering, then?

I like to think of it as two Active/Passive clusters that happen to live on the same set of hardware. To illustrate:

Active-Passive Cluster Example 1

This example cluster is a two-node Active/Passive cluster that contains a single SQL Server Instance, CLUS1\ACCOUNTING. This Instance has its own set of drive letters for its DB, Log, and TempDB storage (say, E:, F:, and G:). If both nodes are healthy, Node2 never has anything to do. It sits around twiddling its bits, just waiting for that moment when a CPU melts down in Node1, and it can take over running the ACCOUNTING instance to show how awesome it is.

Active-Passive Cluster Example 2

This is another two-node Active/Passive cluster, running a single instance, CLUS2\TIME (you know, because internal time tracking software is that important). Like CLUS1\ACCOUNTING above, it has its own set of drive letters for its storage needs (M:, N:, and O:)In this case, the SQL Server Instance is on Node2, leaving Node1 with nothing to do.

What do we get if we mashed these two clusters together?

Active-Active Cluster Example

We would get this, an “Active/Active” SQL cluster. It’s logically the same as the two previous clusters: it just so happens that instead of the second node doing nothing, it is running another independent instance. Each instance still has its own set of drive letters (E:, F:, and G: plus M:, N:, and O:). Just like in the “Accounting Cluster” above, if Node1 were to fail for some reason, the CLUS1\ACCOUNTING instance would be started up on Node2, this time, alongside CLUS2\TIME.

So, is doing this a good idea?

Maybe. Above, I talked about how a single instance Active/Passive cluster with more than one heavily-used DBs could be re-architected into an Active/Active cluster with some of the heavy DBs on one instance living on one node and the rest of them on a second instance living on the other node in the cluster. This gives you both High Availability and you’ve spread the load out between two servers, right? It does, but what happens if one of the nodes fails and the two instances “clown-car” onto the remaining node? Are there enough CPU & RAM resources on the remaining server to support the combined load?

That illustrates one of the major things to consider when thinking about deploying an Active/Active cluster. Tim Ford (Blog | @sqlagentman) has an article over at MSSQLTips that goes into detail about making this decision.

In addition, Aaron Bertand (Blog | @AaronBertrand) has a sweet little piece of automation to help manage resources when trying to squeeze the maximum amount or size of clowns into a single car without someone’s foot sticking out in the breeze. Of course, it’s something that you hope never actually gets used, but it’s a good way to wring the most performance out of an Active/Active cluster.

Conclusion

I don’t know if this is the best way to explain this concept, but I like it. The Active/Passive cluster concept is usually understood, so I believe it is a nice stepping stone to get to the truth about Active/Active clusters. As with everything, there are tradeoffs and gotchas that need to be considered when designing server architectures like this, and the above-mentioned others have gone a long way to cover those topics.

There we have it, my first T-SQL Tuesday post. Hopefully it is useful and not from too far out of left field 😉