Monthly Archives: October 2010

Maintenance Plans: They don’t like it when DBAs leave

Discovered some…interesting…information about Maintenance Plans a few weeks ago when we had a DBA leave. I don’t think it warrants being called a “bug”, but it is behavior that I don’t think is ideal, and it definitely can cause issues for us and others. The good news here is that I did get a workaround nailed down (thanks to the help of Random Internet Guy), although we haven’t decided whether or not we’re going to use it.

How this all started

Recently, we had one of our DBAs leave. (Incidentally, this is the first time ever that I’ve had this happen to me and also the first time one has left this company.) After he left at the end of his last day, I pulled all of his access to the Production systems. We knew that any Agent jobs that he happened to own would start failing, and sure enough, there were a few failures that evening and over the course of the weekend.

One of these failures turned into a little more of an involved situation: The Maintenance Plan on the only production SQL 2008 box he set up failed. With us being used to SQL 2000, this was expected to be an easy fix and life would go on. Except, it wasn’t.

Toto, I’ve a feeling this isn’t SQL 2000 any more

First attempt to fix this was to change the owner of the SQL Agent Job to ‘sa’. He’s our standard job owner for consistency and because that makes the job impervious to people leaving. The job still failed, although the error was different this time. Obviously the Agent Job owner had some role in all of this, but it wasn’t the whole story, as things weren’t fixed.

After changing the Job owner, this is the error that occurred:

Executed as user: <SQL Service acct>, [blah blah blah]  Started:  11:25:38 AM  Error: 2010-10-06 11:25:39.76     Code: 0xC002F210     Source: <not sure if this GUID is secure or not> Execute SQL Task     Description: Executing the query “DECLARE @Guid UNIQUEIDENTIFIER      EXECUTE msdb..sp…” failed with the following error: “The EXECUTE permission was denied on the object ‘sp_maintplan_open_logentry’, database ‘msdb’, schema ‘dbo’.“. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.  End Error  Warning: 2010-10-06 11:25:39.79     Code: 0x80019002     Source: OnPreExecute      Description: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.  [blah blah blah] Error: 2010-10-06 11:25:40.58     Code: 0xC0024104     Source: Check Database Integrity Task      Description: The Execute method on the task returned error code 0x80131501 (An exception occurred while executing a Transact-SQL statement or batch.). The Execute method must succeed, and indicate the result using an “out” parameter.  End Error  Error: 2010-10-06 11:25:40.64     Code: 0xC0024104     Source: <another GUID>      Description: The Execute method on the task returned error code 0x80131501 (An exception occurred while executing a Transact-SQL statement or batch.). The Execute method must succeed, and indicate the result using an “out” parameter.  End Error  Warning: 2010-10-06 11:25:40.64     Code: 0x80019002     Source: OnPostExecute      Description: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. [blah blah blah] End Warning  DTExec: The package execution returned DTSER_FAILURE (1).  Started:  11:25:38 AM  Finished: 11:25:40 AM  Elapsed:  2.64 seconds.  The package execution failed.  The step failed.

A Permissions error in MSDB? Huh?

I dug a little, and the short of it is that the Maintenance Plan’s connection string was still set to the departed Admin’s sysadmin account (a SQL login). All of the work that the MP was trying to do, it was still attempting as the other DBA. This, of course, isn’t going to get very far.

Maintenance Plans have connection strings?

This seems like a dumb question to ask, but it’s one of those things that I had glossed over in my head and not actually thought about. The answer: Sure they do! MPs are basically SSIS packages these days, and of course those have connection strings/settings. MPs are no different. Seems like a “DURRR” now, but hindsight, yadda, yadda.

This connection string is controlled by the “Manage Connections” button in Maintenance Plan toolbar. If you never mess with this when you create an MP, it uses an auto-generated connection to the Local Server, using whatever name you used to connect to it (there’s another topic right there, but someday I’ll talk about some quirks with Aliases I’ve run into, and I’ll address that then). This auto-gen’d connection also uses the same auth credentials you are currently using. If you only use Windows Auth, you are stuck with that; if in Mixed Mode, a SQL auth account is an option, and you can set it to another account. Of course, you would need to know that account’s password.

To fix our MP’s connection string, I could change it to my own credentials, but that of course doesn’t actually fix anything, and will leave an even bigger mess for whoever’s left if/when I leave or responsibilities change. I can’t set it to Windows Auth, because when you select that, it validates the auth right then and there. Since we don’t have Windows Auth SA accounts, that won’t work for us: the AD account that I’m logged on with doesn’t have enough rights. Something else needed to be done.

At this point I realized that this is a fair-sized problem and we can’t be the first shop to run into this; I mean, SQL 2005/2008 have been out for a while, and I’m sure lots of DBAs have left shops that heavily utilize MPs. Time to ask the Senior DBA.

Going nowhere fast

For as common as I expected this to be, I wasn’t finding much helpful information. There were some references to this error message, and even someone in the same situation I was in, but most of what I found were dead-end threads.

At one point, I thought for sure that @SQLSarg had it. This blog post contains the revelation that MPs have Owners! His main point of changing the MP owner is to change the owner that the associated Agent Job(s) are set to when you save the MP. Even though I had already changed the job owner to a valid account, I hoped that the MP’s owner had something to do with the credentials that were used in the Plan’s connection string.

Of course, it didn’t. No behavior change.

Aside: Even though it didn’t help with what I was trying to fix at the time, I believe that changing an MP’s owner to a shared or central service account is a good idea. Every time an MP gets modified, the Agent Job’s owner is changed to the same account as the Plan’s Owner. This might not even be you, and when that person leaves the company, you are guaranteed to have execution problems. There’s a Connect item open asking to make it easier to change the owner, so if you like Maintenance Plans and don’t hate yourself, it’s probably a good idea to log in over there and mash the green arrowhead.

The more I dug, the more dead-end threads I read, the more almost-but-not-quite blog posts I ran across, the more I became reserved to one simple fact:

It is impossible to set the Connection String Credentials in a Maintenance Plan to an account that you don’t know the password for.

As that started to sink in, I asked #SQLHelp if that were, in fact, true. I got a couple responses that yes, that was the case (I would be specific, but this was a number of weeks ago, and since Twitter sorta sucks for finding things like that and I can’t remember who it was, we’re out of luck). Although not the news I was looking for, I was glad to know that the deal was pretty much sealed—we were going to need to figure out something else to do.

One last gasp

Since I don’t feel like I’m doing everything I can while troubleshooting a problem if I don’t grab for at least one straw, I decided to investigate something that was described in one of the dead-end threads that I had found while searching.

This thread on eggheadcafe is a conversation between “howardwnospamnospa”* and Aaron Bertrand, who we all know and love. This one “howardw” was in a pretty similar boat to what I was in, and was really close to getting it to work. He was only thwarted because of an apparent quirk when an Instance is in Windows Auth-only mode.

In the last post in the thread, he reports that when he assigns the Local System account (“NT AUTHORITY\SYSTEM”) as owner of the MP, and sets the connection properties of the plan to use Windows Auth, the plan runs. To attempt to duplicate this, I started with an existing MP on a Dev instance that I had set up (therefore my SQL auth account was the MP owner and the user listed in the Connection Properties) and proceeded with the following steps:

  1. Using the statements in SQLSarg’s post, I changed the owner of the MP to Local System
  2. I added my Domain account to the Instance with SA rights
  3. I opened the MP in question with SSMS, switched its connection to use Windows Auth (while logged in with the Domain account used in Step 2), and saved it.
  4. Checked the MP’s auto-gen’d Agent Job to confirm that its owner was NT AUTHORITY\SYSTEM.
  5. Using a connection with my Domain account, I pulled my SQL account’s SA rights. This would previously make the MP sub-plan fail.

I ran the job and, holy cow, it worked! 😀

That’s good, right?

Ehhhh, maybe. Setting the owner to Local System seems pretty dirty. I haven’t thought of a way that this would be a security problem, but it still feels like a stupid hoop to jump through. For us, it has the extra hoop of us having to add our Domain accounts to the instances every time we want to set up or change connection info on an MP (once the Windows Auth is set, it doesn’t re-validate it when you save the plan, so you only have to do it that one time), then take it out when we’re done.

Those bits aside, this workaround (if you will let me get by without calling this a “hack”) does allow setting up Maintenance Plans that will live through the DBAs that set them up leaving the company. By technicality, I achieved my goal.

We’ve talked about this setup very briefly at the office, but haven’t made a final decision on whether or not to go down this road or to abandon MPs for DB maintenance altogether. Hopefully we will get that decision made before someone else leaves.

But at what cost?

The main cost here is a lot of destruction to my affinity for Maintenance Plans.

Using MPs in general, love them or hate them, is pretty much a religious debate. Usually I fall on the proponent side of those arguments. I think they’re easy to set up, they’re pretty flexible, they can dynamically pick up new DBs that get added to the system (not always an advantage), you can cook up some really crazy dependencies within them, and I like pictures & flowchart-y things :-).

After this whole ordeal? Pretty sure I’ve changed my mind. This whole default behavior of how they will stop working when the account that set them up initially is no longer an SA is way too much of a price to pay. What happens when you’re a single DBA shop and you get honked and bail one day? Near as I can tell, if your account gets restricted as it should, backups will stop working, but there won’t be anyone left to notice!

In lieu of MPs, there’s at least one good option out there that has most all of the desired functionality. While this was going on, @TheSQLGuru posted a link to this set of SPs. I haven’t had time yet to really dive into these, but on the surface, they look pretty good.

Final Comment(s)

This entire thing bugs me. I’m almost certain I’m missing something. This should be a huge problem for lots of people. I feel like I’m missing a detail or it’s because we use SQL logins for the most part or some other thing. But… in all of the testing that I’ve done, bad things happen when the initial creating user gets SA pulled. Period. I can’t get past the feeling that I’m doing something wrong, but I haven’t been able to figure it out yet.

In the meantime… I think Maintenance Plans are evil after all…

* Apparently not to be confused with howardwSOLODRAKBANSOLODRAKBANSO[LODRA] (Eve-Online thing; it’s honestly better if you don’t ask)

Unbeknownst to them, Soutwest has a SQL Saturday hookup!

Being the Southwest fanbois that we are, I wanted to bring y’all this PSA 🙂

Winglet!

Everyone loves winglets. OK, maybe not everyone.

They’ve got a sale going on until the end of October 28 (tomorrow) Pacific Time where you can get flights for $30 if you’re going < 450 miles. They go up from there, but the prices are still pretty good.

The windows for the flights are December 1-15 and January 4-February 16. There are FIVE (5) whole SQL Saturdays within those two windows!

  • Dec 4, #61 in DC
  • Jan 15, #62 in Tampa
  • Jan 22, #45 in Louisville
  • Jan 29, #57 in Houston
  • Feb 5, #60 in Cleveland

Sooooo, if you’re wanting to hit up more SQL Saturdays or are speaking at one of those and have to foot the bill yourself, this might be a slightly cheaper way to pull it off.

…You know, if you don’t mind being stuffed in the back of a 737-700.

Edit: Forgot to put the direct link to their page that lists all of the covered cities/destinations: clicky

T-SQL Tuesday #11: The Misconception of Active/Active Clustering

TSQL Tuesday

October's TSQL2sDay: Hosted by Sankar Reddy

I was expecting my inaugural T-SQL Tuesday post to not happen this month… Until just yesterday, when I found myself helping explain to someone what an Active/Active SQL cluster was and what it wasn’t. Ah hah! I expected this to be a topic already covered fairly well, and as expected, I found a couple of really good articles/posts by others on this topic. I will reference those in a bit, but I can cover some more general ground first.

The misconception that was I was explaining yesterday morning is the same one that I’ve explained before and have seen others do the same: An Active/Active SQL Server Cluster is not a load-balancing system. It is, in fact, purely for High-Availability purposes. If what you are really after is true load-balancing, unfortunately, you will have to go elsewhere, to something like Oracle RAC.

When I say “true load-balancing”, I mean being able to load-balance a specific set of data across multiple pieces of hardware. What you can do with Active/Active clustering is load-balance a set of databases across multiple pieces of hardware. This is useful if you have a SQL Instance with multiple very heavily-utilized databases—one or more of these DBs could be pulled out to its own instance on separate hardware. This is not done without potentially serious tradeoffs, however, which will be covered in a bit.

The underlying reason for this is that Windows Clustering is only designed for HA purposes. With its “shared nothing” approach to clustering, any given resource within the cluster (say, the storage LUN that holds your accounting software’s MDF file) can only be controlled by one node of the cluster at a time. If the accounting DB’s home LUN is currently attached to the first node in a cluster, it’s going to be pretty hard for the second node to service requests for that database.

OK, what is Active/Active Clustering, then?

I like to think of it as two Active/Passive clusters that happen to live on the same set of hardware. To illustrate:

Active-Passive Cluster Example 1

This example cluster is a two-node Active/Passive cluster that contains a single SQL Server Instance, CLUS1\ACCOUNTING. This Instance has its own set of drive letters for its DB, Log, and TempDB storage (say, E:, F:, and G:). If both nodes are healthy, Node2 never has anything to do. It sits around twiddling its bits, just waiting for that moment when a CPU melts down in Node1, and it can take over running the ACCOUNTING instance to show how awesome it is.

Active-Passive Cluster Example 2

This is another two-node Active/Passive cluster, running a single instance, CLUS2\TIME (you know, because internal time tracking software is that important). Like CLUS1\ACCOUNTING above, it has its own set of drive letters for its storage needs (M:, N:, and O:)In this case, the SQL Server Instance is on Node2, leaving Node1 with nothing to do.

What do we get if we mashed these two clusters together?

Active-Active Cluster Example

We would get this, an “Active/Active” SQL cluster. It’s logically the same as the two previous clusters: it just so happens that instead of the second node doing nothing, it is running another independent instance. Each instance still has its own set of drive letters (E:, F:, and G: plus M:, N:, and O:). Just like in the “Accounting Cluster” above, if Node1 were to fail for some reason, the CLUS1\ACCOUNTING instance would be started up on Node2, this time, alongside CLUS2\TIME.

So, is doing this a good idea?

Maybe. Above, I talked about how a single instance Active/Passive cluster with more than one heavily-used DBs could be re-architected into an Active/Active cluster with some of the heavy DBs on one instance living on one node and the rest of them on a second instance living on the other node in the cluster. This gives you both High Availability and you’ve spread the load out between two servers, right? It does, but what happens if one of the nodes fails and the two instances “clown-car” onto the remaining node? Are there enough CPU & RAM resources on the remaining server to support the combined load?

That illustrates one of the major things to consider when thinking about deploying an Active/Active cluster. Tim Ford (Blog | @sqlagentman) has an article over at MSSQLTips that goes into detail about making this decision.

In addition, Aaron Bertand (Blog | @AaronBertrand) has a sweet little piece of automation to help manage resources when trying to squeeze the maximum amount or size of clowns into a single car without someone’s foot sticking out in the breeze. Of course, it’s something that you hope never actually gets used, but it’s a good way to wring the most performance out of an Active/Active cluster.

Conclusion

I don’t know if this is the best way to explain this concept, but I like it. The Active/Passive cluster concept is usually understood, so I believe it is a nice stepping stone to get to the truth about Active/Active clusters. As with everything, there are tradeoffs and gotchas that need to be considered when designing server architectures like this, and the above-mentioned others have gone a long way to cover those topics.

There we have it, my first T-SQL Tuesday post. Hopefully it is useful and not from too far out of left field 😉